Using SAS Enterprise Miner to predict breast cancer at early stage
| Content Provider | Semantic Scholar |
|---|---|
| Author | Ikoro, Gibson Okechukwu |
| Copyright Year | 2015 |
| Abstract | Breast cancer is the leading cause of cancer-related deaths among women worldwide, and its early detection can reduce the mortality rate. Using a data set containing information about breast screening, we constructed a model that can provide early indication of a patient's tendency to develop breast cancer. This data set has information about breast screening from patients who were believed to be at risk of developing breast cancer. The most important aspect of this analysis is that we excluded all patients with symptoms commonly associated with breast cancer, while keeping patients with symptoms that are less likely or unknown to be associated with breast cancer as input predictors. The hope was that a model could be developed that would identify women at high risk of developing breast cancer. This group could then be subjected to more intense screening with a view to earlier detection of the cancer and thus improved outcomes. The target variable is a binary variable with two values, 1 (indicating a type of cancer is present) and 0 (indicating a type of cancer is not present). SAS® Enterprise Miner™ 12.1 was used to perform data validation and data cleansing, to identify potentially related predictors, and to build models that can be used to predict at an early stage the likelihood of patients developing breast cancer. We compared two models: the first model was built with an interactive node and a cluster node and the second was built without an interactive node and a cluster node. Classification performance was compared using a receiver operating characteristic (ROC) curve and average squared error. Interestingly, we found significantly improved model performance by using only variables that have a lesser or unknown association with breast cancer. The results show that the logistic model with an interactive node and a cluster node has better performance with a lower average squared error (0.059614) than the model without an interactive node and a cluster node. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://support.sas.com/resources/papers/proceedings15/3101-2015.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
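
The abstract compares two classifiers by ROC curve and average squared error. As a rough illustration of what those two measures compute for a binary 0/1 target, here is a minimal Python sketch. It is not the paper's SAS Enterprise Miner workflow: the function names, the pairwise AUC calculation, and the toy labels and scores are all assumptions made for this example, and the numbers have no connection to the paper's data or results.

```python
def average_squared_error(y_true, y_prob):
    """Mean squared difference between 0/1 targets and predicted probabilities."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - p) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case gets the
    higher score; tied scores count as half a pair."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = cancer present, 0 = not present.
y = [0, 0, 1, 1]
model_a = [0.1, 0.4, 0.35, 0.8]   # made-up predicted probabilities
model_b = [0.5, 0.5, 0.5, 0.5]    # an uninformative baseline

for name, scores in (("A", model_a), ("B", model_b)):
    print(name, average_squared_error(y, scores), roc_auc(y, scores))
```

Under these definitions a lower average squared error and a higher area under the ROC curve both indicate the better model, which is the sense in which the abstract reports 0.059614 as the better result.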