Clustering Assignment
Note: SAS is optional. You may do this with SPSS or any other software that
can do k-means clustering.
Option 1: The Auto Loan Dataset
- Create a new dataset with all the special cases (negative values) removed.
The following SAS program is one way to do it.
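    data save.new;
        set save.old;
        /* Special cases are coded as negative values. In SAS a missing
           numeric value also compares as less than 0, so rows with
           missing values are dropped too. */
        if trades < 0 then delete;
        if ageotd < 0 then delete;
        /* Repeat for each of the variables you choose to cluster on. */
    run;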
- Pick any 7 to 10 numeric variables from the data. Use the original
variables, not any dummies you created from them. Run correlations to make
sure the variables are not too correlated with each other (correlations
above .30 in absolute value are not desirable). If some are, drop one or
more and replace them with something else. Do not ask me which variables to
pick! Figuring out which ones make the most sense is part of what you have
to do in a real-life situation.
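As a rough sketch of the correlation check, PROC CORR prints the pairwise Pearson correlations for a candidate list. Only trades and ageotd appear in the program above. The names var3 through var7 are hypothetical placeholders to replace with your own picks. The sketch assumes the cleaned dataset is save.new and that the save libref is already assigned.

```sas
/* Candidate clustering variables. trades and ageotd come from the
   handout; var3 through var7 are hypothetical placeholders, not
   actual column names in the auto loan data. */
%let clusvars = trades ageotd var3 var4 var5 var6 var7;

/* Pairwise Pearson correlations among the candidates.
   NOSIMPLE suppresses the table of univariate statistics. */
proc corr data=save.new nosimple;
    var &clusvars;
run;
```

Scan the off-diagonal entries of the resulting matrix. If a pair exceeds the threshold, swap one of the two variables out of the %let list and rerun.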
- Standardize the variables. Use PROC STANDARD in SAS.
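One way this step might look is shown below. It rescales each chosen variable to mean 0 and standard deviation 1. The output name save.std and the variables var3 through var7 are assumptions made for illustration; substitute your own.

```sas
/* Hypothetical variable list: only trades and ageotd are named in
   the handout. */
%let clusvars = trades ageotd var3 var4 var5 var6 var7;

/* Rescale each variable to mean 0 and standard deviation 1 so that
   no variable dominates the distance calculation because of its
   units. Variables not listed in VAR are copied unchanged. */
proc standard data=save.new mean=0 std=1 out=save.std;
    var &clusvars;
run;

/* Optional check: means should be about 0, std devs about 1. */
proc means data=save.std mean std;
    var &clusvars;
run;
```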
- Perform k-means cluster analysis (PROC FASTCLUS) with 2, 3, 4, …, 10
clusters, then pick the solution that makes the most sense to you, based on
our discussions in class. Interpret the solution. What are the
characteristics of each cluster? If you were marketing to them, what sorts
of offers would you make to each segment?
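A minimal sketch of the k-means runs follows, assuming a standardized dataset called save.std (a hypothetical name) already exists. The macro name kmeans_range, the output names work.clus2 through work.clus10, the MAXITER value, and the variables var3 through var7 are all illustrative choices, not part of the assignment.

```sas
/* Hypothetical variable list: only trades and ageotd are named in
   the handout. */
%let clusvars = trades ageotd var3 var4 var5 var6 var7;

/* Run k-means once for each k from 2 to 10. Each run writes the
   input rows plus a CLUSTER assignment and a DISTANCE to the
   cluster seed into work.clus<k>. */
%macro kmeans_range(first=2, last=10);
    %local k;
    %do k = &first %to &last;
        proc fastclus data=save.std maxclusters=&k maxiter=100
                      out=work.clus&k;
            var &clusvars;
        run;
    %end;
%mend kmeans_range;

%kmeans_range(first=2, last=10);

/* Profile one candidate solution. k = 4 is used here only as an
   example; it is not a recommendation. */
proc means data=work.clus4 n mean;
    class cluster;
    var &clusvars;
run;
```

Because the input is standardized, the cluster means in this profile are in standard-deviation units: a positive mean marks a cluster that sits above the overall average on that variable, a negative mean one that sits below it. To describe segments in original units, the CLUSTER column would need to be carried back to the unstandardized data.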
As usual, write a brief report on what you did, what results
you got, and your interpretation, and post to the web. I do not need to see
every single thing you did – just the final result and your analysis.
Option 2: Any dataset of your choice
Do steps 2 through 4 from Option 1 (pick variables and check correlations,
standardize, and run k-means), then write a report about the clusters you
see. The report should include an introduction, a description of the data
(where it was obtained, the data dictionary, the sample size, and anything
else interesting about the data), and the segmentation results from
clustering.