Clustering Assignment

Note: SAS is optional. You may use SPSS or any other software capable of k-means clustering.

Option 1: The Auto Loan Dataset

  1. Create a new dataset with all the special cases (negative values) removed. The following SAS program is one way to do it:

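/* Assumes the SAVE library has already been assigned with a LIBNAME statement. */
data save.new;
  set save.old;
  /* Drop records with negative special-case codes. Repeat the IF statement
     for each variable you choose to cluster on. SAS treats missing numeric
     values as smaller than any number, so these statements also drop
     records with missing values. */
  if trades < 0 then delete;
  if ageotd < 0 then delete;
run;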

  2. Pick 7 to 10 numeric variables from the data. Use the original variables, not any dummy variables you created from them. Run correlations to make sure the variables are not too highly correlated with each other (correlations above .30 in magnitude are undesirable). If some are, drop one or more of them and replace them with other variables. Do not ask me which variables to pick! That is part of what you have to do in a real-life situation: figure out which variables make the most sense.

  3. Standardize the variables (to mean 0 and standard deviation 1) using PROC STANDARD in SAS.

  4. Perform k-means cluster analysis (PROC FASTCLUS) with 2, 3, 4, …, 10 clusters, and then pick the solution that makes the most sense to you, based on our discussions in class. Interpret the solution: what are the characteristics of each cluster? If you were marketing to each segment, what sorts of offers would you make?
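
Steps 2 to 4 can be sketched as a single SAS program. This is a minimal sketch under stated assumptions, not the required solution: it assumes the cleaned dataset from step 1 is SAVE.NEW, and the variable list (x3 to x7 are hypothetical stand-ins), the output dataset names SAVE.STD and SAVE.CLUS2 to SAVE.CLUS10, the macro name RUN_KMEANS, and the profiled choice of k are all hypothetical placeholders you should replace with your own.

```sas
/* HYPOTHETICAL variable list: trades and ageotd come from step 1, and
   x3-x7 are placeholders. Replace the whole list with your own
   7 to 10 original (non-dummy) numeric variables. */
%let clusvars = trades ageotd x3 x4 x5 x6 x7;

/* Step 2: pairwise correlations among the chosen variables */
proc corr data=save.new nosimple;
  var &clusvars;
run;

/* Step 3: standardize each chosen variable to mean 0, std. dev. 1.
   Variables not listed in VAR are copied to SAVE.STD unchanged. */
proc standard data=save.new out=save.std mean=0 std=1;
  var &clusvars;
run;

/* Step 4: k-means for k = 2 to 10. MAXITER= is raised so the
   iterations can converge; OUT= adds CLUSTER and DISTANCE variables. */
%macro run_kmeans(mink=2, maxk=10);
  %do k = &mink %to &maxk;
    title "k-means solution with &k clusters";
    proc fastclus data=save.std maxclusters=&k maxiter=100
                  out=save.clus&k;
      var &clusvars;
    run;
  %end;
  title;
%mend run_kmeans;

%run_kmeans(mink=2, maxk=10);

/* Profile the solution you pick (HYPOTHETICAL choice: k = 4).
   Means are in standard-deviation units: a positive value means the
   segment is above the overall average on that variable. */
%let bestk = 4;
proc means data=save.clus&bestk n mean;
  class cluster;
  var &clusvars;
run;
```

Each FASTCLUS run prints summary statistics such as the overall R-squared and the pseudo F statistic, which you can compare across values of k alongside the criteria discussed in class. Profiling on standardized means keeps all variables on one scale, which makes it easier to describe what sets each segment apart.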

As usual, write a brief report on what you did, what results you got, and your interpretation, and post it to the web. I do not need to see every single thing you did, only the final result and your analysis.

Option 2: Any dataset of your choice

Do steps 2 to 4 from Option 1 and write a report about the clusters you find. The report should include an introduction, a description of the data (where it was obtained, the data dictionary, the sample size, and anything else of interest about the data), and the segmentation results from clustering.