Assignment 5

Discriminant Analysis, KS Test

1. Continue from Assignment 3, where you decided on some dummy categories for 5 variables from save.train.

Define the dummies in SAS. Create a new dataset called save.train2 that contains the dummies.

Use the same program to also create a dataset called save.valid2, using the save.valid as the starting point.
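
As a rough illustration of step 1, the sketch below wraps the dummy definitions in a macro so that identical logic is applied to both datasets. It is not the course-provided program. The source variables (age, income), the cutoffs, the dummy names, and the macro name make_dummies are hypothetical placeholders for the categories you chose in Assignment 3, and the save libref is assumed to be already assigned.

```sas
/* Sketch only: age, income, the cutoffs, and the dummy names are
   hypothetical placeholders, not the Assignment 3 categories. */
%macro make_dummies(in=, out=);
  data &out;
    set &in;
    /* A comparison evaluates to 1 (true) or 0 (false) in SAS.       */
    /* The ". <" guard keeps missing values out of the low category, */
    /* because SAS treats a missing value as smaller than any number. */
    age_young = (. < age < 30);
    age_mid   = (30 <= age < 50);
    inc_low   = (. < income < 25000);
  run;
%mend make_dummies;

/* Same definitions for the training and validation data */
%make_dummies(in=save.train, out=save.train2);
%make_dummies(in=save.valid, out=save.valid2);
```

Defining the dummies once and calling the macro twice keeps the training and validation definitions from drifting apart.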

2. Run the proc reg statements to fit a regression on save.train2, using the dummies as the independent variables (Xs) to predict the dependent variable.

DO NOT run the regression on the save.valid dataset! The model must be built only on the training set, but tested on both the training and validation sets.
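
A minimal sketch of the regression step follows, assuming the dependent variable is good (as the bgscore*good crosstab in step 4 suggests). The dummy names and the output dataset work.regest are hypothetical. The OUTEST= option saves the coefficients so they can be applied later without refitting, and labelling the MODEL statement bgscore is one way to have PROC SCORE name its score variable bgscore.

```sas
/* Sketch only: dummy names and work.regest are hypothetical. */
/* The model is fitted on the training data only.             */
proc reg data=save.train2 outest=work.regest;
  /* The label before the colon is stored in _MODEL_ in the   */
  /* OUTEST= dataset and becomes the score variable name.     */
  bgscore: model good = age_young age_mid inc_low;
run;
quit;
```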

3. Run the proc score statements on save.train2 to create a score variable (bgscore) in a new dataset (save.scrtrain).

Run the proc score statements again, this time on save.valid2, to create the score variable in another new dataset (save.scrvalid).
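
The two scoring runs might look like the sketch below. It assumes the coefficients were saved by PROC REG with OUTEST= into a dataset here called work.regest (a hypothetical name), that the MODEL statement was labelled bgscore, and that the dummies carry the placeholder names shown. Both runs use the same coefficient dataset, so the validation data is scored with the model built on the training data.

```sas
/* Sketch only: work.regest and the dummy names are hypothetical. */
/* TYPE=PARMS tells PROC SCORE to use the regression coefficients */
/* (including the intercept) stored by PROC REG.                  */
proc score data=save.train2 score=work.regest
           out=save.scrtrain type=parms;
  /* List only the predictors; leave the dependent variable out */
  var age_young age_mid inc_low;
run;

proc score data=save.valid2 score=work.regest
           out=save.scrvalid type=parms;
  var age_young age_mid inc_low;
run;
```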

4. Format the bgscore variable appropriately to create score buckets, then run crosstabs of the score variable against the dependent variable (bgscore*good) on both save.scrtrain and save.scrvalid.
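
One possible way to bucket the score and produce the crosstabs is sketched below. The format name scorefmt and the bucket cutoffs are hypothetical; choose cutoffs that suit the range of bgscore in your own output.

```sas
/* Sketch only: scorefmt and the cutoffs are hypothetical. */
proc format;
  value scorefmt
    low -< 0.2 = 'Below 0.2'
    0.2 -< 0.4 = '0.2 to < 0.4'
    0.4 -< 0.6 = '0.4 to < 0.6'
    0.6 -< 0.8 = '0.6 to < 0.8'
    0.8 - high = '0.8 and above';
run;

/* PROC FREQ groups bgscore by its formatted value, so each */
/* bucket becomes one row of the crosstab.                  */
proc freq data=save.scrtrain;
  tables bgscore*good / norow nocol nopercent;
  format bgscore scorefmt.;
run;

proc freq data=save.scrvalid;
  tables bgscore*good / norow nocol nopercent;
  format bgscore scorefmt.;
run;
```

Using the same format for both datasets keeps the buckets comparable between training and validation.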

5. Perform the Kolmogorov-Smirnov (KS) test on these scores, using two separate worksheets: one for the training data and one for the validation data.

Note that because the model is a simplistic one with limited variables, the KS statistic may show very poor separation between the categories of the dependent variable. That is OK for the assignment.
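
As an optional cross-check on the worksheets (not a replacement for them), the two-sample KS statistic can also be requested directly in SAS. The sketch below assumes good has exactly two categories. PROC NPAR1WAY works on the raw, unbucketed bgscore values, so its D statistic can be somewhat larger than a KS computed from score buckets.

```sas
/* Optional cross-check: the EDF option requests empirical           */
/* distribution function tests, including the Kolmogorov-Smirnov     */
/* two-sample statistic D, the largest gap between the cumulative    */
/* score distributions of the two categories of good.                */
proc npar1way data=save.scrtrain edf;
  class good;
  var bgscore;
run;

proc npar1way data=save.scrvalid edf;
  class good;
  var bgscore;
run;
```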

The programs for this assignment are available through a separate link on the course website.