Predicting the survival of a passenger on the Titanic
1. See the attached dataset that contains 891 observations randomly selected from among the roughly 2200 passengers on board. Roughly 32% were rescued.
2. Perform a multiple regression with “Survived” as the dependent, and Age, Class (socio-economic class, where 1=upper, 2=middle and 3=poor), Sex and Siblings as the independent variables. This has already been done – you can compare your output to this to confirm that you got the same results. If you do not have access to a regression package, then use the output provided in the worksheet.
3. Copy the predicted values from the residual output to a new worksheet, along with the original values of Survived next to them.
4. Sort both columns in descending order of the predicted values.
5. Create a Vlookup table with bins (groups) for the predicted values.
6. Use the Vlookup function to assign a bin value to each predicted value in the column next to it.
7. Create a pivot table to see how the original y values are spread across the prediction bins.
8. Sort the predicted score groups in descending order.
9. Create a cumulative percentage table like the one discussed in class.
Answer the following based on the table created in part 9 above:
1. Interpret any one number in each of the cumulative percent columns.
2. Interpret any one number in each of the last two columns.
Overall, how good of a model would you say this is? What cutoff score would you use to predict that a person survived?