
 

MGS 8040: Data Mining

Syllabus for Summer 2019

 

Instructor: Dr. Satish Nargundkar 
Office: 1029 RCB (downtown) 

Office Hours:  By appointment 
Phone: (678) 644 6838  

E-Mail: snargundkar@gmail.com
Website: www.nargund.com/gsu

CRN: 53355

          Tower Place 200, Room 404

          Thursdays 5:30 – 9:45 PM

 

Prerequisites: You must already have knowledge of basic statistics, including Regression Analysis, to succeed in this course.

 

Optional Textbooks:

1.       Applied Predictive Analytics, by Dean Abbott. Wiley, ISBN: 978-1-118-72796-6.

2.       Data Science for Business: What you need to know about data mining and data-analytic thinking, by Foster Provost and Tom Fawcett. O'Reilly Media, July 2013. Print ISBN: 978-1-4493-6132-7, ISBN-10: 1-4493-6132-3. Ebook ISBN: 978-1-4493-6131-0, ISBN-10: 1-4493-6131-5.

3.       Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, 3rd Edition, by Gordon Linoff and Michael Berry. Wiley, ISBN-10: 0470650931, ISBN-13: 978-0470650936.

4.       Making Sense of Data II by Glenn Myatt & Wayne Johnson, John Wiley & Sons, 2009.

5.       Multivariate Data Analysis by Hair, Anderson, Tatham, & Black, Prentice Hall.

6.       http://statsoft.com/textbook/stathome.html.

7.       The Little SAS Book by Delwiche and Slaughter.

 

Course Catalog Description

This course covers various analytical techniques to extract managerial information from large data warehouses. A number of well-defined data mining tasks such as classification, estimation, prediction, affinity grouping and clustering, and data visualization are discussed. Design and implementation issues for corporate data warehousing are also addressed.

 

Detailed Course Description

Data mining supports decision making by detecting patterns, devising rules, identifying new decision alternatives and making predictions. This course is organized around a number of well-defined data mining tasks: description, classification, estimation, prediction, and affinity grouping and clustering.  Students will learn to use techniques such as Rule Induction (classification trees), Logistic Regression, and Discriminant Analysis. Data visualization techniques will be used whenever possible to reveal patterns and relationships.  Students will use commercially available software tools to mine large databases.

 

The course is organized into 3 broad areas as follows:

1)      Context/Data: Decision Support for Strategic Decision-making.  Preliminary data analysis

2)      Predictive Analytics: Predictive models and evaluation: Discriminant Analysis, Trees

3)      Segmentation/Association: Techniques like Clustering and Market Basket analysis.

 

Learning Outcomes/Course Objectives

Upon completion of the course, students will be able to work on real-life projects using relatively large datasets, build and evaluate prediction and classification models, and segment populations.

 

Specifically, students will learn to:

1.       Apply analytics techniques within a general framework for decision support within organizations.

2.       Interpret business requirements and organization structure, and translate them into data mining projects that help an organization meet its decision support needs.

3.       Collect data, perform preliminary analyses including data aggregation, variable creation/transformation, and data cleaning.

4.       Split data into training and validation samples. Use visual techniques to describe data.

5.       Create Cross-tabulations for bivariate analysis.

6.       Explain in their own words the assumptions of various techniques such as Cluster Analysis, Multiple Regression, Discriminant Analysis, Logistic Regression, and Artificial Neural Networks.

7.       Build multiple regression, discriminant analysis, and logistic regression models for forecasting.

8.       Validate classification models using the Kolmogorov-Smirnov (K-S) test.                      

9.       Interpret the output of Classification tree algorithms like CART and CHAID.

10.   Segment data using Cluster Analysis, and interpret the output.

11.   Discuss issues of implementation of the results of various techniques.

12.   Develop methods to monitor the ongoing performance of implemented models.

13.   Present an analytics project report to top management in plain language, with implications for business decision making clearly stated.
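
As a rough illustration of outcomes 4 and 8 above, the sketch below splits records into training and validation samples and computes a two-sample K-S statistic on model scores. It is not taken from the course materials. The function names, the 30% validation fraction, and the convention that label 1 marks the event class are all assumptions made for this example. It reads the "K-S test" as the maximum gap between the cumulative score distributions of the two classes, which is one common way the statistic is used for classification models.

```python
import random


def train_validation_split(rows, validation_fraction=0.3, seed=0):
    """Shuffle a copy of rows and return (training, validation) lists.

    The 30% default is an assumption for illustration only.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def ks_statistic(scores, labels):
    """Largest gap between the cumulative score distributions of two classes.

    labels: 1 for the event class, 0 otherwise (an assumed convention).
    Returns a value between 0 (no separation) and 1 (perfect separation).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    pairs = sorted(zip(scores, labels))
    cum_pos = cum_neg = 0
    largest_gap = 0.0
    i = 0
    while i < len(pairs):
        # Consume every record tied at this score before comparing the CDFs.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            j += 1
        largest_gap = max(largest_gap, abs(cum_pos / n_pos - cum_neg / n_neg))
        i = j
    return largest_gap
```

In practice the statistic would be computed on the validation sample's scores, so that it reflects performance on data the model was not fitted to.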

 

Methods of Instruction:

Students will be walked through an entire real-life project in the financial services industry as the course progresses. This will be done through a combination of lectures and discussion of cases, plus guest lectures from industry experts. The team-based project will help you put most of the concepts together and apply them to another dataset in an industry of your choice.

 

Grading:

Component        Weight
Assignments      20%
Tests (2)        60%
Team Project     20%

Course Average   Grade     Course Average   Grade
97+              A+        77-79            C+
94-96            A         73-76            C
90-93            A-        70-72            C-
87-89            B+        60-69            D
83-86            B         Less than 60     F
80-82            B-
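
The weights and cutoffs in the grading table above can be combined as in the following sketch. It is only an illustration. The names `WEIGHTS`, `CUTOFFS`, `course_average`, and `letter_grade` are hypothetical, and comparing a fractional average against the lower bound of each band without rounding is an assumption, since the syllabus does not say how averages such as 89.5 are handled.

```python
# Component weights in percent, as listed in the syllabus.
WEIGHTS = {"assignments": 20, "tests": 60, "team_project": 20}

# (minimum course average, letter grade), highest band first.
CUTOFFS = [
    (97, "A+"), (94, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (60, "D"),
]


def course_average(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(average):
    """Map a course average to a letter grade.

    Assumption: no rounding is applied before comparing with the cutoffs.
    """
    for minimum, letter in CUTOFFS:
        if average >= minimum:
            return letter
    return "F"
```

For example, component scores of 90 on assignments, 85 on tests, and 95 on the team project give a course average of 88, which falls in the B+ band.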

 

Late work receives partial credit only: 10% is deducted for each day of delay.

 

Software: Students are encouraged (not required!) to do project work in SAS in order to develop a marketable skill. You may choose other software: SPSS is available at GSU, and R is free online.

 

General Policies:

1.       Students are expected to attend each class (who knows, you may actually enjoy the class!), arrive on time and participate in class discussions.

2.       Turn off cell phones, pagers, stereos, TVs, etc. when in class. Treat the instructor and each other with courtesy.

 

Course Assessment:

Your constructive assessment of this course plays an indispensable role in shaping education at Georgia State. Upon completing the course, please take the time to fill out the online course evaluation.