MSCI 522: Multivariate Statistics for Data Mining

(Phase II – Spring/Summer Term)

Lecturer: Sven F. Crone

Objectives

You will be introduced to methods for large-scale data exploration and data reduction that are widely applied in business practice. These include both multivariate statistical methods and heuristic methods from computational intelligence and artificial intelligence for classification applications. These methods aim to simplify large, complex data sets, to extract insights from them and to predict the behaviour of individuals, e.g. the cross-selling and up-selling potential of customers in direct marketing or the creditworthiness of credit applicants. Time permitting, the course also introduces methods of classification and clustering that aim to categorise individuals into distinct, disjoint segments with different patterns of behaviour. The module extends statistical model-building concepts and the range of models covered, with hands-on modelling in several workshops using SAS Enterprise Miner.

Core texts

Miller, T.W. (2005) Data and Text Mining – A Business Applications Approach, Pearson, New Jersey
SAS (2005) SAS Enterprise Miner, SAS, New Jersey (will be provided)
The texts provide a tutorial on using SAS Enterprise Miner and a short overview of the methods and approaches of data mining. They will be supplemented by readings drawn from journal articles.

Topics

Introduction to data mining
The SEMMA (Sample, Explore, Modify, Model, Assess) process of data mining
Data mining methods for data reduction: cluster analysis, principal component analysis and factor analysis.
Data mining methods for classification: discriminant analysis, decision trees and artificial neural networks.
Evaluating classification accuracy

Assessment

Coursework 100% of module mark

Contact hours

A weekly two-hour class for the first half of the Spring term.

A triple-accredited business school: Association of MBAs | AACSB | EQUIS