MSCI 522: Multivariate Statistics for Data Mining
(Phase II – Spring/Summer Term)
Lecturer: Sven F. Crone
Objectives
You will be introduced to methods for large-scale data exploration and data reduction that are widely applied in business practice. These include both multivariate statistical methods and heuristic methods from computational intelligence and artificial intelligence for classification applications. These methods aim to simplify and add insight to large, complex data sets and to predict the behaviour of individuals, e.g. the cross- and up-selling potential of customers in direct marketing or the creditworthiness of credit applicants. Time permitting, the course also introduces methods of classification and clustering that aim to categorise individuals into distinct, disjoint segments with different patterns of behaviour. The module extends the statistical model building concepts and range of models towards hands-on modelling in multiple workshops using SAS Enterprise Miner.
Core texts
Miller T.W. (2005) Data and Text Mining – A Business Applications Approach, Pearson, New Jersey
SAS (2005) SAS Enterprise Miner, SAS, New Jersey (will be provided)
The texts provide a tutorial on how to use SAS Enterprise Miner and a short overview of the methods and approaches in Data Mining. They will be supplemented by readings drawn from journal articles.
Topics
Introduction to data mining
The SEMMA process of data mining
Data mining methods for data reduction: cluster analysis, principal component analysis and factor analysis
Data mining methods for classification: discriminant analysis, decision trees and artificial neural networks
Evaluating classification accuracy
Assessment
Coursework: 100% of module mark
Contact hours
A weekly two-hour class for the first half of the Spring term.
