Course Objectives

  • To recognize the iterative character of the Data Mining Process and to implement Data Preprocessing Techniques
  • To learn the advantages of Data Reduction in the Preprocessing Phase and to understand Machine Learning Algorithms
  • To understand the methods of Statistical Inference used in Data Mining Applications
  • To know the C4.5 Algorithm for generating Decision Trees and to describe the components of Artificial Neural Networks
  • To illustrate Web Mining using Hyperlink-Induced Topic Search (HITS), LOGSOM and Path Traversal Algorithms

UNIT I Data Mining Concepts and Outlier Analysis

Data Mining Concepts – Roots – Process – Data Collection to Data Preprocessing – Business aspects of Data Mining – Preparing the Data – Representation – Characteristics – Transformation of Raw Data – Missing Data – Outlier Analysis

UNIT II Feature Reduction and Learning from Data

Data Reduction – Dimension of Large Data Sets – Feature Reduction – Relief Algorithm – Principal Component Analysis – Value Reduction – Learning from Data – Learning Machine – Types of Learning Methods – Support Vector Machines – Semi-Supervised Support Vector Machines – Model Selection

UNIT III Predictive Statistical Methods

Statistical Methods – Bayesian Inference – Predictive Regression – Analysis of Variance – Logistic Regression – Log-Linear Models – Linear Discriminant Analysis

UNIT IV Decision Trees and Artificial Neural Networks

Decision Trees – Trees – C4.5 Algorithm – Decision Rules – CART Algorithm – Artificial Neural Networks – Model of an Artificial Neuron – Learning Process – Self-Organizing Maps – Deep Learning – Convolutional Neural Networks

UNIT V Web Mining & Text Mining

Web Mining & Text Mining – Web Content, Structure and Usage Mining – HITS and LOGSOM Algorithms – Mining Path-Traversal Patterns – PageRank Algorithm – Recommender Systems – Text Mining – Latent Semantic Analysis

Learning Resources

  • Mehmed Kantardzic (2019), “Data Mining: Concepts, Models, Methods, and Algorithms”, 3rd Edition, Wiley.
  • Milan Kumar (2019), “CIO Series Immersive and Augmented Analytics”, First Reprint, Indra Publishing House.
  • Pang-Ning Tan, Michael Steinbach, Anuj Karpatne and Vipin Kumar (2018), “Introduction to Data Mining”, 2nd Edition, Pearson.
  • Anil Maheshwari (2017), “Data Analytics”, 1st Edition, McGraw Hill.
  • U Dinesh Kumar (2017), “Business Analytics: The Science of Data-Driven Decision Making”, 1st Edition, Wiley.
  • Anasse Bari, Mohamed Chaouchi and Tommy Jung (2015), “Predictive Analytics”, Wiley.

© 2020-2021 Copyright, SRM Institute of Science and Technology (formerly known as SRM University), All Rights Reserved