Lehigh University

CSE 347  Data Mining  (3)

Instructor: 

Current Catalog Description
Overview of modern data mining techniques: data cleaning; attribute and subset selection; model construction, evaluation and application. Fundamental mathematics and algorithms for decision trees, covering algorithms, association mining, statistical modeling, linear models, neural networks, instance-based learning and clustering. Practical design, implementation, application and evaluation of data mining techniques in class projects. Credit will not be given for both CSE 347 and CSE 447. Prerequisites: Either CSE 17 and MATH 231, or BIS 120 and ECO 145.

Textbook
Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, ISBN 1-55860-552-5, Morgan Kaufmann.

References

Course Goals
To learn the basic algorithms of text/data mining and their application to practical, realistic problems.

Prerequisites by Topic
Introductory Probability and Statistics; Top-down Design; Primitive Data Types; Repetition and Selection; Recursion; Pointer Structures; Classes.

Major Topics Covered in the Course
Input: Concepts, instances, attributes; Output: Knowledge Representations; Algorithms: Rules & statistical models, Decision trees, Covering, Association mining; Credibility: Evaluation of results; Implementations: Decision trees, Classification rules, Support Vector Machines, Instance-based Learning, Numeric prediction, Clustering; Engineering the input and output; Text Mining: Introduction, Feature extraction, Semantic models, Detecting trends.

Laboratory projects (specify number of weeks on each)
One semester-long class project.
 
Estimate CSAB Category Content (CORE / ADVANCED)
Data Structures: 0.0
Computer Organization and Architecture: 0.0
Algorithms Software Design: 3.0
Concepts of Programming Languages: 0.0
 
Oral and Written Communications:
Every student is required to submit at least 1 written report (not including exams, tests, quizzes, or commented programs) of typically 20 pages and to make 3 oral presentations of typically 20 minutes duration. Include only material that is graded for grammar, spelling, style, and so forth, as well as for technical content, completeness, and accuracy.

Social and Ethical Issues:
Ethical issues such as data privacy in data mining are discussed at various points during the class.

Theoretical Content:
Input (3 classes); Output Knowledge Representation (3 classes); Algorithms (18 classes); Evaluation (3 classes).

Problem Analysis and Solution
Design: In the class project, students are required to design and implement an end-to-end solution to a real-world data mining problem.

 

 

 

 


 

     


©2008 P.C. Rossin College of Engineering & Applied Science
Computer Science & Engineering, Packard Laboratory, Lehigh University, Bethlehem PA 18015