CSE 347 Data Mining (3)
Instructor:
Current Catalog Description
Overview of modern data mining techniques: data cleaning; attribute and subset selection; model construction, evaluation and application. Fundamental mathematics and algorithms for decision trees, covering algorithms, association mining, statistical modeling, linear models, neural networks, instance-based learning and clustering. Practical design, implementation, application and evaluation of data mining techniques in class projects. Credit will not be given for both CSE 347 and CSE 447. Prerequisites: Either CSE 17 and MATH 231, or BIS 120 and ECO 145.
Textbook
Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, ISBN 1-55860-552-5, Morgan Kaufmann.
References
Course Goals: To learn the basic algorithms of text/data mining and their application to practical, realistic problems.
Prerequisites by Topic
Introductory Probability and Statistics; Top-down Design; Primitive Data Types; Repetition and Selection; Recursion; Pointer Structures; Classes.
Major Topics Covered in the Course
Input: Concepts, instances, attributes; Output: Knowledge Representations; Algorithms: Rules & statistical models, Decision trees, Covering, Association mining; Credibility: Evaluation of results; Implementations: Decision trees, Classification rules, Support Vector Machines, Instance-based Learning, Numeric prediction, Clustering; Engineering the input and output; Text Mining: Introduction, Feature extraction, Semantic models, Detecting trends.
Laboratory projects (specify number of weeks on each)
One semester-long class project.
Estimate CSAB Category Content
CORE ADVANCED
Data Structures 0.0
Computer Organization and Architecture 0.0
Algorithms Software Design 3.0
Concepts of Programming Languages 0.0
Oral and Written Communications:
Every student is required to submit at least 1 written report (not including exams, tests, quizzes, or commented programs) of typically 20 pages and to make 3 oral presentations of typically 20 minutes duration. Include only material that is graded for grammar, spelling, style, and so forth, as well as for technical content, completeness, and accuracy.
Social and Ethical Issues:
Ethical issues such as data privacy in data mining are discussed at various points during the class.
Theoretical Content:
Input (3 classes); Output: Knowledge Representation (3 classes); Algorithms (18 classes); Evaluation (3 classes).
Problem Analysis and Solution
Design: In the class project, students are required to design and implement an end-to-end solution to a real-world data mining problem.