Big Data Analysis

This course teaches development skills for open-source big data mining tools such as Mahout and MLlib. It approaches them from the perspective of big data mining and analysis technology and combines theory with practice. Topics include: big data mining and its background; big data mining with Mahout and MLlib; recommendation systems and a movie recommendation case; classification techniques and cluster analysis; and integration with stream mining and Docker technologies, closing with an analysis of the prospects for big data mining.

Training objectives

  • Gain a comprehensive understanding of big data processing technology.

  • Learn the core data analysis techniques of Hadoop/YARN/Spark.

  • Study in depth how Mahout and MLlib are used in big data mining projects.

  • Master the methods of combining Storm stream processing technology and Docker with big data mining.


1. Big data mining and its background

1) Data mining definition

2) Hadoop-related technologies

3) Key concepts in big data mining


2. MapReduce / DAG computing models

1) Distributed file system (DFS)

2) Introduction to MapReduce computing model

3) Algorithm design using MapReduce

4) DAG and its algorithm design
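
As a small illustration of the MapReduce computing model listed in item 2) above, the following single-machine sketch counts words in three phases: map, shuffle, and reduce. It is not Hadoop code; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data mining", "big data analysis"])
```

In a real cluster the map and reduce calls would run in parallel on different nodes, and the shuffle step would move data across the network; here all three run in one process to keep the data flow visible.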


3. Cloud Mining Tools: Mahout/MLlib

1) Introduction to Mahout in Hadoop

2) Introduction to Mahout/MLlib in Spark

3) Recommendation techniques and their Mahout implementation

4) Information clustering and its MLlib implementation

5) Implementation of classification techniques in Mahout/MLlib


4. Recommendation System and Its Application Development

1) A model of the recommendation system

2) Content-based recommendations

3) Collaborative filtering

4) Movie recommendation case study based on Mahout

5. Classification Technology and Its Application

1) Definition of classification

2) Main classification algorithms

3) Mahout classification process

4) Evaluation metrics and model evaluation

5) News classification example using the Bayesian algorithm

6. Clustering Technology and Its Application

1) Definition of clustering

2) Main clustering algorithms

3) K-Means and Canopy, with application examples

4) Fuzzy K-Means and Dirichlet, with application examples

5) News clustering example based on MLlib

7. Association Rules and Similar-Item Discovery

1) Market-basket model

2) Apriori algorithm

3) Plagiarized document detection

4) Applications of nearest-neighbor search

8. Stream Data Mining Related Technology

1) Stream data mining and analysis

2) Storm and stream data processing model

3) Data sampling in stream processing

4) Stream filtering and Bloom filters
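
To give a flavor of the Bloom filter topic in item 4) above, here is a minimal sketch of one, assuming a fixed bit-array size and hash functions derived from seeded SHA-256 digests. The class name, method names, and default parameters are hypothetical and not taken from Storm or any other library.

```python
import hashlib


class BloomFilter:
    """A fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter would pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        """Yield one bit position per hash function for the given item."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        """Set every bit position that the item hashes to."""
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        """Return False if the item was never added; True means 'possibly added'."""
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
seen.add("user-42")
```

In a stream-filtering setting, each arriving element is checked with might_contain and dropped or passed on accordingly. Memory use stays constant no matter how many elements have been seen, at the cost of a false-positive rate that grows as the bit array fills up.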

9. Big Data Mining Application in Cloud Environment

1) Integration with Hadoop/YARN cluster applications

2) Integration with other cloud tools such as Docker

3) Application prospects for the big data mining industry