Introduction to Big Data with Hadoop
Spring 2014
9:00 AM to 3:30 PM; 12, 13, and 18 March
Level: ITI, Cloud Computing Track
Place: R8
Duration: 18 Hours.
Prerequisites:
1- Linux command line
2- Python
3- Java
Syllabus:
During this course we’ll discuss what big data is, what Hadoop is, why it’s useful, and how to write MapReduce code. By the end of the course, you’ll understand what “Big Data” means, you’ll be able to describe the kinds of problems Hadoop addresses, and you’ll have written MapReduce programs to efficiently analyze very large web server log files. The course covers the following topics:
Introduction
- Why Big Data?
- Terminology
- Key Technologies: Google File System, MapReduce
- Hadoop
- Hadoop and other database tools
- Types of Databases
HDFS and MapReduce
- HDFS
- Data redundancy
- NameNode High Availability
- Hash tables
- MapReduce
- MapReduce Code
MapReduce Design Patterns
- Filtering pattern
- Summarization patterns
- Structural patterns
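To give a feel for the kind of program the course works toward, here is a minimal sketch of a summarization-pattern job that counts hits per page in a web server log. It is an illustration only, not course material: the function names (mapper, reducer, run_job) and the sample log lines are hypothetical, the log format is assumed to resemble the Common Log Format, and the job runs locally in memory to imitate the map, shuffle, and reduce phases rather than using any Hadoop API.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit (path, 1) for one log line; skip lines that do not parse."""
    parts = line.split('"')          # the request is the first quoted field
    if len(parts) < 3:
        return
    request = parts[1].split()       # e.g. ['GET', '/index.html', 'HTTP/1.1']
    if len(request) != 3:
        return
    _method, path, _protocol = request
    yield path, 1


def reducer(key, values):
    """Reduce phase: sum all counts emitted for one path."""
    return key, sum(values)


def run_job(lines):
    """Local stand-in for a MapReduce run: map, group by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)    # imitates the shuffle/group step
    return dict(reducer(k, v) for k, v in sorted(groups.items()))


# Hypothetical sample input, including one line that cannot be parsed.
SAMPLE_LOG = [
    '10.0.0.1 - - [12/Mar/2014:09:00:01 +0200] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.2 - - [12/Mar/2014:09:00:05 +0200] "GET /about.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [12/Mar/2014:09:01:10 +0200] "GET /index.html HTTP/1.1" 304 0',
    'malformed line',
]

hits = run_job(SAMPLE_LOG)
print(hits)
```

On a real cluster the grouping step is done by the framework across many machines; the sketch keeps everything in one process so the mapper and reducer logic can be read and tested in isolation.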
Textbooks: