Introduction to Big Data with Hadoop

Spring 2014
9:00 AM to 3:30 PM; 12, 13, and 18 March

Level: ITI, Cloud Computing Track
Place: R8
Duration: 18 Hours.

Prerequisites:

1- Linux command line
2- Python
3- Java

Syllabus:

During this course we’re going to discuss what big data is, what Hadoop is, why it’s useful, and how to write MapReduce code. By the end of the course, you will understand what “Big Data” means, you’ll be able to describe the kinds of problems Hadoop addresses, and you’ll have written MapReduce programs that efficiently analyze very large Web server log files. The course will cover the following points:

Introduction

Why Big Data?
Terminology
Key Technologies: Google File System, MapReduce, Hadoop
Hadoop and other database tools
Types of Databases

HDFS and MapReduce

HDFS
Data redundancy
NameNode High Availability
Hashtables
MapReduce
MapReduce Code

MapReduce Design Patterns

Filtering patterns
Summarization patterns
Structural patterns
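
As a preview of the kind of program written in the MapReduce Code lesson, here is a minimal sketch of a summarization job that counts HTTP status codes in a Web server log. It runs locally in plain Python and only imitates the map, shuffle, and reduce phases; it does not use Hadoop itself. The function names (map_line, shuffle, reduce_counts, run_job), the sample log lines, and the assumed log layout are illustrative assumptions, not course material.

```python
from collections import defaultdict


def map_line(line):
    """Map step: emit (status_code, 1) for one access-log line."""
    fields = line.split()
    # Assumed layout (Common Log Format split on whitespace):
    # host ident user [date tz] "METHOD path protocol" status bytes
    if len(fields) >= 9:
        yield fields[8], 1
    # Lines that do not match the assumed layout are silently skipped.


def shuffle(pairs):
    """Shuffle step: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    grouped = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Hypothetical sample data, made up for this sketch.
SAMPLE_LOG = [
    '10.0.0.1 - - [12/Mar/2014:09:00:01 +0200] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.2 - - [12/Mar/2014:09:00:02 +0200] "GET /missing HTTP/1.1" 404 210',
    '10.0.0.1 - - [12/Mar/2014:09:00:05 +0200] "GET /about.html HTTP/1.1" 200 877',
    'malformed line',
]

if __name__ == "__main__":
    print(run_job(SAMPLE_LOG))
```

On a real cluster the framework performs the shuffle and spreads the map and reduce calls across many machines; the sketch keeps everything in one process so the three phases are easy to see.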

Textbooks:

Hadoop: The Definitive Guide. MapReduce for the Cloud, by Tom White

MapReduce Design Patterns, by Donald Miner and Adam Shook

Date

12th March

13th March

18th March

Lecture

Lesson 1: Introduction

Lesson 2: HDFS and MapReduce

Lesson 3: MapReduce Code

Lesson 4: Patterns