Hadoop in Practice: Chapter 1 – Hadoop in a heartbeat

Big data brings with it two fundamental challenges: how to store and work with voluminous data sizes, and more important, how to understand data and turn it into a competitive advantage.

Hadoop is a distributed filesystem, and it offers a way to parallellize and execute progams on a cluster of machines.

Figure 1.3 – Topography
The HDFS namenode keeps in memory the metadata about the filesystem such as which datanodes manage the blocks for each file.

HDFS clients talk to the namenode for metadata-related activities and DataNodes for reading and writing files.

DataNodes communicate with each other for pipelining file reads and writes.

Files are mede up of blocks, and each file can be replicated multiple times, meaning there are many identical copies of each block for the file (default = 3).


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s