"Introducing YARN" - Hadoop No More a Baby Elephant » Big Data Partnership

“Introducing YARN” – Hadoop No More a Baby Elephant


With Hadoop's growing popularity and its wide adoption as a platform for Big Data, the Hadoop development team has turned its attention to the deficiencies of the current architecture, with the aim of freeing Hadoop from these underlying issues. The result is a new version of Hadoop MapReduce: MapReduce has undergone a complete overhaul in hadoop-0.23, and we now have MapReduce 2.0 (MRv2), also known as YARN.

Let me take this opportunity to give a brief introduction to YARN. The fundamental change in MRv2 is that the two major functions of the JobTracker are split into separate daemons. They are:

  1. Resource Management
  2. Scheduling/Monitoring

To achieve this, the following new components have been introduced:

  1. ResourceManager (RM)
  2. ApplicationsManager
  3. NodeManager (NM)
  4. ApplicationMaster (AM)
  5. Container

I) ResourceManager (RM)
The ResourceManager (RM) is the key service in YARN. Clients interact with the framework through the ResourceManager, which is the master for all the other daemons in the framework.
The ResourceManager has two major components:

  1. Scheduler
  2. ApplicationsManager

a) Scheduler
1. The Scheduler is responsible for allocating resources to the various running applications, subject to the familiar constraints of capacities, queues, etc.
2. It is a pure scheduler: it does not monitor or track the status of applications, and instead performs its scheduling function based purely on the resource requirements of the applications.
3. It schedules resources in terms of the resource “Container”.
4. The Scheduler has a pluggable policy plug-in that is responsible for partitioning the cluster resources among the various queues, applications, etc. For example:
a) CapacityScheduler
b) FairScheduler
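
The points above can be illustrated with a toy model. This is a minimal Python sketch and not the YARN API: the class `ToyScheduler`, the queue names, and the capacity fractions are all hypothetical, chosen only to show a scheduler that grants memory-only containers within per-queue capacity limits and does no monitoring of its own.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Toy model of a pure scheduler: it only grants or refuses resources."""
    cluster_memory_mb: int
    # Hypothetical queue capacities, as fractions of the cluster's memory.
    queue_capacity: dict
    used_mb: dict = field(default_factory=dict)

    def allocate(self, queue: str, memory_mb: int) -> bool:
        """Grant a container request if the queue stays within its capacity."""
        limit = self.cluster_memory_mb * self.queue_capacity[queue]
        used = self.used_mb.get(queue, 0)
        if used + memory_mb > limit:
            return False  # refused; the scheduler does not track the app itself
        self.used_mb[queue] = used + memory_mb
        return True

    def release(self, queue: str, memory_mb: int) -> None:
        """Return a finished container's memory to the queue."""
        self.used_mb[queue] = max(0, self.used_mb.get(queue, 0) - memory_mb)


scheduler = ToyScheduler(cluster_memory_mb=8192,
                         queue_capacity={"prod": 0.75, "dev": 0.25})
# The "dev" queue may use 2048 MB, so the third 1024 MB request is refused.
granted = [scheduler.allocate("dev", 1024) for _ in range(3)]
print(granted)
```

A real policy plug-in such as the CapacityScheduler is far richer than this; the sketch only shows the idea of partitioning cluster resources among queues.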

b) ApplicationsManager
1. The ApplicationsManager is responsible for accepting job submissions.
2. It assigns the first container for executing the application-specific ApplicationMaster.
3. It provides the service for restarting the ApplicationMaster container on failure.
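
These three responsibilities can be sketched as a toy model. The code below is illustrative Python, not the YARN API: `ToyApplicationsManager`, the application id format, and the `max_am_attempts` retry limit are assumptions made for the example.

```python
class ToyApplicationsManager:
    """Toy model: accept submissions, start the AM, restart it on failure."""

    def __init__(self, max_am_attempts: int = 2):
        self.max_am_attempts = max_am_attempts  # hypothetical retry limit
        self.attempts = {}  # application id -> number of AM launches so far
        self._next_id = 1

    def submit(self, name: str) -> str:
        """Accept a job submission and launch its ApplicationMaster."""
        app_id = f"app_{self._next_id:04d}_{name}"
        self._next_id += 1
        self.attempts[app_id] = 0
        self._launch_am(app_id)
        return app_id

    def _launch_am(self, app_id: str) -> None:
        # Stands in for assigning the application's first container to its AM.
        self.attempts[app_id] += 1

    def on_am_failure(self, app_id: str) -> bool:
        """Restart the AM container unless the attempt limit is reached."""
        if self.attempts[app_id] >= self.max_am_attempts:
            return False  # give up on this application
        self._launch_am(app_id)
        return True


asm = ToyApplicationsManager(max_am_attempts=2)
app = asm.submit("wordcount")
first_restart = asm.on_am_failure(app)   # second attempt is allowed
second_restart = asm.on_am_failure(app)  # limit reached, no further restart
print(app, first_restart, second_restart)
```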

II) NodeManager (NM)
The NodeManager plays a role similar to that of the TaskTracker.
1. The NodeManager is responsible for Containers.
2. It monitors container resource usage (CPU, memory, disk, network).
3. It reports this usage to the ResourceManager/Scheduler.
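
As a rough illustration of the monitor-and-report cycle described above, here is a toy Python sketch. It is not the YARN API: `ToyNodeManager`, the `heartbeat` method, and the shape of the report are hypothetical, and only memory is tracked.

```python
class ToyNodeManager:
    """Toy model: track containers on one node and report usage upward."""

    def __init__(self, node_id: str, total_memory_mb: int):
        self.node_id = node_id
        self.total_memory_mb = total_memory_mb
        self.containers = {}  # container id -> memory in use (MB)

    def start_container(self, container_id: str, memory_mb: int) -> None:
        self.containers[container_id] = memory_mb

    def stop_container(self, container_id: str) -> None:
        self.containers.pop(container_id, None)

    def heartbeat(self) -> dict:
        """Build the usage report sent to the ResourceManager/Scheduler."""
        used = sum(self.containers.values())
        return {
            "node": self.node_id,
            "containers": sorted(self.containers),
            "used_mb": used,
            "free_mb": self.total_memory_mb - used,
        }


nm = ToyNodeManager("node-1", total_memory_mb=4096)
nm.start_container("c1", 1024)
nm.start_container("c2", 512)
report = nm.heartbeat()
print(report)
```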

III) ApplicationMaster (AM)
1. The ApplicationMaster is responsible for negotiating appropriate resource containers from the Scheduler.
2. It tracks the status and monitors the progress of the applications running under it.
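
The negotiate-then-track behaviour can be sketched with a toy model. This is illustrative Python only, not the YARN API: `ToyApplicationMaster`, the task names, and the `request_container` callback (which stands in for the Scheduler) are assumptions made for the example.

```python
class ToyApplicationMaster:
    """Toy model: request containers for tasks and track their progress."""

    def __init__(self, tasks, request_container):
        # request_container is a hypothetical callback standing in for the
        # Scheduler; it returns True when a container is granted.
        self.pending = list(tasks)
        self.running = []
        self.done = []
        self.request_container = request_container

    def negotiate(self) -> None:
        """Ask for one container per pending task until a request is refused."""
        while self.pending and self.request_container():
            self.running.append(self.pending.pop(0))

    def task_finished(self, task) -> None:
        self.running.remove(task)
        self.done.append(task)

    def progress(self) -> float:
        total = len(self.pending) + len(self.running) + len(self.done)
        return len(self.done) / total if total else 1.0


grants = iter([True, True, False])  # the stand-in scheduler grants two containers
am = ToyApplicationMaster(["map-0", "map-1", "map-2"], lambda: next(grants))
am.negotiate()
am.task_finished("map-0")
print(am.running, am.pending, am.progress())
```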

IV) Container
A resource Container incorporates elements such as memory, CPU, disk, and network. Only memory is supported in the first version.
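
One way to picture such a Container is as a small record of resources. The Python sketch below is hypothetical and not the YARN data type: the name `ToyContainer` and its field names are assumptions, and only the memory field takes part in the fit check, mirroring the memory-only support mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyContainer:
    """Toy resource container; only memory is used, as in the first version."""
    memory_mb: int
    # Placeholders for the other elements the text lists; unused in this sketch.
    cpu: Optional[int] = None
    disk_mb: Optional[int] = None
    network_mbps: Optional[int] = None

    def fits(self, free_memory_mb: int) -> bool:
        """A container fits a node if its memory request can be satisfied."""
        return self.memory_mb <= free_memory_mb


c = ToyContainer(memory_mb=1024)
print(c.fits(2048), c.fits(512))
```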

Posted on March 2, 2012 in Apache Hadoop, Blog, Hadoop Common, Hadoop Ecosystem, MapReduce, Science, Technology, Training


Response (1)

  1. Praveenesh
    April 30, 2012 at 9:57 am ·

    I think that in the current version, Hadoop 0.23, only the CapacityScheduler is supported so far. There is no support for the FairScheduler yet.
     
