- Mastering Hadoop 3
- Chanchal Singh Manish Kumar
- 2025-04-04 14:54:50
Node labels
As an organization's use of Hadoop grows over time, it onboards more use cases to the Hadoop platform. A data pipeline in an organization consists of multiple jobs with different resource needs. A Spark job may need machines with more RAM and powerful processing capabilities, whereas a MapReduce job can run on less powerful machines. To save infrastructure costs, a cluster may therefore consist of different types of machines.
A YARN node label is simply a marker attached to a machine so that machines with the same label can be used for specific jobs. The nodes with more powerful processing capabilities can be given the same label, and jobs that require more powerful machines can then specify that label during submission. Each node can have only one label assigned to it, which means the cluster is divided into disjoint sets of nodes; in other words, the cluster is partitioned by node label. YARN also provides queue-level configuration, which defines how much of a partition a queue can use. There are currently two types of node labels:
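The partitioning described above can be sketched in a few lines of Python. This is a minimal illustration, not an administration tool: the host names, the `highmem` label, and the `root.spark` queue are hypothetical, and the helper functions are invented for this example. The sketch only builds argument lists for the `yarn rmadmin -addToClusterNodeLabels` and `-replaceLabelsOnNode` commands and the Capacity Scheduler `accessible-node-labels` property names; it does not run anything against a cluster.

```python
from collections import defaultdict

# Hypothetical inventory: host name -> the single label assigned to it.
# An empty string stands for the default partition (no label).
NODE_LABELS = {
    "worker1": "highmem",
    "worker2": "highmem",
    "worker3": "",
    "worker4": "",
}


def partitions(node_labels):
    """Group hosts by label; one label per node makes the groups disjoint."""
    groups = defaultdict(list)
    for host, label in sorted(node_labels.items()):
        groups[label].append(host)
    return dict(groups)


def rmadmin_commands(node_labels, exclusive):
    """Build (but do not run) yarn rmadmin argument lists for the labels.

    `exclusive` maps a label name to True/False; labels missing from the
    map are treated as exclusive in this sketch.
    """
    labels = sorted({label for label in node_labels.values() if label})
    specs = []
    for label in labels:
        flag = "true" if exclusive.get(label, True) else "false"
        specs.append(f"{label}(exclusive={flag})")
    mapping = " ".join(
        f"{host}={label}" for host, label in sorted(node_labels.items()) if label
    )
    return [
        ["yarn", "rmadmin", "-addToClusterNodeLabels", ",".join(specs)],
        ["yarn", "rmadmin", "-replaceLabelsOnNode", mapping],
    ]


def queue_label_properties(queue_path, label, capacity_percent):
    """Capacity Scheduler properties that let a queue use part of a partition."""
    prefix = f"yarn.scheduler.capacity.{queue_path}"
    return {
        f"{prefix}.accessible-node-labels": label,
        f"{prefix}.accessible-node-labels.{label}.capacity": str(capacity_percent),
    }


if __name__ == "__main__":
    print(partitions(NODE_LABELS))
    for command in rmadmin_commands(NODE_LABELS, {"highmem": False}):
        print(" ".join(command))
    print(queue_label_properties("root.spark", "highmem", 100))
```

Keeping the label as a single value per host in `NODE_LABELS` mirrors the one-label-per-node rule, so the groups returned by `partitions` can never overlap.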
- Exclusive: An exclusive node label ensures that only the queues permitted to access the label can use its nodes. Applications submitted through a queue with an exclusive label have exclusive access to the partition, so no other queue can obtain resources from it.
- Non-exclusive: A non-exclusive label permits idle resources to be shared with other applications. Queues are assigned node labels, and applications submitted to these queues get first priority on the respective partitions. If no application or job from such a queue is using the labelled nodes, their idle resources are shared with applications that did not request the label (those running in the default partition). If a queue associated with the node label submits an application or job while borrowed resources are in use, the resources are preempted from the running tasks and assigned to the associated queue on a priority basis.
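The difference between the two label types can be expressed as a small decision rule. The following is a simplified model, not YARN's scheduler code: the function name and parameters are hypothetical, the default partition is represented by an empty string, and the model assumes that idle non-exclusive resources are lent only to requests that did not ask for any label. Preemption is reduced to the `partition_idle` flag: once the label's own queues need the nodes again, borrowing is no longer allowed.

```python
DEFAULT_PARTITION = ""  # requests and nodes without a label


def may_allocate(node_label, exclusive, requested_label, partition_idle):
    """Return True if a request may be placed on a node in this simplified model.

    node_label      -- label of the candidate node ("" for the default partition)
    exclusive       -- whether node_label was created as an exclusive label
    requested_label -- label the application asked for ("" if none)
    partition_idle  -- True if the label's own queues are not using the node
    """
    # A request that names the node's partition always matches.
    if requested_label == node_label:
        return True
    # Otherwise only unlabelled requests may borrow from a labelled partition,
    # and only when that partition is non-exclusive and currently idle.
    borrowing = requested_label == DEFAULT_PARTITION and node_label != DEFAULT_PARTITION
    return borrowing and not exclusive and partition_idle


if __name__ == "__main__":
    # Exclusive partition: idle capacity stays reserved.
    print(may_allocate("highmem", True, "", True))
    # Non-exclusive partition: idle capacity is lent to unlabelled requests.
    print(may_allocate("highmem", False, "", True))
    # The owning queue is active again, so borrowing stops.
    print(may_allocate("highmem", False, "", False))
```

In this model the only input that distinguishes the two label types is the `exclusive` flag, which is why an exclusive partition can sit idle while unlabelled requests wait.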