- Mastering Hadoop 3
- Chanchal Singh, Manish Kumar
Configuring node labels
In this section, we will talk about configuring node labels in a Hadoop cluster. Every request to YARN goes through the ResourceManager, so the first step is to enable the node label feature on the ResourceManager:
- Enabling node labels and setting the node label path: The yarn-site.xml file contains the configuration related to YARN. Add the following properties:
<property>
<name>yarn.node-labels.enabled</name>
<value>true</value>
<description>Enabling the node label feature</description>
</property>
<property>
<name>yarn.node-labels.fs-store.root-dir</name>
<value>hdfs://namenoderpc:port/YARN/packt/node-labels</value>
<description>The path where node label information is stored.</description>
</property>
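The node label feature only takes effect once the ResourceManager has loaded the new properties. The following is a minimal sketch, assuming the daemons are started directly with the yarn script rather than through a management tool such as Ambari or Cloudera Manager:
# Restart the ResourceManager so that the node label properties are picked up
yarn --daemon stop resourcemanager
yarn --daemon start resourcemanager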
- Creating the directory structure on HDFS: The next step is to create a node label directory where node label information will be stored. The permissions should be set so that YARN is able to access these directories, for example:
sudo su hdfs
hadoop fs -mkdir -p /YARN/packt/node-labels
hadoop fs -chown -R yarn:yarn /YARN
hadoop fs -chmod -R 700 /YARN
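To confirm that the directories exist with the expected owner and permissions, they can be listed recursively; the listing should show yarn:yarn as the owner and group of /YARN/packt/node-labels:
hadoop fs -ls -R /YARN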
- Granting permission to YARN: The YARN user directory must be present on HDFS. If it is not, create the directory and assign its ownership to the YARN user, as follows:
sudo su hdfs
hadoop fs -mkdir -p /user/yarn
hadoop fs -chown -R yarn:yarn /user/yarn
hadoop fs -chmod -R 700 /user/yarn
- Creating node labels: Once the preceding steps are done, we can create node labels using the following command:
sudo -u yarn yarn rmadmin -addToClusterNodeLabels "<node-label1>(exclusive=<true|false>),<node-label2>(exclusive=<true|false>)"
By default, the exclusive property is true. It is good practice to check whether the node labels have actually been created. We can create the labels and then list them to verify, as shown in the following code:
sudo -u yarn yarn rmadmin -addToClusterNodeLabels "spark(exclusive=true),Hadoop(exclusive=false)"
yarn cluster --list-node-labels
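If a label was added by mistake, it can be removed again with the rmadmin command; depending on the Hadoop version, the ResourceManager may refuse to remove a label that is still assigned to nodes or queues, so clear those associations first. A minimal sketch:
sudo -u yarn yarn rmadmin -removeFromClusterNodeLabels "spark"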
- Assigning nodes to node labels: Once the node labels are created, we can assign them to nodes. Each node can have only one node label assigned to it. Use the following command to assign a node label to a node:
yarn rmadmin -replaceLabelsOnNode "<node-address1>:<port>=<node-label1> <node-address2>:<port>=<node-label2>"
The following shows an example:
sudo su yarn
yarn rmadmin -replaceLabelsOnNode "packt.com=spark packt2.com=spark packt3.com=Hadoop packt4.com=Hadoop"
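To verify which label a particular node ended up with, list the nodes and query the status of one of them; here <node-id> stands for the node address and port reported by the list command, and on Hadoop 3 the status output includes a Node-Labels field:
yarn node -list -all
yarn node -status <node-id>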
- Queue-node label association: The last step is to associate each queue with node labels, so that jobs submitted to a queue run on the nodes carrying the labels that the queue can access, for example:
<property> <name>yarn.scheduler.capacity.root.queues</name> <value>marketing,sales</value> </property>
<property> <name>yarn.scheduler.capacity.root.accessible-node-labels.spark.capacity</name> <value>100</value> </property>
<property> <name>yarn.scheduler.capacity.root.accessible-node-labels.Hadoop.capacity</name> <value>100</value> </property>
<!-- configuration of the marketing queue -->
<property> <name>yarn.scheduler.capacity.root.marketing.accessible-node-labels</name> <value>spark,Hadoop</value> </property>
<property> <name>yarn.scheduler.capacity.root.marketing.capacity</name> <value>40</value> </property>
<property> <name>yarn.scheduler.capacity.root.marketing.accessible-node-labels.spark.capacity</name> <value>100</value> </property>
<property> <name>yarn.scheduler.capacity.root.marketing.accessible-node-labels.Hadoop.capacity</name> <value>50</value> </property>
<property> <name>yarn.scheduler.capacity.root.marketing.queues</name> <value>product,service</value> </property>
<!-- configuration of the sales queue -->
<property> <name>yarn.scheduler.capacity.root.sales.accessible-node-labels</name> <value>Hadoop</value> </property>
<property> <name>yarn.scheduler.capacity.root.sales.capacity</name> <value>60</value> </property>
<property> <name>yarn.scheduler.capacity.root.sales.accessible-node-labels.Hadoop.capacity</name> <value>50</value> </property>
<property> <name>yarn.scheduler.capacity.root.sales.queues</name> <value>product_sales</value> </property>
<!-- configuration of the marketing.product queue -->
<property> <name>yarn.scheduler.capacity.root.marketing.product.accessible-node-labels</name> <value>spark,Hadoop</value> </property>
<property> <name>yarn.scheduler.capacity.root.marketing.product.capacity</name> <value>40</value> </property>
<property> <name>yarn.scheduler.capacity.root.marketing.product.accessible-node-labels.spark.capacity</name> <value>30</value> </property>
<property> <name>yarn.scheduler.capacity.root.marketing.product.accessible-node-labels.Hadoop.capacity</name> <value>50</value> </property>
<!-- configuration of the marketing.service queue -->
<property> <name>yarn.scheduler.capacity.root.marketing.service.accessible-node-labels</name> <value>spark,Hadoop</value> </property>
<property> <name>yarn.scheduler.capacity.root.marketing.service.capacity</name> <value>60</value> </property>
<property> <name>yarn.scheduler.capacity.root.marketing.service.accessible-node-labels.spark.capacity</name> <value>70</value> </property>
<property> <name>yarn.scheduler.capacity.root.marketing.service.accessible-node-labels.Hadoop.capacity</name> <value>50</value> </property>
<!-- configuration of the sales.product_sales queue -->
<property> <name>yarn.scheduler.capacity.root.sales.product_sales.accessible-node-labels</name> <value>Hadoop</value> </property>
<property> <name>yarn.scheduler.capacity.root.sales.product_sales.capacity</name> <value>100</value> </property>
<property> <name>yarn.scheduler.capacity.root.sales.product_sales.accessible-node-labels.Hadoop.capacity</name> <value>100</value> </property>
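Optionally, a queue can also be given a default node label expression, so that applications submitted to it without an explicit label land on a particular partition. A minimal sketch, assuming the marketing.product queue should default to the spark partition:
<property> <name>yarn.scheduler.capacity.root.marketing.product.default-node-label-expression</name> <value>spark</value> </property>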
- Refreshing the queues: Once the configuration and the queue association with node labels is done, we can refresh the queues. Use the following command to refresh the queues:
sudo su yarn
yarn rmadmin -refreshQueues
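To check that the refreshed queues picked up the node label settings, the queue status command can be used; on recent Hadoop 3 releases its output includes the accessible node labels of the queue:
yarn queue -status product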
- Submitting a job: The basic idea of a node label is to partition the nodes so that each partition can be used for a specific use case. The user can submit a job to a queue and specify the node label expression that should be used for the execution of its tasks. We can do this using the following command:
hadoop jar wordcount.jar -num_containers 4 -queue product -node_label_expression Hadoop
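The exact options depend on the application, since flags such as the preceding ones are interpreted by the application itself. For a standard MapReduce job, the queue and the node label expression can instead be passed as job properties; the following is a sketch, assuming the stock wordcount program from the hadoop-mapreduce-examples jar and hypothetical /input and /output paths:
hadoop jar hadoop-mapreduce-examples.jar wordcount \
  -Dmapreduce.job.queuename=product \
  -Dmapreduce.job.node-label-expression=Hadoop \
  /input /output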
A node can be removed from a node label and reassigned to another one. Any node label configured with exclusive set to false is treated as a non-exclusive node label, and idle resources on the nodes carrying a non-exclusive label can be shared with applications running outside that label.
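Reassignment uses the same replaceLabelsOnNode command shown earlier; because a node carries at most one label, specifying a new label simply replaces the old one. For example, to move packt4.com from the Hadoop partition to the spark partition:
sudo -u yarn yarn rmadmin -replaceLabelsOnNode "packt4.com=spark"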