By configuring YARN resource queues you can isolate resources between different workloads; by also giving each queue an elastic range, a queue under pressure can borrow idle resources from other queues.
I. A look at the official docs (optional)
1. Overview
First, an orientation to the Capacity Scheduler: it targets multi-tenant scenarios; put simply, it lets you assign different workloads to different queues of resources.
The CapacityScheduler is designed to run Hadoop applications as a shared, multi-tenant cluster in an operator-friendly manner while maximizing the throughput and the utilization of the cluster.
2. Configuration
2.1. Setting up ResourceManager to use CapacityScheduler
Set the following in yarn-site.xml:
| Property | Value |
|---|---|
| yarn.resourcemanager.scheduler.class | org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler |
2.2. Setting capacity-scheduler.xml
etc/hadoop/capacity-scheduler.xml is the configuration file for the CapacityScheduler.
Configure capacity-scheduler.xml:
1. Setting up queues
Every queue we configure below is a child of the root queue. A queue's children are declared as a comma-separated list.
The CapacityScheduler has a predefined queue called root. All queues in the system are children of the root queue.
Further queues can be setup by configuring yarn.scheduler.capacity.root.queues with a list of comma-separated child queues.
The queue path concept: a queue path identifies a queue by its full position in the hierarchy, starting at root, with . (dot) as the delimiter, e.g. yarn.scheduler.capacity.<queue-path>.queues.
The configuration for CapacityScheduler uses a concept called queue path to configure the hierarchy of queues. The queue path is the full path of the queue’s hierarchy, starting at root, with . (dot) as the delimiter.
For example:
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>a,b,c</value>
<description>The queues at this level (root is the root queue).
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.a.queues</name>
<value>a1,a2</value>
<description>The queues at this level (root is the root queue).
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.b.queues</name>
<value>b1,b2,b3</value>
<description>The queues at this level (root is the root queue).
</description>
</property>
2. Queue Properties
The Queue Properties documentation covers:
- Resource Allocation
- Resource Allocation using Absolute Resources configuration (see the sketch after this list)
- Running and Pending Application Limits
- Queue Administration & Permissions
- Queue Mapping based on User or Group, Application Name or user defined placement rules
- Queue lifetime for applications
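One of these, absolute-resources configuration, deserves a quick illustration: in newer Hadoop 3.x releases a queue's capacity can be stated as explicit resource amounts instead of a percentage. A hedged sketch, reusing the online queue from later in this article (note that percentage and absolute styles should not be mixed within one queue hierarchy):
<!-- capacity-scheduler.xml: absolute resources instead of percentages (sketch) -->
<property>
<name>yarn.scheduler.capacity.root.online.capacity</name>
<value>[memory=10240,vcores=12]</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.online.maximum-capacity</name>
<value>[memory=20480,vcores=24]</value>
</property>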
3. Application priority
Application priority works only along with FIFO ordering policy. Default ordering policy is FIFO.
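A minimal sketch of working with priorities, using the standard knobs from the Hadoop docs: the cluster-wide ceiling lives in yarn-site.xml, and a running application's priority can be changed from the CLI (the application id below is a made-up example):
<!-- yarn-site.xml: cluster-wide maximum application priority -->
<property>
<name>yarn.cluster.max-application-priority</name>
<value>10</value>
</property>
# raise a running application's priority (the app id is hypothetical)
yarn application -appId application_1234567890123_0001 -updatePriority 8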
4. Capacity Scheduler container preemption
Thanks to elasticity, the Capacity Scheduler lets a queue's containers consume more than the queue's guaranteed capacity; preemption reclaims such containers when other queues need their guaranteed share back.
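Preemption itself is enabled through the scheduler monitor in yarn-site.xml; a minimal sketch using the stock policy class:
<!-- yarn-site.xml: enable preemption via the scheduler monitor -->
<property>
<name>yarn.resourcemanager.scheduler.monitor.enable</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.monitor.policies</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>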
5. Reservation Properties
6. Configuring ReservationSystem with CapacityScheduler
7. Dynamic Auto-Creation and Management of Leaf Queues
The CapacityScheduler can automatically create leaf queues under a parent queue based on queue mappings.
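A hedged sketch of what this looks like, following the property names in the Hadoop 3.x docs; the parent queue users and the %user mapping are illustrative (the parent queue must itself be declared under root.queues):
<!-- capacity-scheduler.xml: auto-create leaf queues under root.users (sketch) -->
<property>
<name>yarn.scheduler.capacity.root.users.auto-create-child-queue.enabled</name>
<value>true</value>
</property>
<!-- template applied to every auto-created leaf queue -->
<property>
<name>yarn.scheduler.capacity.root.users.leaf-queue-template.capacity</name>
<value>10</value>
</property>
<!-- map each user to root.users.<username>, creating the leaf on demand -->
<property>
<name>yarn.scheduler.capacity.queue-mappings</name>
<value>u:%user:users.%user</value>
</property>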
8. Other Properties
3. Changing Queue Configuration
This behavior can be changed via yarn.scheduler.configuration.store.class in yarn-site.xml. Possible values are file, which allows modifying properties via file; memory, which allows modifying properties via API, but does not persist changes across restart; leveldb, which allows modifying properties via API and stores changes in leveldb backing store; and zk, which allows modifying properties via API and stores changes in zookeeper backing store. The default value is file.
So there are two ways to configure queues, via API or via file. Since a restart discards queue changes made via the API (unless they are persisted, e.g. via zk), this article configures queues via file (see the sketch after this list):
- Edit capacity-scheduler.xml and yarn-site.xml
- Run
yarn rmadmin -refreshQueues
to make the queue configuration take effect.
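For completeness, a sketch of pinning the file-based store explicitly in yarn-site.xml (file is already the default per the docs quoted above, so this is optional):
<!-- yarn-site.xml: keep the file-based configuration store (the default) -->
<property>
<name>yarn.scheduler.configuration.store.class</name>
<value>file</value>
</property>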
4. Updating a Container (Experimental – API may change in the future)
Something to look forward to.
II. Hands-on: setting up queues
1. Enable the Capacity Scheduler
Edit yarn-site.xml:
<!-- Use the Capacity Scheduler -->
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
2. Configure capacity-scheduler.xml
2.1. Queue capacities
- Define the child queues: split the cluster's resources into three queues: default, online, and offline.
- Assign each queue a capacity, e.g. 20%, 30%, and 50%. Sibling capacities must sum to exactly 100%.
- Make the queues elastic: e.g. the online queue is guaranteed 30% and capped at 50% of cluster resources, so it can grow to 50% while other queues are idle.
[root@bigdata01 hadoop]# vi capacity-scheduler.xml
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default,online,offline</value>
<description>Comma-separated list of child queues</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>20</value>
<description>default queue: 20%</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.online.capacity</name>
<value>30</value>
<description>online queue: 30%</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.offline.capacity</name>
<value>50</value>
<description>offline queue: 50%</description>
</property>
<!-- Elastic queues: per-queue resource ceilings -->
<property>
<name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
<value>40</value>
<description>Maximum share of resources the default queue may use.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.online.maximum-capacity</name>
<value>50</value>
<description>Maximum share of resources the online queue may use.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.offline.maximum-capacity</name>
<value>60</value>
<description>Maximum share of resources the offline queue may use.</description>
</property>
2.2. Access control
Once a queue has resources assigned, access to it is strictly controlled: only authorized users may submit applications to it or manage them.
Permissions split into submit and administer:
- Submit permission: required to submit applications to the queue;
- Administer permission: required to kill applications in the queue.
An ACL value lists users, then a space, then groups (e.g. "user1,user2 group1"); a single space means nobody and * means everyone. Queue ACLs are only enforced when ACLs are enabled cluster-wide, as sketched below.
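A minimal yarn-site.xml sketch, assuming the standard yarn.acl.enable switch:
<!-- yarn-site.xml: enable ACL enforcement so queue ACLs take effect -->
<property>
<name>yarn.acl.enable</name>
<value>true</value>
</property>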
Submit permission:
<!-- declare the three queues -->
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default,online,offline</value>
<description>The queues at this level (root is the root queue).</description>
</property>
<!-- queue-name=root -->
<property>
<name>yarn.scheduler.capacity.root.acl_submit_applications</name>
<!-- a single space: nobody may submit directly to the root queue -->
<value> </value>
</property>
<!-- queue-name=root.default -->
<property>
<name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
<!-- only users test and b1 may submit to the default queue -->
<value>test,b1</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.online.acl_submit_applications</name>
<!-- only user test may submit to the online queue -->
<value>test</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.offline.acl_submit_applications</name>
<!-- only user b1 may submit to the offline queue -->
<value>b1</value>
</property>
Administer permission:
<!-- queue-name=root -->
<property>
<name>yarn.scheduler.capacity.root.acl_administer_queue</name>
<!-- ACLs are inherited from parent queues, so the root queue must be locked down -->
<value> </value>
</property>
<!-- queue-name=root.default -->
<property>
<name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
<!-- only users test and a1 may kill applications in the default queue -->
<value>test,a1</value>
</property>
3. Apply the changes
`yarn rmadmin -refreshQueues`
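To verify, submit a test job to one of the queues; a quick sketch using the bundled MapReduce examples jar (the jar path and version below are illustrative, adjust to your installation):
# submit the pi example to the offline queue (jar path/version are examples)
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar \
pi -Dmapreduce.job.queuename=offline 2 10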
Complete configuration example:
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.scheduler.capacity.maximum-applications</name>
<value>10000</value>
<description>
Maximum number of applications that can be pending and running.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
<value>0.1</value>
<description>
Maximum percent of resources in the cluster which can be used to run
application masters i.e. controls number of concurrent running
applications.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
<description>
The ResourceCalculator implementation to be used to compare
Resources in the scheduler.
The default i.e. DefaultResourceCalculator only uses Memory while
DominantResourceCalculator uses dominant-resource to compare
multi-dimensional resources such as Memory, CPU etc.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default,test1,test2</value>
<description>
The queues at this level (root is the root queue).
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>30</value>
<description>Default queue target capacity.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.test1.capacity</name>
<value>30</value>
<description>test1 queue target capacity.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.test2.capacity</name>
<value>40</value>
<description>test2 queue target capacity.</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
<value>1</value>
<description>
Default queue user limit a percentage from 0.0 to 1.0.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.test1.user-limit-factor</name>
<value>1</value>
<description>
test1 queue user limit a percentage from 0.0 to 1.0.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.test2.user-limit-factor</name>
<value>1</value>
<description>
test2 queue user limit a percentage from 0.0 to 1.0.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
<value>70</value>
<description>
The maximum capacity of the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.test1.maximum-capacity</name>
<value>70</value>
<description>
The maximum capacity of the test1 queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.test2.maximum-capacity</name>
<value>70</value>
<description>
The maximum capacity of the test2 queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.state</name>
<value>RUNNING</value>
<description>
The state of the default queue. State can be one of RUNNING or STOPPED.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.test1.state</name>
<value>RUNNING</value>
<description>
The state of the test1 queue. State can be one of RUNNING or STOPPED.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.test2.state</name>
<value>RUNNING</value>
<description>
The state of the test2 queue. State can be one of RUNNING or STOPPED.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
<value>*</value>
<description>
The ACL of who can submit jobs to the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.test1.acl_submit_applications</name>
<value>*</value>
<description>
The ACL of who can submit jobs to the test1 queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.test2.acl_submit_applications</name>
<value>*</value>
<description>
The ACL of who can submit jobs to the test2 queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
<value>*</value>
<description>
The ACL of who can administer jobs on the default queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.test1.acl_administer_queue</name>
<value>*</value>
<description>
The ACL of who can administer jobs on the test1 queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.test2.acl_administer_queue</name>
<value>*</value>
<description>
The ACL of who can administer jobs on the test2 queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.acl_application_max_priority</name>
<value>*</value>
<description>
The ACL of who can submit applications with configured priority.
For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.test1.acl_application_max_priority</name>
<value>*</value>
<description>
The ACL of who can submit applications with configured priority.
For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.test2.acl_application_max_priority</name>
<value>*</value>
<description>
The ACL of who can submit applications with configured priority.
For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.maximum-application-lifetime</name>
<value>-1</value>
<description>
Maximum lifetime of an application which is submitted to a queue
in seconds. Any value less than or equal to zero will be considered as
disabled.
This will be a hard time limit for all applications in this
queue. If positive value is configured then any application submitted
to this queue will be killed after exceeds the configured lifetime.
User can also specify lifetime per application basis in
application submission context. But user lifetime will be
overridden if it exceeds queue maximum lifetime. It is point-in-time
configuration.
Note : Configuring too low value will result in killing application
sooner. This feature is applicable only for leaf queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.default-application-lifetime</name>
<value>-1</value>
<description>
Default lifetime of an application which is submitted to a queue
in seconds. Any value less than or equal to zero will be considered as
disabled.
If the user has not submitted application with lifetime value then this
value will be taken. It is point-in-time configuration.
Note : Default lifetime can't exceed maximum lifetime. This feature is
applicable only for leaf queue.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.node-locality-delay</name>
<value>40</value>
<description>
Number of missed scheduling opportunities after which the CapacityScheduler
attempts to schedule rack-local containers.
When setting this parameter, the size of the cluster should be taken into account.
We use 40 as the default value, which is approximately the number of nodes in one rack.
Note, if this value is -1, the locality constraint in the container request
will be ignored, which disables the delay scheduling.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
<value>-1</value>
<description>
Number of additional missed scheduling opportunities over the node-locality-delay
ones, after which the CapacityScheduler attempts to schedule off-switch containers,
instead of rack-local ones.
Example: with node-locality-delay=40 and rack-locality-delay=20, the scheduler will
attempt rack-local assignments after 40 missed opportunities, and off-switch assignments
after 40+20=60 missed opportunities.
When setting this parameter, the size of the cluster should be taken into account.
We use -1 as the default value, which disables this feature. In this case, the number
of missed opportunities for assigning off-switch containers is calculated based on
the number of containers and unique locations specified in the resource request,
as well as the size of the cluster.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.queue-mappings</name>
<value></value>
<description>
A list of mappings that will be used to assign jobs to queues
The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
Typically this list will be used to map users to queues,
for example, u:%user:%user maps all users to queues with the same name
as the user.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
<value>false</value>
<description>
If a queue mapping is present, will it override the value specified
by the user? This can be used by administrators to place jobs in queues
that are different than the one specified by the user.
The default is false.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments</name>
<value>1</value>
<description>
Controls the number of OFF_SWITCH assignments allowed
during a node's heartbeat. Increasing this value can improve
scheduling rate for OFF_SWITCH containers. Lower values reduce
"clumping" of applications on particular nodes. The default is 1.
Legal values are 1-MAX_INT. This config is refreshable.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.application.fail-fast</name>
<value>false</value>
<description>
Whether RM should fail during recovery if previous applications'
queue is no longer valid.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.workflow-priority-mappings</name>
<value></value>
<description>
A list of mappings that will be used to override application priority.
The syntax for this list is
[workflowId]:[full_queue_name]:[priority][,next mapping]*
where an application submitted (or mapped to) queue "full_queue_name"
and workflowId "workflowId" (as specified in application submission
context) will be given priority "priority".
</description>
</property>
<property>
<name>yarn.scheduler.capacity.workflow-priority-mappings-override.enable</name>
<value>false</value>
<description>
If a priority mapping is present, will it override the value specified
by the user? This can be used by administrators to give applications a
priority that is different than the one specified by the user.
The default is false.
</description>
</property>
</configuration>