Running a Spark HiBench Test Container on YARN

Hadoop 2.6 introduced the Docker Container Executor, which lets YARN manage Docker containers. This post records how I used Spark on YARN to run the HiBench workloads.

Environment:

hadoop 3.2

docker 18.06.3-ce

HDFS must already be set up; there are plenty of tutorials online for configuring HDFS, so I won't cover that here.

Configuring yarn-site.xml and container-executor.cfg

Below is my yarn-site.xml:

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/yarn/local</value>
</property>

<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/home/yarn/log</value>
</property>

<property>
<name>yarn.acl.enable</name>
<value>false</value>
</property>

<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>32768</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>32768</value>
</property>
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>32</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>32</value>
</property>

    <property>
        <name>yarn.application.classpath</name>
        <value>/usr/local/home/hadoop/hadoop3.2/etc/hadoop:/usr/local/home/hadoop/hadoop3.2/share/hadoop/common/lib/*:/usr/local/home/hadoop/hadoop3.2/share/hadoop/common/*:/usr/local/home/hadoop/hadoop3.2/share/hadoop/hdfs:/usr/local/home/hadoop/hadoop3.2/share/hadoop/hdfs/lib/*:/usr/local/home/hadoop/hadoop3.2/share/hadoop/hdfs/*:/usr/local/home/hadoop/hadoop3.2/share/hadoop/mapreduce/lib/*:/usr/local/home/hadoop/hadoop3.2/share/hadoop/mapreduce/*:/usr/local/home/hadoop/hadoop3.2/share/hadoop/yarn:/usr/local/home/hadoop/hadoop3.2/share/hadoop/yarn/lib/*:/usr/local/home/hadoop/hadoop3.2/share/hadoop/yarn/*</value>
    </property>

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
    <description>
      Enable services rest api on ResourceManager.
    </description>
    <name>yarn.webapp.api-service.enable</name>
    <value>true</value>
</property>

<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>

<property>
 <name>yarn.nodemanager.docker-container-executor.exec-name</name>
  <value>docker -H=tcp://0.0.0.0:2375</value>
  <description>
     Name or path to the Docker client. The tcp socket must be
     where docker daemon is listening.
  </description>
</property>

<property>
     <name>yarn.nodemanager.linux-container-executor.path</name>
     <value>/usr/local/home/hadoop/hadoop3.2/bin/container-executor</value>
</property>

<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>
</property>

<property>
    <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
    <value>false</value>
    <description>
      Whether all applications should be run as the NodeManager process' owner.
      When false, applications are launched instead as the application owner.
    </description>
</property>

<property>
        <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
        <value>/hadoop-yarn</value> 
</property>
<property>
        <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
        <value>true</value> 
</property>
<property>
        <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>
        <value>/sys/fs/cgroup</value> 
</property>

<property>
    <description>The UNIX user that containers will run as when
    Linux-container-executor is used in nonsecure mode</description>
    <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user</name>
    <value>nobody</value>
</property>
                        
<property>
    <description>Comma separated list of runtimes that are allowed when using
    LinuxContainerExecutor.</description>
    <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
    <value>default,docker</value>
    </property>
                        
<property>
    <description>This configuration setting determines the capabilities
    assigned to docker containers when they are launched. While these may not
    be case-sensitive from a docker perspective, it is best to keep these
    uppercase. To run without any capabilities, set this value to
    "none" or "NONE"</description>
    <name>yarn.nodemanager.runtime.linux.docker.capabilities</name>
    <value>CHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID,SETUID,
SETFCAP,SETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE</value>
</property>
                        
<property>
    <description>This configuration setting determines if
    privileged docker containers are allowed on this cluster.
    The submitting user must be part of the privileged container acl and 
    must be part of the docker group or have sudo access to the docker command 
    to be able to use a privileged container. Use with extreme care.</description>
    <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed</name>
    <value>false</value>
</property>
                        
<property>
    <description>This configuration setting determines the submitting 
    users who are allowed to run privileged docker containers on this cluster. 
    The submitting user must also be part of the docker group or have sudo access
    to the docker command. No users are allowed by default. Use with extreme care. 
    </description>
    <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.acl</name>
    <value> </value>
</property>
                        
<property>
    <description>The set of networks allowed when launching containers</description>
    <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
    <value>host,bridge</value>
</property>
                        
<property>
    <description>The network used when launching containers when no network is specified 
    in the request. This network must be one of the (configurable) set of allowed 
    container networks. The default is host, which may not be appropriate for multiple 
    containers on a single node, use bridge in that case. See docker networking for more.
    </description>
    <name>yarn.nodemanager.runtime.linux.docker.default-container-network</name>
    <value>host</value>
</property>
</configuration>

Here, yarn.nodemanager.container-executor.class and yarn.nodemanager.docker-container-executor.exec-name configure the container executor itself. The properties near the top are regular YARN settings; the ones toward the bottom are specific to Docker containers.

With that done, the next step is container-executor.cfg.

Below is my container-executor.cfg:

yarn.nodemanager.local-dirs=/home/yarn/local
yarn.nodemanager.log-dirs=/home/yarn/log
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,mapred,bin
min.user.id=1000
[docker]
module.enabled=true
docker.binary=/usr/bin/docker
docker.allowed.capabilities=CHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID,SETUID,SETFCAP,SETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE,DAC_READ_SEARCH,SYS_PTRACE,SYS_ADMIN
docker.allowed.networks=bridge,host,none
docker.allowed.ro-mounts=/sys/fs/cgroup,/home/yarn/local,/etc/passwd,/etc/group
docker.allowed.rw-mounts=/home/yarn/local,/home/yarn/log
docker.privileged-containers.enabled=false
docker.trusted.registries=local,centos,hortonworks

The paths in these settings can be adapted to your needs. For example, yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs are already set in yarn-site.xml; I didn't remove them here either. Settings like docker.allowed.rw-mounts and docker.allowed.ro-mounts, however, control what Docker may mount, and are best left as shown.

With these two files in place, the configuration is essentially done. What follows is the part that gave me the most trouble: permissions.

Permissions

Spark containers on YARN cannot be submitted as root; doing so fails immediately with an error about running with root privileges. Create a non-privileged user first:

useradd -G hadoop yarn

Then set file and directory permissions.

container-executor must be owned root:hadoop (hadoop being the same group as the NodeManager's primary group), with mode 6050:

chown root:hadoop bin/container-executor
chmod 6050 bin/container-executor

container-executor.cfg gets the same owner and group, with mode 0400:

chown root:hadoop etc/hadoop/container-executor.cfg
chmod 0400 etc/hadoop/container-executor.cfg

Finally, the directories configured for yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs should be owned by yarn:hadoop.
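A minimal sketch of that ownership change (run as root on each NodeManager; the paths are the ones from the yarn-site.xml above, so adjust them if yours differ):

```shell
# Give the yarn user ownership of the NodeManager local and log dirs
# (the yarn.nodemanager.local-dirs / log-dirs paths configured above)
chown -R yarn:hadoop /home/yarn/local /home/yarn/log
```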

Running

Once everything above is configured, start YARN and it can manage the Spark containers. An example launch command:

MOUNTS="/etc/passwd:/etc/passwd:ro,/etc/group:/etc/group:ro"
DOCKER_CLIENT_CONFIG=/home/compare/config.json

su - yarn -c "export SPARKBENCH_PROPERTIES_FILES=/usr/local/home/hibench/hibench/report/wordcount/spark/conf/sparkbench/sparkbench-template.conf && \
$SPARK_HOME/bin/spark-submit \
--master yarn \
--properties-file $SPARK_HOME/conf/spark-defaults.conf \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1 \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG=$DOCKER_CLIENT_CONFIG \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=$MOUNTS \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/spark:v1 \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_CLIENT_CONFIG=$DOCKER_CLIENT_CONFIG \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=$MOUNTS \
--class com.intel.hibench.sparkbench.micro.ScalaWordCount \
/usr/local/home/hibench/hibench/sparkbench/assembly/target/sparkbench-assembly-7.1.1-dist.jar \
hdfs://192.168.0.40:9000/HiBench/Wordcount/53687091200/Input \
hdfs://192.168.0.40:9000/HiBench/Wordcount/53687091200/Output"
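The MOUNTS value above uses Docker's comma-separated source:destination:mode syntax. As a hedged sketch (check_mounts is a hypothetical helper, not part of YARN or Spark), the format can be validated before submitting:

```shell
#!/bin/sh
# Hypothetical helper: check that every entry in a YARN Docker mount list
# has the form source:destination:mode, where mode is ro or rw.
check_mounts() {
  for m in $(printf '%s' "$1" | tr ',' ' '); do
    mode=${m##*:}            # text after the last colon
    case "$mode" in
      ro|rw) ;;              # valid access mode
      *) echo "bad mount: $m"; return 1 ;;
    esac
  done
  echo ok
}

check_mounts "/etc/passwd:/etc/passwd:ro,/etc/group:/etc/group:ro"  # prints "ok"
```

Note that even a well-formed entry is still rejected by container-executor at launch time unless its path is whitelisted in docker.allowed.ro-mounts or docker.allowed.rw-mounts.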

Submit the command as the yarn user. For local/spark:v1 you can use the image Spark ships for Kubernetes (it matches the local entry in docker.trusted.registries). However, I hit an error: Container exited with a non-zero exit code 1. Error file: prelaunch.err. "find":unrecognized: -ls. My fix was to add findutils in the Dockerfile, since the find in the Alpine base image apparently does not support -ls:

RUN set -ex && \
    apk upgrade --no-cache && \
    ln -s /lib /lib64 && \
    apk add --no-cache bash tini libc6-compat linux-pam nss findutils && \
    mkdir -p /opt/spark && \
    mkdir -p /opt/spark/work-dir && \
    touch /opt/spark/RELEASE && \
    rm /bin/sh && \
    ln -sv /bin/bash /bin/sh && \
    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd

COPY ${spark_jars} /opt/spark/jars
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY ${img_path}/spark/entrypoint.sh /opt/
COPY examples /opt/spark/examples
COPY ${k8s_tests} /opt/spark/tests
COPY data /opt/spark/data

ENV SPARK_HOME /opt/spark

WORKDIR /opt/spark/work-dir

ENTRYPOINT [ "/opt/entrypoint.sh" ]

Then rebuild the image:

docker build -t local/spark:v1 -f kubernetes/dockerfiles/spark/Dockerfile .

After solving that, I ran into Caused by: javax.security.auth.login.LoginException: java.lang.NullPointerException: invalid null input: name. The cause is that Hadoop's UserGroupInformation class cannot resolve the container user against the host's user database. Mounting /etc/passwd:/etc/passwd:ro,/etc/group:/etc/group:ro into the container fixes it.

That mount in turn produced Shell error output: Configuration does not allow docker mount '/etc/passwd:/etc/passwd:ro', realpath=/etc/passwd Error constructing docker command, docker error code=14, error message='Invalid docker read-only mount'. The fix is to add that path to docker.allowed.ro-mounts in container-executor.cfg, then restart YARN and resubmit the job.

References

Docker Container Executor

Running Docker Container in YARN 3.1

Yarn ContainerExecutor configuration and usage

YARN Secure Containers

Troubleshooting Docker on YARN (a collection of common YARN container errors)
