Spark 2.4 Cluster Deployment (on YARN)
Basic Information
OS kernel version: 3.10.0-1062.9.1.el7.x86_64
JDK version: 1.8.0_202
Hadoop: https://mirrors.aliyun.com/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
Scala: download the tar.gz package from the https://github.com/scala/scala/releases/tag/v2.12.9 page
Spark: https://mirrors.aliyun.com/apache/spark/spark-2.4.7/spark-2.4.7-bin-hadoop2.7.tgz
Step 1: Set the hostnames
hostnamectl set-hostname node01   # run on node01
hostnamectl set-hostname node02   # run on node02
hostnamectl set-hostname node03   # run on node03
Step 2: Edit the hosts file
vim /etc/hosts
192.168.24.2 node01
192.168.24.4 node02
192.168.24.6 node03
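A quick check that the names now resolve from every node:
ping -c 1 node02
ping -c 1 node03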
Step 3: Stop the firewall and disable it at boot
systemctl stop firewalld
systemctl disable firewalld
Step 4: Disable SELinux
vim /etc/selinux/config
Change the SELINUX option to SELINUX=disabled.
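Equivalently, without opening an editor (note that setenforce 0 only switches the running system to permissive mode; the disabled setting takes full effect after a reboot):
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
setenforce 0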
Step 5: Set up passwordless SSH login (all nodes)
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# Copy node01's public key to the other nodes; this only needs to be run on node01
ssh-copy-id node02
ssh-copy-id node03
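A quick sanity check from node01: each of the following should print the remote hostname without prompting for a password.
ssh node02 hostname
ssh node03 hostname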
Step 6: Create directories
mkdir -pv /home/hadoop
mkdir -pv /home/hadoop/work/tmp/dfs/name
mkdir -pv /home/hadoop/work/tmp/dfs/data
mkdir -pv /home/spark
Step 7: Download and extract the Hadoop tarball
cd /home/hadoop
wget https://mirrors.aliyun.com/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
tar zxvf hadoop-2.7.7.tar.gz
Step 8: Go into hadoop-2.7.7/etc/hadoop and edit hadoop-env.sh, mapred-env.sh, and yarn-env.sh in turn, making sure JAVA_HOME in each points to the correct path, as follows:
export JAVA_HOME=/usr/java/jdk1.8.0_202-amd64
Step 9: Edit core-site.xml and configure the configuration node as follows:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node01:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/work/tmp</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/data</value>
  </property>
</configuration>
Step 10: Edit hdfs-site.xml and configure the configuration node, making node02 the Secondary NameNode, as follows:
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node02:50090</value>
  </property>
</configuration>
Step 11: Edit the slaves file as follows:
node01
node02
node03
Step 12: Edit yarn-site.xml and configure the configuration node as follows (the two *-check-enabled properties disable YARN's physical/virtual memory checks so containers on this small test cluster are not killed for exceeding memory limits):
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node01</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>106800</value>
  </property>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>
Step 13: Back up mapred-site.xml.template, then rename it to mapred-site.xml
cp mapred-site.xml.template mapred-site.xml.template.bak
mv mapred-site.xml.template mapred-site.xml
Step 14: Edit mapred-site.xml and configure the configuration node as follows:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node01:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node01:19888</value>
  </property>
</configuration>
Step 15: Sync the hadoop-2.7.7 directory to the other nodes
scp -r hadoop-2.7.7 node02:/home/hadoop
scp -r hadoop-2.7.7 node03:/home/hadoop
Step 16: Format HDFS (run this on node01, the NameNode, and only once)
cd /home/hadoop/hadoop-2.7.7
bin/hdfs namenode -format
Step 17: Start Hadoop
# Start HDFS
sh /home/hadoop/hadoop-2.7.7/sbin/start-dfs.sh
# Start YARN
sh /home/hadoop/hadoop-2.7.7/sbin/start-yarn.sh
# Start the job history server
/home/hadoop/hadoop-2.7.7/sbin/mr-jobhistory-daemon.sh start historyserver
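To confirm the daemons came up, run jps on each node. Given the configuration above, the expected processes are roughly:
jps
# node01: NameNode, DataNode, ResourceManager, NodeManager, JobHistoryServer
# node02: DataNode, SecondaryNameNode, NodeManager
# node03: DataNode, NodeManager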
Step 18: Download and extract the Scala and Spark packages
cd /home/spark
# Download Scala from the https://github.com/scala/scala/releases/tag/v2.12.9 page and upload the package to /home/spark on the server
wget https://mirrors.aliyun.com/apache/spark/spark-2.4.7/spark-2.4.7-bin-hadoop2.7.tgz
tar -zxvf scala-2.12.9.tar.gz
tar -zxvf spark-2.4.7-bin-hadoop2.7.tgz
Step 19: Edit the Spark configuration files
cd spark-2.4.7-bin-hadoop2.7/conf
cp spark-env.sh.template spark-env.sh.template.bak
mv spark-env.sh.template spark-env.sh
vim spark-env.sh, then append:
export SPARK_MASTER_IP=node01
export SPARK_MASTER_PORT=7077
export SPARK_EXECUTOR_INSTANCES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=256M
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_CONF_DIR=/home/spark/spark-2.4.7-bin-hadoop2.7/conf
export JAVA_HOME=/usr/java/jdk1.8.0_202-amd64
export JRE_HOME=${JAVA_HOME}/jre
export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.7.7/etc/hadoop
mv slaves.template slaves
vim slaves
node01
node02
node03
Step 20: Sync the scala-2.12.9 and spark-2.4.7-bin-hadoop2.7 directories to the other nodes
scp -r scala-2.12.9 node02:/home/spark
scp -r scala-2.12.9 node03:/home/spark
scp -r spark-2.4.7-bin-hadoop2.7 node02:/home/spark
scp -r spark-2.4.7-bin-hadoop2.7 node03:/home/spark
Step 21: Start Spark
sh /home/spark/spark-2.4.7-bin-hadoop2.7/sbin/start-all.sh
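To verify the deployment end to end, you can submit the bundled SparkPi example to YARN. This is a minimal check; the examples jar name below assumes the prebuilt 2.4.7 package, which ships compiled against Scala 2.11, so confirm the exact file name under examples/jars before running:
/home/spark/spark-2.4.7-bin-hadoop2.7/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  /home/spark/spark-2.4.7-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.7.jar 10
In cluster deploy mode the computed value of Pi appears in the driver's log, which you can read through the YARN web UI or with yarn logs -applicationId <appId>.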
Step 22: Configure environment variables in /etc/profile and make them take effect immediately
# JAVA
export JAVA_HOME=/usr/java/jdk1.8.0_202-amd64
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
# SCALA
export SCALA_HOME=/home/spark/scala-2.12.9
export PATH=${SCALA_HOME}/bin:$PATH
# SPARK
export SPARK_HOME=/home/spark/spark-2.4.7-bin-hadoop2.7
export PATH=${SPARK_HOME}/bin:$PATH
# HADOOP
export HADOOP_HOME=/home/hadoop/hadoop-2.7.7
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.7.7/etc/hadoop
source /etc/profile
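A quick check that the variables took effect:
java -version
scala -version
hadoop version
spark-submit --version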
Step 23: Set up simple one-click start and stop scripts for convenience
start-spark.sh:
sh /home/hadoop/hadoop-2.7.7/sbin/start-dfs.sh \
&& sh /home/hadoop/hadoop-2.7.7/sbin/start-yarn.sh \
&& /home/hadoop/hadoop-2.7.7/sbin/mr-jobhistory-daemon.sh start historyserver \
&& sh /home/spark/spark-2.4.7-bin-hadoop2.7/sbin/start-all.sh
stop-spark.sh:
sh /home/spark/spark-2.4.7-bin-hadoop2.7/sbin/stop-all.sh \
&& /home/hadoop/hadoop-2.7.7/sbin/mr-jobhistory-daemon.sh stop historyserver \
&& sh /home/hadoop/hadoop-2.7.7/sbin/stop-yarn.sh \
&& sh /home/hadoop/hadoop-2.7.7/sbin/stop-dfs.sh
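Make both scripts executable, then bring the whole stack up or down with one command:
chmod +x start-spark.sh stop-spark.sh
./start-spark.sh
./stop-spark.sh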
Step 24: Visit the Spark Master web UI: http://192.168.24.2:8080
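With this layout, the other web UIs are (19888 and 50090 were set explicitly in the configs above; 50070 and 8088 are the Hadoop 2.x defaults):
HDFS NameNode: http://192.168.24.2:50070
YARN ResourceManager: http://192.168.24.2:8088
MapReduce JobHistory: http://192.168.24.2:19888
Secondary NameNode: http://192.168.24.4:50090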
