Spark 2.4 Cluster Deployment (YARN Mode)
Basic Information
OS kernel version: 3.10.0-1062.9.1.el7.x86_64
JDK version: 1.8.0_202
Hadoop: https://mirrors.aliyun.com/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
Scala: download the tar.gz package from https://github.com/scala/scala/releases/tag/v2.12.9
Spark: https://mirrors.aliyun.com/apache/spark/spark-2.4.7/spark-2.4.7-bin-hadoop2.7.tgz
Step 1: Set the hostname (run the matching command on each node)
hostnamectl set-hostname node01
hostnamectl set-hostname node02
hostnamectl set-hostname node03
Step 2: Edit the hosts file
vim /etc/hosts
192.168.24.2 node01
192.168.24.4 node02
192.168.24.6 node03
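After saving the hosts file on every node, a quick sanity check from node01 confirms the names resolve:
ping -c 1 node02
ping -c 1 node03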
Step 3: Stop the firewall and prevent it from starting at boot
systemctl stop firewalld
systemctl disable firewalld
Step 4: Disable SELinux
vim /etc/selinux/config
Set the SELINUX option to SELINUX=disabled
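The change in /etc/selinux/config only takes effect after a reboot; to also turn SELinux off for the current session without rebooting, run:
setenforce 0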
Step 5: Set up passwordless SSH login (all nodes)
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# Copy node01's public key to the other nodes; run this on node01 only
ssh-copy-id node02
ssh-copy-id node03
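To verify passwordless login, each of the following should print the remote hostname without asking for a password:
ssh node02 hostname
ssh node03 hostname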
Step 6: Create directories
mkdir -pv /home/hadoop
mkdir -pv /home/hadoop/work/tmp/dfs/name
mkdir -pv /home/hadoop/work/tmp/dfs/data
mkdir -pv /home/spark
Step 7: Download and extract the Hadoop tarball
cd /home/hadoop
wget https://mirrors.aliyun.com/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
tar zxvf hadoop-2.7.7.tar.gz
Step 8: Go to the hadoop-2.7.7/etc/hadoop directory and edit hadoop-env.sh, mapred-env.sh, and yarn-env.sh in turn, making sure JAVA_HOME in each points to the correct path, as follows:
export JAVA_HOME=/usr/java/jdk1.8.0_202-amd64
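If you prefer a scripted edit, appending the export line to the end of each of the three files has the same effect, since the last assignment wins when the script is sourced (a simple sketch; review the files afterwards):
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
  echo 'export JAVA_HOME=/usr/java/jdk1.8.0_202-amd64' >> "$f"
done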
Step 9: Edit core-site.xml and configure the configuration section as follows:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node01:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/work/tmp</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file://${hadoop.tmp.dir}/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file://${hadoop.tmp.dir}/dfs/data</value>
    </property>
</configuration>
Step 10: Edit hdfs-site.xml and configure the configuration section, setting node02 as the Secondary NameNode, as follows:
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node02:50090</value>
    </property>
</configuration>
Step 11: Edit the slaves file as follows:
node01
node02
node03
Step 12: Edit yarn-site.xml and configure the configuration section as follows:
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node01</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>106800</value>
    </property>
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
</configuration>
Step 13: Back up mapred-site.xml.template, then rename mapred-site.xml.template to mapred-site.xml
cp mapred-site.xml.template mapred-site.xml.template.bak
mv mapred-site.xml.template mapred-site.xml
Step 14: Edit mapred-site.xml and configure the configuration section as follows:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node01:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node01:19888</value>
    </property>
</configuration>
Step 15: Sync the hadoop-2.7.7 directory to the other nodes
scp -r hadoop-2.7.7 node02:/home/hadoop
scp -r hadoop-2.7.7 node03:/home/hadoop
Step 16: Format HDFS
cd /home/hadoop/hadoop-2.7.7
bin/hdfs namenode -format
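If the format succeeds, the NameNode directory configured above (file://${hadoop.tmp.dir}/dfs/name) should now contain a current/ subdirectory holding an fsimage and a VERSION file, which can be checked with:
ls /home/hadoop/work/tmp/dfs/name/current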
Step 17: Start Hadoop
# Start HDFS
sh /home/hadoop/hadoop-2.7.7/sbin/start-dfs.sh
# Start YARN
sh /home/hadoop/hadoop-2.7.7/sbin/start-yarn.sh
# Start the JobHistory Server
/home/hadoop/hadoop-2.7.7/sbin/mr-jobhistory-daemon.sh start historyserver
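To confirm the daemons are running, jps on node01 should list NameNode, DataNode, ResourceManager, NodeManager and JobHistoryServer; node02 additionally runs SecondaryNameNode, and every node in the slaves file runs DataNode and NodeManager. HDFS health can also be checked with dfsadmin:
jps
/home/hadoop/hadoop-2.7.7/bin/hdfs dfsadmin -report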
Step 18: Download and extract the Scala and Spark packages
cd /home/spark
Download Scala from https://github.com/scala/scala/releases/tag/v2.12.9 and upload the package to /home/spark on the server;
wget https://mirrors.aliyun.com/apache/spark/spark-2.4.7/spark-2.4.7-bin-hadoop2.7.tgz
tar -zxvf scala-2.12.9.tar.gz
tar -zxvf spark-2.4.7-bin-hadoop2.7.tgz
Step 19: Edit the Spark configuration files
cd spark-2.4.7-bin-hadoop2.7/conf
cp spark-env.sh.template spark-env.sh.template.bak
mv spark-env.sh.template spark-env.sh
vim spark-env.sh and append the following:
export SPARK_MASTER_IP=node01
export SPARK_MASTER_PORT=7077
export SPARK_EXECUTOR_INSTANCES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=256M
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_CONF_DIR=/home/spark/spark-2.4.7-bin-hadoop2.7/conf
export JAVA_HOME=/usr/java/jdk1.8.0_202-amd64
export JRE_HOME=${JAVA_HOME}/jre
export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.7.7/etc/hadoop
mv slaves.template slaves
vim slaves
node01
node02
node03
Step 20: Sync the scala-2.12.9 and spark-2.4.7-bin-hadoop2.7 directories to the other nodes;
scp -r scala-2.12.9 node02:/home/spark
scp -r scala-2.12.9 node03:/home/spark
scp -r spark-2.4.7-bin-hadoop2.7 node02:/home/spark
scp -r spark-2.4.7-bin-hadoop2.7 node03:/home/spark
Step 21: Start Spark
sh /home/spark/spark-2.4.7-bin-hadoop2.7/sbin/start-all.sh
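Once start-all.sh finishes, jps on node01 should additionally show a Master and a Worker process, and node02/node03 should each show a Worker:
jps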
Step 22: Configure environment variables (append to /etc/profile) and apply them immediately
#JAVA
export JAVA_HOME=/usr/java/jdk1.8.0_202-amd64
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
#SCALA
export SCALA_HOME=/home/spark/scala-2.12.9
export PATH=${SCALA_HOME}/bin:$PATH
#SPARK
export SPARK_HOME=/home/spark/spark-2.4.7-bin-hadoop2.7
export PATH=${SPARK_HOME}/bin:$PATH
#HADOOP
export HADOOP_HOME=/home/hadoop/hadoop-2.7.7
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.7.7/etc/hadoop
source /etc/profile
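With the variables loaded, all of the following should resolve from PATH and print their versions:
java -version
hadoop version
scala -version
spark-submit --version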
Step 23: Create simple one-click start and stop scripts to make starting and stopping easier
start-spark.sh
sh /home/hadoop/hadoop-2.7.7/sbin/start-dfs.sh \
&& sh /home/hadoop/hadoop-2.7.7/sbin/start-yarn.sh \
&& /home/hadoop/hadoop-2.7.7/sbin/mr-jobhistory-daemon.sh start historyserver \
&& sh /home/spark/spark-2.4.7-bin-hadoop2.7/sbin/start-all.sh
stop-spark.sh
sh /home/spark/spark-2.4.7-bin-hadoop2.7/sbin/stop-all.sh \
&& /home/hadoop/hadoop-2.7.7/sbin/mr-jobhistory-daemon.sh stop historyserver \
&& sh /home/hadoop/hadoop-2.7.7/sbin/stop-yarn.sh \
&& sh /home/hadoop/hadoop-2.7.7/sbin/stop-dfs.sh
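Assuming the two scripts are saved as /home/spark/start-spark.sh and /home/spark/stop-spark.sh (the location is up to you), make them executable before use:
chmod +x /home/spark/start-spark.sh /home/spark/stop-spark.sh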
Step 24: Open the Spark Master Web UI at http://192.168.24.2:8080
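This address is the Spark Master Web UI (SPARK_MASTER_WEBUI_PORT above); the YARN ResourceManager UI is at http://192.168.24.2:8088 by default. To confirm that Spark jobs actually run on YARN, a minimal smoke test is the bundled SparkPi example (the examples jar name assumes the stock spark-2.4.7-bin-hadoop2.7 layout built against Scala 2.11; adjust if your package differs):
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.7.jar 100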