
Offline Installation of a Fully Distributed Hadoop Environment on CentOS 7



1. Edit the hosts file


[root@master ~]# vim /etc/hosts
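
For reference, a minimal /etc/hosts for this cluster, assuming the IP addresses that appear later in the article (master 192.168.5.128, slave1 192.168.5.129, slave2 192.168.5.130), would contain:

192.168.5.128   master
192.168.5.129   slave1
192.168.5.130   slave2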


2. Use scp to copy the /etc/hosts file from master to each node

[root@master ~]# scp /etc/hosts root@192.168.5.128:/etc/hosts

The authenticity of host '192.168.5.128 (192.168.5.128)' can't be established.

ECDSA key fingerprint is SHA256:WIen0BimPcoPziD6DYeAJzV2JeHQSBZVHosXWrczvaU.

ECDSA key fingerprint is MD5:b5:9b:b6:b7:ee:3a:75:2a:98:60:65:d2:69:43:84:02.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added '192.168.5.128' (ECDSA) to the list of known hosts.

root@192.168.5.128's password:

hosts      100%  221   137.2KB/s   00:00  

[root@master ~]# scp /etc/hosts root@192.168.5.130:/etc/hosts


You can then test host-name resolution as follows:

[hadoop@master ~]$ ping slave1
PING slave1 (192.168.5.129) 56(84) bytes of data.
64 bytes from slave1 (192.168.5.129): icmp_seq=1 ttl=64 time=1.67 ms
64 bytes from slave1 (192.168.5.129): icmp_seq=2 ttl=64 time=0.580 ms


[hadoop@master ~]$ ping slave2
PING slave2 (192.168.5.130) 56(84) bytes of data.
64 bytes from slave2 (192.168.5.130): icmp_seq=1 ttl=64 time=2.58 ms
64 bytes from slave2 (192.168.5.130): icmp_seq=2 ttl=64 time=1.19 ms


3. Configure passwordless SSH login

Because this fully distributed environment is built on three machines that must communicate with one another, entering a username and password for every connection would be very tedious, so passwordless trust is set up between them.


(1) Generate the public and private keys

[hadoop@master ~]$ ssh-keygen

Press Enter at each prompt to accept the defaults.
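
If you prefer to skip the prompts entirely, an equivalent non-interactive form (a variant not shown in the original steps) is:

[hadoop@master ~]$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa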


(2) Store the public key

What is authorized_keys: on Linux, authorized_keys is the file used to store authorized public keys. Only when a public key is placed in the correct location on the server, with the correct permissions, can you log in to that server without a password using the matching private key.


[hadoop@master ~]$ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
//Append the public key in .ssh/id_rsa.pub to authorized_keys


[hadoop@master ~]$ chmod 600 .ssh/authorized_keys 

//Restrict the permissions of authorized_keys


(3) Copy the .ssh directory to slave1 and slave2


[hadoop@master ~]$ scp -r .ssh hadoop@slave1:~/

[hadoop@master ~]$ scp -r .ssh hadoop@slave2:~/


Test whether passwordless login works:

[hadoop@master ~]$ ssh slave1

Last login: Fri Mar 19 02:40:48 2021 from master

[hadoop@master ~]$ ssh slave2

Last login: Fri Mar 19 02:15:56 2021

Login succeeds without a password, so the SSH configuration is complete.


4. Copy the jre-8u281-linux-x64.tar.gz and hadoop-3.2.2.tar.gz packages to the /home/hadoop/ directory on master, slave1, and slave2, and extract them.
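
For example, extracting both archives in /home/hadoop/ produces the jre1.8.0_281/ and hadoop-3.2.2/ directories used below:

[hadoop@master ~]$ tar -zxvf jre-8u281-linux-x64.tar.gz
[hadoop@master ~]$ tar -zxvf hadoop-3.2.2.tar.gz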


5. Because CentOS 7 was installed from the Minimal image, the jps command is not available, so the java-1.8.0-openjdk components need to be installed.

Installation: yum -y install java-1.8.0-openjdk*
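
(jps itself is shipped in the java-1.8.0-openjdk-devel package, which the wildcard above also installs.)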


6. Add Hadoop to the PATH environment variable so the Hadoop commands are easy to run

[root@master ~]# su - hadoop

[hadoop@master ~]$ vim .bash_profile

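A minimal sketch of the lines to append to .bash_profile, assuming the JRE and Hadoop were unpacked under /home/hadoop/ as in step 4:

export JAVA_HOME=/home/hadoop/jre1.8.0_281
export HADOOP_HOME=/home/hadoop/hadoop-3.2.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Run source ~/.bash_profile (or log in again) for the change to take effect.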


7. Distribute the environment configuration file to the slave1 and slave2 nodes

[hadoop@master ~]$ scp -r .bash_profile hadoop@192.168.5.129:~/


[hadoop@master ~]$ scp -r .bash_profile hadoop@192.168.5.130:~/
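
On each slave, run source ~/.bash_profile (or simply log in again) so the new PATH takes effect there as well.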


8. Edit the Hadoop configuration files (under hadoop-3.2.2/etc/hadoop/)


(1)[hadoop@master hadoop]$ vim core-site.xml

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://master:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>file:/home/hadoop/tmp</value>
        </property>
</configuration>

 

(2)[hadoop@master hadoop]$ vim hdfs-site.xml

<configuration>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/home/hadoop/tmp/dfs/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/home/hadoop/tmp/dfs/data</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>master:50090</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
</configuration>

 

(3)[hadoop@master hadoop]$ vim yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>master</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>


(4)[hadoop@master hadoop]$ vim mapred-site.xml

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>


(5)[hadoop@master hadoop]$ vim workers

slave1

slave2
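
Note: in Hadoop 3.x the workers file replaces the slaves file of Hadoop 2.x; start-dfs.sh and start-yarn.sh start the DataNode and NodeManager processes on the hosts listed in it.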

9. Copy the Java and Hadoop directories to slave1 and slave2

[hadoop@master hadoop]$ scp -r jre1.8.0_281/ hadoop@slave1:~/

[hadoop@master hadoop]$ scp -r jre1.8.0_281/ hadoop@slave2:~/


[hadoop@master hadoop]$ scp -r /home/hadoop/hadoop-3.2.2 hadoop@slave1:~/

[hadoop@master hadoop]$ scp -r /home/hadoop/hadoop-3.2.2 hadoop@slave2:~/


10. Format the HDFS file system on the master node
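
Assuming the PATH from step 6 is in effect, the file system is formatted once on master as the hadoop user:

[hadoop@master ~]$ hdfs namenode -format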


11. The following message indicates that the file system was formatted successfully:

2021-03-21 01:59:10,350 INFO common.Storage: Storage directory /home/hadoop/tmp/dfs/name has been successfully formatted.

12. Start Hadoop

[hadoop@master hadoop]$ start-dfs.sh

Starting namenodes on [master]

master: Warning: Permanently added 'master,192.168.5.128' (ECDSA) to the list of known hosts.

master: ERROR: JAVA_HOME is not set and could not be found.

Starting datanodes

slave2: ERROR: JAVA_HOME is not set and could not be found.

slave1: ERROR: JAVA_HOME is not set and could not be found.

Starting secondary namenodes [master]

master: ERROR: JAVA_HOME is not set and could not be found.


If the errors above appear, it means JAVA_HOME is not configured in hadoop-env.sh.

Edit the following file:

[hadoop@master ~]$ vim hadoop-3.2.2/etc/hadoop/hadoop-env.sh


Add the following line:

 export JAVA_HOME=/home/hadoop/jre1.8.0_281

 

Then copy this file to slave1 and slave2:

[hadoop@master ~]$ scp hadoop-3.2.2/etc/hadoop/hadoop-env.sh hadoop@slave1:/home/hadoop/hadoop-3.2.2/etc/hadoop/hadoop-env.sh

hadoop-env.sh                                                              100%   16KB   4.6MB/s   00:00    

[hadoop@master ~]$ scp hadoop-3.2.2/etc/hadoop/hadoop-env.sh hadoop@slave2:/home/hadoop/hadoop-3.2.2/etc/hadoop/hadoop-env.sh

hadoop-env.sh 

100%   16KB   1.7MB/s   00:00


Start Hadoop again:

[hadoop@master ~]$ start-dfs.sh

Starting namenodes on [master]

Starting datanodes

slave1: WARNING: /home/hadoop/hadoop-3.2.2/logs does not exist. Creating.

slave2: WARNING: /home/hadoop/hadoop-3.2.2/logs does not exist. Creating.

Starting secondary namenodes [master]


The startup now succeeds. Use jps to check the running processes:

[hadoop@master ~]$ jps

10178 Jps

9840 NameNode

10060 SecondaryNameNode
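
On slave1 and slave2, running jps at this point should show a DataNode process.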

Start the YARN processes:

[hadoop@master ~]$ start-yarn.sh

Starting resourcemanager

Starting nodemanagers

[hadoop@master ~]$ jps

9840 NameNode

10610 Jps

10310 ResourceManager

10060 SecondaryNameNode


Start the MapReduce JobHistory Server; it runs on the designated server and is started with the mapred command:

[hadoop@master ~]$ mapred --daemon start historyserver

[hadoop@master ~]$ jps

9840 NameNode

10722 Jps

10310 ResourceManager

10666 JobHistoryServer

10060 SecondaryNameNode


To stop the processes above:

[hadoop@master ~]$ mapred --daemon stop historyserver

[hadoop@master ~]$ stop-yarn.sh

[hadoop@master ~]$ stop-dfs.sh


13. NameNode web interface after startup



14. ResourceManager web interface after startup


15. MapReduce JobHistory Server web interface after startup
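
With the Hadoop 3.x default ports, these web interfaces are reachable at http://master:9870 (NameNode), http://master:8088 (ResourceManager), and http://master:19888 (JobHistory Server), provided the machine running the browser can resolve the master host name.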

Text: 王世刚
Layout: 祝润丽, 刘雨轩
Technical guidance: 袁鸿琴