Building a Hadoop 3 Distributed Cluster on CentOS 7 on Huawei Cloud

From CloudWiki

Cluster Planning

Node name [hostname]	IP
master	170.158.10.101
slave1	170.158.10.102
slave2	170.158.10.103

Environment Preparation

Set the Hostname

Run the corresponding command on master, slave1, and slave2:

# hostnamectl set-hostname master
# hostnamectl set-hostname slave1
# hostnamectl set-hostname slave2
Run su (or log in again) to verify that the new hostname has taken effect.


[Figure: Hostname.png — shell prompts showing the new hostnames]

Edit the Hosts File

On master, run:

vim /etc/hosts

# Add the following entries
192.168.0.172 master
119.3.228.195 slave1
123.249.32.173 slave2

Apply a similar configuration on slave1 and slave2. On cloud servers, the rule for mapping IPs to hostnames in /etc/hosts is:

1. The entry for the local machine itself must use its internal (private) IP.

2. Entries for the other machines must use their external (public) IPs.

Concretely:

1. On the master server, set master's own entry to its internal IP, and set the slave servers' entries to their external IPs.

2. Likewise, on each slave server, set that slave's own entry to its internal IP, and set the other nodes' entries (including master) to their external IPs.

Reference: https://blog.csdn.net/xiaosa5211234554321/article/details/119627974
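
For illustration, /etc/hosts on slave1 would follow the same pattern. A sketch, assuming 119.3.224.141 is master's public IP (as it appears in the logs later in this article) and using 192.168.0.173 as a stand-in for slave1's private IP, which is not given here:

119.3.224.141 master
192.168.0.173 slave1
123.249.32.173 slave2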

Create a Non-root User

Run on master, slave1, and slave2:

# Create the user (as root)
useradd hadoop
passwd hadoop

Grant the new user sudo privileges, with no password required when running sudo:

chmod -v u+w /etc/sudoers

mode of ‘/etc/sudoers’ changed from 0440 (r--r-----) to 0640 (rw-r-----)

sudo vi /etc/sudoers

Below the line %wheel  ALL=(ALL)       ALL, add the following line:
hadoop    ALL=(ALL)       NOPASSWD: ALL

chmod -v u-w /etc/sudoers

mode of ‘/etc/sudoers’ changed from 0640 (rw-r-----) to 0440 (r--r-----)
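
To confirm that passwordless sudo works, switch to the hadoop user and run a command through sudo; it should print root without prompting for a password:

su - hadoop
sudo whoami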

Disable the Firewall

Run on master, slave1, and slave2:

systemctl stop firewalld
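
Note that systemctl stop only turns the firewall off until the next reboot; to keep it off permanently, disable the service as well:

systemctl disable firewalld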

Passwordless SSH Login

As the hadoop user, run the following on master, slave1, and slave2:

ssh-keygen -t rsa

After running the command, press Enter three times to accept the defaults:

Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:DVFWqHDH+Hb+ThEissWHNGNl0NbMpDPJvyTzrZO02/Y root@master
The key's randomart image is:
+---[RSA 2048]----+
|        .+O*+=.  |
|      . o*+*oo+  |
|       oo+=.O .  |
|        .*oo.= . |
|        S..oo +  |
|            .=.+ |
|             o+o.|
|             .=o.|
|             .++E|
+----[SHA256]-----+

On master, slave1, and slave2, as the hadoop user, run ssh-copy-id for each node. When prompted, type yes, then enter the target machine's login password.

ssh-copy-id master

ssh-copy-id slave1

ssh-copy-id slave2
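
Afterwards, each node should be able to log in to the others without a password prompt. For example, from master:

ssh slave1   # opens a shell on slave1 with no password prompt
exit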

Install the JDK and Hadoop

Remove the Bundled JDK

Run on master, slave1, and slave2:

# List the JDK packages bundled with the system
rpm -qa | grep jdk

# Remove the packages found
yum -y remove <packages found above>

# Or remove them all with one command
rpm -qa | grep -i java | xargs -n1 rpm -e --nodeps
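
To confirm the removal, run the query again; it should print nothing:

rpm -qa | grep -i java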
 

 

Install the JDK

Run on master, slave1, and slave2:

Download the JDK package from the official site: https://www.oracle.com/java/technologies/downloads/#java8-windows


Create a software installation directory

Under /home/hadoop, create a soft directory to hold the installed software:

mkdir soft

Extract the JDK into the soft directory

cd /home/hadoop/installfile/

tar -zxvf jdk-8u351-linux-x64.tar.gz -C /home/hadoop/soft


Configure environment variables:

Run on master, slave1, and slave2:

vi /etc/profile and append:

#JDK1.8
export JAVA_HOME=/home/hadoop/soft/jdk1.8.0_351
export PATH=$PATH:$JAVA_HOME/bin

source /etc/profile

If java -version prints output like the following, Java was installed successfully:

java version "1.8.0_351"
Java(TM) SE Runtime Environment (build 1.8.0_351-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.351-b10, mixed mode)

Install Hadoop

Run on master, slave1, and slave2:

cd /home/hadoop/installfile

wget http://archive.apache.org/dist/hadoop/core/hadoop-3.1.3/hadoop-3.1.3.tar.gz

tar -zxvf hadoop-3.1.3.tar.gz -C ~/soft

Configure environment variables:

vi /etc/profile and append:

#HADOOP_HOME
export HADOOP_HOME=/home/hadoop/soft/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

source /etc/profile

Verify the installation:

hadoop version

Hadoop 3.1.3
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r ba631c436b806728f8ec2f54ab1e289526c90579
Compiled by ztang on 2019-09-12T02:47Z
Compiled with protoc 2.5.0
From source with checksum ec785077c385118ac91aadde5ec9799
This command was run using /root/soft/hadoop-3.1.3/share/hadoop/common/hadoop-common-3.1.3.jar

Configure Hadoop

Perform the following on master, slave1, and slave2:

Configure core-site.xml

Enter the Hadoop configuration directory:

cd $HADOOP_HOME/etc/hadoop

vi core-site.xml and add the following between <configuration> and </configuration>:

    <!-- Address of the NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
    </property>
    <!-- Storage directory for Hadoop data -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/soft/hadoop-3.1.3/data</value>
    </property>

    <!-- Static user for HDFS web UI access: hadoop -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hadoop</value>
    </property>

    <!-- Hosts from which the hadoop (superuser) account may proxy -->
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <!-- Groups whose members the hadoop (superuser) account may impersonate -->
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
    <!-- Users the hadoop (superuser) account may impersonate -->
    <property>
        <name>hadoop.proxyuser.hadoop.users</name>
        <value>*</value>
    </property>

Configure hdfs-site.xml

vi hdfs-site.xml

    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>master:59998</value>
    </property>
    <!-- SecondaryNameNode web UI address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave2:9868</value>
    </property>

Configure yarn-site.xml

vi yarn-site.xml:

<!-- Site specific YARN configuration properties -->
    <!-- Use mapreduce_shuffle as the auxiliary service for MR -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <!-- Address of the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>slave1</value>
    </property>

    <!-- Environment variables inherited by containers -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>

    <!-- Minimum and maximum memory a YARN container may be allocated -->
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
    </property>

    <!-- Physical memory the NodeManager may manage -->
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>

    <!-- Disable YARN's physical- and virtual-memory limit checks -->
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
 
    <!-- Log aggregation server URL -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://master:19888/jobhistory/logs</value>
    </property>
 
    <!-- Retain aggregated logs for 7 days (604800 seconds) -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>

Configure mapred-site.xml

vi mapred-site.xml:

    <!-- Run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
 
    <!-- JobHistory web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>

Configure workers

vi workers

master
slave1
slave2

Note: entries in this file must not end with spaces, and the file must not contain blank lines; a quick way to check is shown below.
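
One way to check for stray trailing spaces or blank lines is cat -A, which marks every line ending with a $:

cat -A workers
master$
slave1$
slave2$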

Distribute the Hadoop Configuration Files to slave1 and slave2

Use the scp command, as sketched below.
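
A minimal sketch of this step, assuming the same directory layout on every node, run as the hadoop user on master:

scp /home/hadoop/soft/hadoop-3.1.3/etc/hadoop/* hadoop@slave1:/home/hadoop/soft/hadoop-3.1.3/etc/hadoop/
scp /home/hadoop/soft/hadoop-3.1.3/etc/hadoop/* hadoop@slave2:/home/hadoop/soft/hadoop-3.1.3/etc/hadoop/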

Format the Filesystem

On master, from any directory, run hdfs namenode -format to format the NameNode. This only needs to be done once, before first use; if you later change certain configuration files, you may need to format again.

Output like the following indicates success:

2022-11-27 09:46:02,851 INFO namenode.FSImage: Allocated new BlockPoolId: BP-2006367298-119.3.224.141-1669513562846
2022-11-27 09:46:02,863 INFO common.Storage: Storage directory /root/soft/hadoop-3.1.3/data/dfs/name has been successfully formatted.
2022-11-27 09:46:02,883 INFO namenode.FSImageFormatProtobuf: Saving image file /root/soft/hadoop-3.1.3/data/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2022-11-27 09:46:02,942 INFO namenode.FSImageFormatProtobuf: Image file /root/soft/hadoop-3.1.3/data/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 391 bytes saved in 0 seconds .
2022-11-27 09:46:02,952 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2022-11-27 09:46:02,955 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid = 0 when meet shutdown.
2022-11-27 09:46:02,955 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/119.3.224.141
************************************************************/

Start the Cluster

Start HDFS

On master, run the HDFS start command:

start-dfs.sh

WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [master]
Last login: Sun Nov 27 10:31:48 CST 2022 from 112.36.201.71 on pts/0
Starting datanodes
Last login: Sun Nov 27 10:32:06 CST 2022 on pts/0
Starting secondary namenodes [slave2]
Last login: Sun Nov 27 10:32:09 CST 2022 on pts/0

If you hit the error ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation, add the following to /home/hadoop/soft/hadoop-3.1.3/sbin/start-dfs.sh and stop-dfs.sh:

HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

If you hit the error master: ERROR: Cannot set priority of namenode process 5830, it may be caused by a port already being in use, or by a network-communication problem; check that /etc/hosts is configured as described earlier.

Inspect the log /home/hadoop/soft/hadoop-3.1.3/logs/hadoop-root-namenode-master.log:

2022-11-27 10:04:20,754 INFO org.apache.hadoop.http.HttpServer2: HttpServer.start() threw a non Bind IOException
java.net.BindException: Port in use: master:9870
        at org.apache.hadoop.http.HttpServer2.constructBindException(HttpServer2.java:1213)
        at org.apache.hadoop.http.HttpServer2.bindForSinglePort(HttpServer2.java:1235)
        at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:1294)
        at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1149)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:181)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:881)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:703)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:949)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:922)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1688)

Start the History Server

Per the earlier configuration, run on master:

mapred --daemon start historyserver
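
To confirm it is up, look for the JobHistoryServer process:

jps | grep JobHistoryServer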

Start YARN

Per the earlier configuration (ResourceManager on slave1), run the YARN start command on slave1:

start-yarn.sh

Starting resourcemanager
Last login: Sun Nov 27 10:43:05 CST 2022 from 112.36.201.71 on pts/0
Starting nodemanagers
Last login: Sun Nov 27 10:53:56 CST 2022 on pts/0

If you hit the error:

ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.

add the following at the top of /home/hadoop/soft/hadoop-3.1.3/sbin/start-yarn.sh and stop-yarn.sh:

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Verify the Results

Process verification

Run the jps command on each machine and check that:

  1. master runs NameNode, and SecondaryNameNode runs on slave2 (per hdfs-site.xml), indicating a successful start
  2. master, slave1, and slave2 each run DataNode (all three are listed in workers)
  3. slave1 runs ResourceManager and NodeManager

Run jps on master:

[root@master ~]# jps

14054 JobHistoryServer
14310 Jps
13575 NameNode
14169 NodeManager
794 WrapperSimpleApp
13755 DataNode

[root@slave1 ~]# jps

4017 DataNode
789 WrapperSimpleApp
4743 Jps
4394 NodeManager
4236 ResourceManager

[root@slave2 ~]# jps

5440 Jps
5111 DataNode
5303 NodeManager
5212 SecondaryNameNode
751 WrapperSimpleApp

Browser verification

http://119.3.224.141:59998/ (119.3.224.141 is the master node's public IP; before testing, open port 59998 in the Huawei Cloud security group)

[Figure: Python2022112801.png — NameNode web UI]
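
If the page does not load in a browser, a quick check from the master node itself (which bypasses the security group) is:

curl -s -o /dev/null -w "%{http_code}\n" http://master:59998/

An output of 200 means the NameNode web UI is serving requests.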
