
Comparing HDFS and YARN high availability

Preface

   One day you will be able to laugh about the things that once hurt you. After all, some of them were not what you wanted, yet they were what you brought upon yourself; what looks like helplessness is often just being too lazy to make a choice. There is only one road to success, while the roads to failure come in every variety.


     We obsess over what we cannot get, and stop cherishing it once we have it. Why is that? Did we forget why we set out in the first place, or did a new desire appear and start tormenting us instead?

High availability

   1 High availability architectures compared

    HDFS exists to solve the problem of storing massive amounts of data. It uses a distributed architecture: a large file is split into blocks, and the blocks are spread across different machines and different racks. Because data nodes can be scaled out horizontally, the cluster can keep absorbing and storing huge volumes of data.

    Since HDFS stores data, it has to guarantee reliability, which is where the datanode three-replica mechanism comes from. Writes go through a pipeline: in the normal case the client only gets a success response after all three replicas have been written, but in degraded situations a single successful write is enough, because HDFS has a built-in replication mechanism. Whenever the replica count drops below the configured value, it picks a nearby, lightly loaded node and copies the data there again.
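
    As a rough sketch of how to inspect and adjust the replica count from the command line (the path /data/demo.txt is only an example, not a file from this cluster):

# Default replication factor configured for the cluster
hdfs getconf -confKey dfs.replication
# Change the replication factor of an existing file and wait until the copies exist
hdfs dfs -setrep -w 3 /data/demo.txt
# Show where every block and each of its replicas actually lives
hdfs fsck /data/demo.txt -files -blocks -locations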

    HDFS is also meant to support analysis of massive data sets, such as MapReduce jobs. Storing multiple replicas of a file means that when three tasks work on the same data, they can run on three different machines and make full use of the cluster's compute capacity.

    Because HDFS is distributed storage, it needs something like a dictionary index: what data exists, how many blocks there are, what the permissions are, who the owner is. That is the namenode. Having a name server implies persistent storage, and what it persists is the fsimage plus the edit log. When a client uploads data, the operation is first recorded in the edits log before the call returns to the client, while the fsimage corresponds to the in-memory metadata, a baseline you can think of like a benchmark reference point.
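
    The baseline-plus-log idea can be poked at with two admin commands; a small sketch run against the active namenode (note that -saveNamespace only works while in safe mode):

# Close the current edits segment and start a new inprogress one
hdfs dfsadmin -rollEdits
# Write the in-memory namespace out as a fresh fsimage
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave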

    As shown in the figure above, there are quite a few components, including zkfc and the QJM cluster. Now compare this with the YARN cluster's high availability.

[Figure: YARN high-availability architecture]

    A quick comparison shows that the YARN high-availability architecture is far simpler than the HDFS one: there is no zkfc and no QJM cluster, only a zk cluster responsible for electing the active resourcemanager.


    Why is the difference so big? This is the difference between high availability for persisted data and high availability for a stateless service. For the HDFS namenode to be highly available, its data must stay synchronized, so it needs shared storage, the QJM, to hold the edits log so it can be replayed on the standby node. The resourcemanager, on the other hand, has no data that must be persisted; it is stateless, like a container that can simply be deleted and recreated without any problem. The whole difference comes down to whether some data has to be preserved, which is exactly the stateful versus stateless distinction.
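
    A minimal sketch of the same check on the YARN side, assuming the resourcemanager ids are configured as rm1 and rm2 (those ids do not appear anywhere in this cluster's output, so treat them as placeholders):

# Ask each resourcemanager for its HA state; the zk election decides which one is active
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
# Because the RM persists nothing essential, a dead one can simply be started again
yarn-daemon.sh start resourcemanager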


    Stateless is like being married with no kids; stateful is being married with a kid. Once there is a kid, someone has to watch it. A stateless couple can split up on a whim, but a stateful one needs someone staying home, the standby, while the other goes out to earn a living, the active. The worst case is both going out to earn money, both becoming active: the so-called split brain. When that happens, one of them usually just gets knocked down so it learns to stick to earning; the one knocked down becomes the standby and only watches the kid. If both stay home watching the kid, both are standby; that is actually not so bad, there is temporarily no income but at least the kid is looked after. For a stateless service, both being active is also tolerable: at most a bit more money comes in, i.e. a few extra tasks get executed, roughly like a job being rerun, which may produce duplicate data, but a well-designed job will not suffer from it. If both stop earning and become standby, though, everyone goes hungry, because right after the wedding there is already a pile of tasks waiting for resources.

    2 A closer look at HDFS

    Up close, a flower; from afar, a mess. With many things, looking too deep makes you lose the global picture: head down staring at the road, head down staring at the sky, and in the end you see nothing at all.

# Data stored on a datanode: the block files themselves plus their checksum (.meta) files
[root@KEL subdir11]# pwd
/$HADOOP_HOME/$DATADIR/dfs/data/current/BP-184102405-192.168.1.10-1612873956948/current/finalized/subdir0/subdir11
[root@KEL subdir11]# cat blk_1073744840
1,1613991123,admin,admin
[root@KEL subdir11]# cat blk_1073744840_4016.meta 
ʗE
[root@KEL subdir11]# file blk_1073744840_4016.meta 
blk_1073744840_4016.meta: raw G3 data, byte-padded

    One thing to note when looking at what a datanode stores: these block and checksum files are not stored on the namenode. The datanode reports them to the namenode over their communication channel, and at startup all datanodes report their blocks in one go. That window is the so-called safe mode, during which you can only read metadata, not modify or add to it.
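
    Safe mode can be queried and toggled from the command line; a quick sketch:

# Is the namenode currently in safe mode?
hdfs dfsadmin -safemode get
# Enter or leave safe mode manually; metadata is read-only while inside
hdfs dfsadmin -safemode enter
hdfs dfsadmin -safemode leave
# Block until safe mode is off, handy in startup scripts
hdfs dfsadmin -safemode wait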

    Now look at what the namenode stores:

# What the namenode stores: edits logs and fsimage files
edits_0000000000000053943-0000000000000053944  edits_inprogress_0000000000000053945
fsimage_0000000000000053730                    fsimage_0000000000000053730.md5
fsimage_0000000000000053930                    fsimage_0000000000000053930.md5
edits_0000000000000013106-0000000000000013107  seen_txid
edits_0000000000000013108-0000000000000013109  VERSION
edits_0000000000000013110-0000000000000013111
[root@KEL current]# pwd
/$HADOOP_HOME/$DATA_DIR/dfs/name/current
[root@KEL current]# file fsimage_0000000000000053730
fsimage_0000000000000053730: data
[root@KEL current]# file fsimage_0000000000000053730.md5 
fsimage_0000000000000053730.md5: ASCII text
[root@KEL current]# cat fsimage_0000000000000053730.md5 
2c8359248cbcc504dca7e3020f8bb309 *fsimage_0000000000000053730

    You can see a large number of edits files recording the operations that were performed. The plain edits files can be thought of as history, while the one currently being written to is the inprogress file.

# lsof shows the files currently held open by a process
[root@KEL current]# jps
2560 NameNode
[root@KEL current]# lsof -p 2560|grep -v jar
java    2560 root  233u   /edits_inprogress_0000000000000053951

    The contents of the fsimage and edits files can be inspected with the following commands:

# Convert the fsimage to XML
[root@KEL current]# hdfs oiv -p xml -i fsimage_0000000000000053730 -o fsimage.xml
[root@KEL current]# vim fsimage.xml
<?xml version="1.0"?>
<fsimage>
  <NameSection>
    <genstampV1>1000</genstampV1>
    <genstampV2>1002</genstampV2>
    <genstampV1Limit>0</genstampV1Limit>
    <lastAllocatedBlockId>1073741826</lastAllocatedBlockId>
    <txid>37</txid>
  </NameSection>
  <INodeSection>
    <lastInodeId>16400</lastInodeId>
    <inode>
      <id>16385</id>
      <type>DIRECTORY</type>
      <name></name>
      <mtime>1392772497282</mtime>
      <permission>theuser:supergroup:rwxr-xr-x</permission>
      <nsquota>9223372036854775807</nsquota>
      <dsquota>-1</dsquota>
    </inode>
    ...remaining output omitted...

    Viewing the contents of an edits file:

# Convert an edits log to XML for viewing; this particular segment happens to be empty
[root@KEL current]# hdfs oev -p xml -i edits_0000000000000013122-0000000000000013123 -o edits.xml
[root@KEL current]# vim edits.xml 
[root@KEL current]# cat edits.xml 
<?xml version="1.0" encoding="UTF-8"?>
<EDITS>
  <EDITS_VERSION>-63</EDITS_VERSION>
  <RECORD>
    <OPCODE>OP_START_LOG_SEGMENT</OPCODE>
    <DATA>
      <TXID>13122</TXID>
    </DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_END_LOG_SEGMENT</OPCODE>
    <DATA>
      <TXID>13123</TXID>
    </DATA>
  </RECORD>
</EDITS>

    Viewing what a journal node stores:

# Only edits files are stored here: the active namenode writes them, the standby reads them
[root@KEL current]# cat last-promised-epoch 
78
[root@KEL current]# cat last-writer-epoch 
78
[root@KEL current]# ls -l edits_*|wc -l
5652

    In the HA architecture, zkfc is essentially the zookeeper client attached to a namenode. When the namenode process dies, its zkfc is the first to know; it then runs the fence procedure and sets that namenode's state to standby. But if the zkfc process itself dies, you may have to wait a while, because each zkfc only holds the power of life and death over its own namenode.
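
    The health check and the switch that zkfc performs automatically can also be driven by hand; a sketch using the nn1/nn2 ids of this cluster:

# Ask the same health question that zkfc keeps asking
hdfs haadmin -checkHealth nn1
# Gracefully move the active role from nn1 to nn2
hdfs haadmin -failover nn1 nn2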

    When setting up the HA architecture you also have to run a format step, which simply creates a znode in zk. You can then look at what is stored in zk:

# Connect with the zk client to see which node is recorded as active
[root@KEL bin]# ./zkCli.sh -server localhost:3001
[zk: localhost:3001(CONNECTED) 4] get /hadoop-ha/ns/ActiveStandbyElectorLock
nsnn1KEL �F(�>
cZxid = 0x2f00000007
ctime = Sat Mar 06 01:43:02 CST 2021
mZxid = 0x2f00000007
mtime = Sat Mar 06 01:43:02 CST 2021
pZxid = 0x2f00000007
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x2000018c99c0000
dataLength = 20
numChildren = 0
[zk: localhost:3001(CONNECTED) 5] get /hadoop-ha/ns/ActiveBreadCrumb
nsnn1KEL �F(�>
cZxid = 0x300000008
ctime = Tue Feb 09 20:33:49 CST 2021
mZxid = 0x2f00000008
mtime = Sat Mar 06 01:43:03 CST 2021
pZxid = 0x300000008
cversion = 0
dataVersion = 148
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 20
numChildren = 0
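
    The format step mentioned above is the zkfc format command, which creates the /hadoop-ha/ns znodes seen here; a small sketch:

# Create the parent znode in zookeeper, run once before the zkfc daemons are started
hdfs zkfc -formatZK
# Then start a zkfc next to each namenode
hadoop-daemon.sh start zkfc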

    3 Namenode web UI information

    The namenode web UI displays a lot of information worth watching. For operations people this is the page to keep an eye on, since they care most about the underlying infrastructure.

[Figure: namenode web UI overview]

    The summary section includes: whether the cluster is in safe mode, the security setting, the number of files and directories, and the number of blocks. From these you can roughly judge file sizes, because storing huge numbers of small files in HDFS eats a lot of namenode memory, and looking up the corresponding block information during processing hurts performance as well. HDFS is written in Java, so the heap usage is shown too. Below that is an overview: the configured capacity, the capacity used by the distributed file system, the non-DFS used capacity, the block pool usage, and the disk usage percentage per data volume. Since this is a distributed storage system, every kind of capacity metric matters; the non DFS used figure on this page is actually quite handy, because sometimes other data is taking up space and squeezing out DFS.
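
    The file, directory and block counts shown on the page can also be pulled from the command line; a quick sketch:

# Directory count, file count and total size under a path
hdfs dfs -count /
# fsck ends with a summary (total files, total blocks, average block size),
# which is a fast way to spot a small-files problem
hdfs fsck / | tail -n 20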

[Figure: namenode journal (QJM) and storage information]

    This shows information about the QJM: which machines the journal nodes run on, which edits file is currently in effect, and the related transaction id (the standby shows different information here, only the QJM itself). Below that is the namenode's own storage, with storage types image and edits.

[Figure: datanode information page]

    The datanode page shows information about each node: capacity used, whether any disks have failed, and so on. Decommissioning marks nodes being retired, for example when a machine has to be taken offline for repair or replacement; you mainly run into it when scaling the cluster out or in.

# The hdfs admin command; check the help for the available subcommands
[root@KEL bin]# hdfs dfsadmin -h
h: Unknown command
Usage: hdfs dfsadmin
Note: Administrative commands can only be run as the HDFS superuser
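
    The decommissioning mentioned above is normally driven through the exclude file plus a refresh; a minimal sketch, where KEL2 and the exclude-file path are placeholders and the real path is whatever dfs.hosts.exclude points to in hdfs-site.xml:

# Add the datanode's hostname to the exclude file
echo "KEL2" >> /path/to/dfs.exclude
# Make the namenode re-read the include/exclude lists; the node then shows as Decommissioning
hdfs dfsadmin -refreshNodes
# Watch progress until the status becomes Decommissioned
hdfs dfsadmin -report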

[Figure: datanode volume failure information]

    The following mainly reports disk failures:

# Datanode log
2021-03-06 01:23:34,954 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to KEL1/192.168.1.99:9000
2021-03-06 01:23:34,955 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to KEL/192.168.1.10:9000. Exiting.
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 1, volumes configured: 2, volumes failed: 1, volume failures tolerated: 0
 at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:285)
 at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
 at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1371)
 at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1323)
 at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
 at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:802)
 at java.lang.Thread.run(Thread.java:748)
2021-03-06 01:23:34,955 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid unassigned) service to KEL/192.168.1.10:9000
2021-03-06 01:23:35,060 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid unassigned)
2021-03-06 01:23:37,061 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2021-03-06 01:23:37,062 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2021-03-06 01:23:37,063 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:

    After a volume failure, the datanode stops providing service.
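
    The FATAL message above says volume failures tolerated: 0, which comes from dfs.datanode.failed.volumes.tolerated; checking the effective value is a one-liner:

# How many failed volumes a datanode tolerates before shutting itself down (0 by default)
hdfs getconf -confKey dfs.datanode.failed.volumes.tolerated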

[Figure: snapshot information]

    Snapshot information. A snapshot is a read-only copy of the file system at a particular point in time, used for backups and for recovering from user errors or disasters. Enabling the trash is also a very useful feature; it is usually turned on when a distributed file system goes into real use, so that an accidental delete can still be undone. Before taking a snapshot, you first have to allow snapshots on the directory.

[root@KEL hadoop-2.7.2]# hdfs dfs -createSnapshot / testSnapshot
createSnapshot: Failed to add snapshot: there are already 0 snapshot(s) and the snapshot quota is 0
[root@KEL hadoop-2.7.2]# hdfs dfsadmin -allowSnapshot /
Allowing snaphot on / succeeded
[root@KEL hadoop-2.7.2]# hdfs dfs -createSnapshot / testSnapshot
Created snapshot /.snapshot/testSnapshot
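
    A few more snapshot operations as a sketch; the file name demo.txt is made up:

# List every directory on which snapshots are allowed
hdfs lsSnapshottableDir
# Restore a file by copying it back out of the read-only snapshot
hdfs dfs -cp /.snapshot/testSnapshot/demo.txt /demo.txt
# Delete the snapshot once it is no longer needed
hdfs dfs -deleteSnapshot / testSnapshot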

    

    The namenode startup log shows several phases: in the first phase the fsimage is loaded into memory, which took 21 seconds; in the second the edits log is loaded, which took 2 seconds.

    The third phase saves a checkpoint, which did not happen here. The fourth phase is safe mode, which mainly waits for all the datanodes to report their block information. This can clearly take a while: with many files and blocks, the startup time may be quite long.

    Most of the information on the page is the same as the dfsadmin report:

[root@KEL bin]# hdfs dfsadmin -report
Configured Capacity: 66672975872 (62.09 GB)
Present Capacity: 51799486464 (48.24 GB)
DFS Remaining: 50167644160 (46.72 GB)
DFS Used: 1631842304 (1.52 GB)
DFS Used%: 3.15%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (3):

Name: 192.168.1.10:50010 (KEL)
Hostname: KEL
Decommission Status : Normal
Configured Capacity: 29180092416 (27.18 GB)
DFS Used: 543965184 (518.77 MB)

    When the namenode fails to start with the following error, remember to check whether the zk service is running.

2021-03-06 01:26:29,525 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log roll on remote NameNode KEL1/192.168.1.99:9000
2021-03-06 01:26:29,531 WARN org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unable to trigger a roll of the active NN
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category JOURNAL is not supported in state standby
 at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
 at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1774)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:5824)
 at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1121)
 at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
 at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
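
    Before digging into the stack trace, it is worth confirming the zk ensemble and the namenode states first; a quick sketch (zkServer.sh is ZooKeeper's own control script, run on each zk node):

# Is each zookeeper node up, and does the ensemble have a leader?
zkServer.sh status
# What state does each namenode think it is in?
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2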

    4 Namenode failover

    In the HA architecture, zkfc is responsible for the switch, i.e. killing its own namenode or changing its state to standby. Besides a failure of the namenode process itself, a failure of the zkfc process also triggers a namenode switch:

[root@KEL1 logs]# hdfs haadmin -getServiceState nn2
active
[root@KEL1 logs]# hdfs haadmin -getServiceState nn1
standby
[root@KEL1 logs]# jps
2503 NameNode
3626 DFSZKFailoverController
# Simulate the zkfc process dying
[root@KEL1 logs]# kill -9 3626
[root@KEL1 logs]# hdfs haadmin -getServiceState nn2
standby
[root@KEL1 logs]# hdfs haadmin -getServiceState nn1
active
[root@KEL1 logs]# jps
2503 NameNode
# The zkfc on the formerly standby node tells the old active namenode to switch state
[root@KEL logs]# tail -f hadoop-root-zkfc-KEL.log 
2021-03-06 06:25:32,521 INFO org.apache.hadoop.ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
2021-03-06 06:25:32,534 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old node exists: 0a026e7312036e6e321a044b454c3120a84628d33e
2021-03-06 06:25:32,536 INFO org.apache.hadoop.ha.ZKFailoverController: Should fence: NameNode at KEL1/192.168.1.99:9000
2021-03-06 06:25:32,811 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at KEL1/192.168.1.99:9000 to standby state without fencing
2021-03-06 06:25:32,811 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing znode /hadoop-ha/ns/ActiveBreadCrumb to indicate that the local node is the most recent active...
2021-03-06 06:25:32,835 INFO org.apache.hadoop.ha.ZKFailoverController: Trying to make NameNode at KEL/192.168.1.10:9000 active...
2021-03-06 06:25:33,529 INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at KEL/192.168.1.10:9000 to active state

    When the journal nodes become unavailable, the standby namenode shuts itself down directly (its connection to the QJM times out):

2021-03-06 06:38:14,426 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [192.168.1.10:8485, 192.168.1.99:8485, 192.168.1.199:8485], stream=QuorumOutputStream starting at txid 54187))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
 at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
 at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
 at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
 at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
 at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
 at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
 at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
 at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:647)
 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1266)
 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1203)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1294)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:5832)
 at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1121)
 at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
 at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
2021-03-06 06:38:14,427 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 54187
2021-03-06 06:38:14,433 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2021-03-06 06:38:14,435 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: KEL1/192.168.1.99:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=10000 MILLISECONDS)
2021-03-06 06:38:14,436 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: KEL/192.168.1.10:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=10000 MILLISECONDS)
2021-03-06 06:38:14,455 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at KEL/192.168.1.10
************************************************************/

    The active node holds on for a while longer, then shuts down its namenode as well:

2021-03-06 06:40:32,914 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 118229 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [192.168.1.199:8485]
2021-03-06 06:40:33,918 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 119233 ms (timeout=120000 ms) for a response for getJournalState(). Succeeded so far: [192.168.1.199:8485]
2021-03-06 06:40:34,687 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [192.168.1.10:8485, 192.168.1.99:8485, 192.168.1.199:8485], stream=null))
java.io.IOException: Timed out waiting 120000ms for a quorum of nodes to respond.
 at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
 at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createNewUniqueEpoch(QuorumJournalManager.java:182)
 at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.recoverUnfinalizedSegments(QuorumJournalManager.java:436)
 at org.apache.hadoop.hdfs.server.namenode.JournalSet$8.apply(JournalSet.java:624)
 at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
 at org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:621)
 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1439)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1112)
 at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1710)
 at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
 at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
 at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1583)
 at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1478)
 at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
 at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
2021-03-06 06:40:34,691 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2021-03-06 06:40:34,696 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: KEL1/192.168.1.99:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=10000 MILLISECONDS)
2021-03-06 06:40:34,700 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

    zkfc then reports that it has lost contact with the namenode:

2021-03-06 06:47:25,839 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: KEL/192.168.1.10:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=10000 MILLISECONDS)
2021-03-06 06:47:25,840 WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at KEL/192.168.1.10:9000: java.net.ConnectException: Connection refused Call From KEL/192.168.1.10 to KEL:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

    Error log when zk has not been started:

# Log from one of the namenodes
2021-03-06 07:01:36,729 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9000, call org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol.rollEditLog from 192.168.1.99:48213 Call#21 Retry#0: org.apache.hadoop.ipc.StandbyException: Operation category JOURNAL is not supported in state standby
2021-03-06 07:02:36,558 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log roll on remote NameNode KEL1/192.168.1.99:9000
2021-03-06 07:02:36,567 WARN org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unable to trigger a roll of the active NN
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category JOURNAL is not supported in state standby
 at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
 at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1774)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:5824)
 at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1121)
 at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
 at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
 at org.apache.hadoop.ipc.Client.call(Client.java:1475)
 at org.apache.hadoop.ipc.Client.call(Client.java:1412)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
 at com.sun.proxy.$Proxy15.rollEditLog(Unknown Source)
 at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:148)
 at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:273)
 at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
 at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:315)
 at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
 at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
 at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
 at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
2021-03-06 07:02:36,806 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9000, call org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol.rollEditLog from 192.168.1.99:48217 Call#25 Retry#0: org.apache.hadoop.ipc.StandbyException: Operation category JOURNAL is not supported in state standby
2021-03-06 06:56:27,083 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19055 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2021-03-06 06:56:28,045 WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input streams from QJM to [192.168.1.10:8485, 192.168.1.99:8485, 192.168.1.199:8485]. Skipping.
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
 at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
 at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:471)
 at org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:278)
 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1508)
 at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1532)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:652)
 at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:644)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:811)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:795)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1488)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
2021-03-06 07:00:02,301 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Get corrupt file blocks returned error: Operation category READ is not supported in state standby
2021-03-06 07:00:36,663 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol.rollEditLog from 192.168.1.10:37953 Call#17 Retry#0: org.apache.hadoop.ipc.StandbyException: Operation category JOURNAL is not supported in state standby
2021-03-06 07:00:36,898 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log roll on remote NameNode KEL/192.168.1.10:9000

    If zk is then started again, both namenodes end up in the standby state; checking the logs:

2021-03-06 07:14:15,925 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Get corrupt file blocks returned error: Operation category READ is not supported in state standby
2021-03-06 07:14:17,290 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Get corrupt file blocks returned error: Operation category READ is not supported in state standby
2021-03-06 07:14:18,318 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Get corrupt file blocks returned error: Operation category READ is not supported in state standby
2021-03-06 07:14:19,361 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Get corrupt file blocks returned error: Operation category READ is not supported in state standby
2021-03-06 07:14:20,792 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Get corrupt file blocks returned error: Operation category READ is not supported in state standby
2021-03-06 07:14:21,855 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Get corrupt file blocks returned error: Operation category READ is not supported in state standby
2021-03-06 07:14:37,845 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log roll on remote NameNode KEL/192.168.1.10:9000
2021-03-06 07:14:47,854 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: KEL/192.168.1.10:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=10000 MILLISECONDS)
2021-03-06 07:14:52,935 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for standby state
2021-03-06 07:14:52,936 WARN org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unable to trigger a roll of the active NN
java.io.IOException: Failed on local exception: java.io.InterruptedIOException: Interrupted: action=RetryAction(action=RETRY, delayMillis=10000, reason=null), retry policy=RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=10000 MILLISECONDS); Host Details : local host is: "KEL1/192.168.1.99"; destination host is: "KEL":9000; 
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
 at org.apache.hadoop.ipc.Client.call(Client.java:1479)
 at org.apache.hadoop.ipc.Client.call(Client.java:1412)
 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
 at com.sun.proxy.$Proxy15.rollEditLog(Unknown Source)
 at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:148)
 at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:273)
 at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
 at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:315)
 at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
 at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
 at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
 at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)
Caused by: java.io.InterruptedIOException: Interrupted: action=RetryAction(action=RETRY, delayMillis=10000, reason=null), retry policy=RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=10000 MILLISECONDS)
 at org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:868)
 at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:633)
 at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
 at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
 at org.apache.hadoop.ipc.Client.call(Client.java:1451)
 ... 11 more
Caused by: java.lang.InterruptedException: sleep interrupted
 at java.lang.Thread.sleep(Native Method)
 at org.apache.hadoop.ipc.Client$Connection.handleConnectionFailure(Client.java:866)
 ... 16 more

    It turns out the zkfc processes are not running. After starting zkfc, the standby becomes active: without zkfc, the switch cannot happen automatically.
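
    When zkfc is not running, the only way out is a manual transition; a sketch, to be used with care, because --forcemanual bypasses the failover controller and you must be certain the other namenode really is down, otherwise you risk exactly the split brain described earlier:

# Force one namenode to active by hand; without zkfc there is no automatic fencing
hdfs haadmin -transitionToActive --forcemanual nn1
# Once the old active is confirmed dead or demoted, keep the other namenode in standby
hdfs haadmin -transitionToStandby --forcemanual nn2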