HDFS 断电恢复文件块案例
木木人大数据
主要分享大数据框架,如flink,spark,kafka,hbase,hive原理源码,也会分享数据仓库,图计算等。
Official Account
1.现象:
断电 导致HDFS服务不正常或者显示块损坏
2.检查HDFS系统文件健康
hdfs fsck /
3.检查hdfs fsck -list-corruptfileblocks
显示损坏的文件对应的块
Connecting to namenode via http://hadoop36:50070/fsck?ugi=hdfs&listcorruptfileblocks=1&path=%2F
The list of corrupt files under path '/' are:
blk_1075229920 /hbase/data/JYDW/WMS_PO_ITEMS/c71f5f49535e0728ca72fd1ad0166597/0/f4d3d97bb3f64820b24cd9b4a1af5cdd
blk_1075229921 /hbase/data/JYDW/WMS_PO_ITEMS/c96cb6bfef12795181c966a8fc4ef91d/0/cf44ae0411824708bf6a894554e19780
The filesystem under path '/' has 2 CORRUPT files
4.分析:
MySQL--》大数据平台
只需要从MySQL这个表的数据重新刷新一份到HDFS平台
5.想要知道文件的哪些块分布在哪些机器上面
手工删除linux文件/dfs/dn/…..
-files 文件分块信息,
-blocks 在带-files参数后才显示block信息
-locations 在带-blocks参数后才显示block块所在datanode的具体IP位置,
-racks 在带-files参数后显示机架位置
无法显示,无法删除块文件:
hdfs fsck /hbase/data/JYDW/WMS_PO_ITEMS/c71f5f49535e0728ca72fd1ad0166597/0/f4d3d97bb3f64820b24cd9b4a1af5cdd -files -locations -blocks -racks
Connecting to namenode via http://hadoop36:50070/fsck?ugi=hdfs&locations=1&blocks=1&files=1&path=%2Fhbase%2Fdata%2FJYDW%2FWMS_PO_ITEMS%2Fc71f5f49535e0728ca72fd1ad0166597%2F0%2Ff4d3d97bb3f64820b24cd9b4a1af5cdd
FSCK started by hdfs (auth:SIMPLE) from /192.168.1.100 for path /hbase/data/JYDW/WMS_PO_ITEMS/c71f5f49535e0728ca72fd1ad0166597/0/f4d3d97bb3f64820b24cd9b4a1af5cdd at Sat Jan 20 15:46:55 CST 2018
/hbase/data/JYDW/WMS_PO_ITEMS/c71f5f49535e0728ca72fd1ad0166597/0/f4d3d97bb3f64820b24cd9b4a1af5cdd 2934 bytes, 1 block(s):
/hbase/data/JYDW/WMS_PO_ITEMS/c71f5f49535e0728ca72fd1ad0166597/0/f4d3d97bb3f64820b24cd9b4a1af5cdd: CORRUPT blockpool BP-1437036909-192.168.1.100-1509097205664 block blk_1075229920
MISSING 1 blocks of total size 2934 B
0. BP-1437036909-192.168.1.100-1509097205664:blk_1075229920_1492007 len=2934 MISSING!
Status: CORRUPT
Total size: 2934 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 2934 B)
********************************
UNDER MIN REPL'D BLOCKS: 1 (100.0 %)
dfs.namenode.replication.min: 1
CORRUPT FILES: 1
MISSING BLOCKS: 1
MISSING SIZE: 2934 B
CORRUPT BLOCKS: 1
********************************
Minimally replicated blocks: 0 (0.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 0.0
Corrupt blocks: 1
Missing replicas: 0
Number of data-nodes: 12
Number of racks: 1
FSCK ended at Sat Jan 20 15:46:55 CST 2018 in 0 milliseconds
The filesystem under path '/hbase/data/JYDW/WMS_PO_ITEMS/c71f5f49535e0728ca72fd1ad0166597/0/f4d3d97bb3f64820b24cd9b4a1af5cdd' is CORRUPT
好的文件是显示块分布情况的:
hadoop36:hdfs:/var/lib/hadoop-hdfs:>hdfs fsck /hbase/data/JYDW/WMS_TO/011dea9ae46dae6c1f1f3a24a75af100/0/1d60f56773984e4cac614a8b5f7e93a6 -files -locations -blocks -racks
Connecting to namenode via http://hadoop36:50070/fsck?ugi=hdfs&files=1&locations=1&blocks=1&racks=1&path=%2Fhbase%2Fdata%2FJYDW%2FWMS_TO%2F011dea9ae46dae6c1f1f3a24a75af100%2F0%2F1d60f56773984e4cac614a8b5f7e93a6
FSCK started by hdfs (auth:SIMPLE) from /192.168.1.100 for path /hbase/data/JYDW/WMS_TO/011dea9ae46dae6c1f1f3a24a75af100/0/1d60f56773984e4cac614a8b5f7e93a6 at Sat Jan 20 15:58:25 CST 2018
/hbase/data/JYDW/WMS_TO/011dea9ae46dae6c1f1f3a24a75af100/0/1d60f56773984e4cac614a8b5f7e93a6 1697 bytes, 1 block(s): OK
0. BP-1437036909-192.168.1.100-1509097205664:blk_1075227504_1489591 len=1697 Live_repl=3 [/default/192.168.1.150:50010, /default/192.168.1.153:50010, /default/192.168.1.145:50010]
blk_1075227504_1489591 len=1697 Live_repl=3
[/default/192.168.1.150:50010, /default/192.168.1.153:50010, /default/192.168.1.145:50010]
6.最终选择一了百了,删除损坏的块文件,然后业务系统数据重刷
hadoop36:hdfs:/var/lib/hadoop-hdfs:>hdfs fsck / -delete
7.假设数据仅有HDFS上
hdfs dfs -ls /xxxx
hdfs dfs -get /xxxx ./
hdfs dfs -rm /xxx
hdfs dfs -put xxx /
log文件丢一丢丢 没有关系
文件是业务数据 订单数据 丢了,需要报告
总结:
hdfs fsck / -delete 直接删除损坏的文件,由于是hbase 无需删除这个表的所有文件,只需重刷所有数据,put 有的就update 没有的就insert
另一种解决方法:hdfs debug
hdfs debug recoverLease -path /blockrecover/ruozedata.md -retries 10-files -locations -blocks -racks 好文件显示 坏文件不显示