Presto with Huawei HDFS 3.X: Fixing "Unrecognized Hadoop major version number"
I. Reproducing the Issue
Log in to the CLI:
./presto-cli --server 192.168.6.1:10086 --catalog hive --schema default --debug
Create a sample table:
CREATE TABLE bigdata (
id varchar,
age int,
school varchar
)
WITH (format = 'ORC');
Insert sample data:
INSERT INTO bigdata VALUES ('100014',32,'dh'),('100015',30,'cy'),('100016',35,'hy');
Query the table. It returns no rows, even though three rows were just inserted:
presto:default> select * from bigdata;
id | age | school
----+-----+--------
(0 rows)
Query 20210203_081628_00013_cmcnm, FINISHED, 2 nodes
http://192.168.63.10:10086/ui/query.html?20210203_081628_00013_cmcnm
Splits: 17 total, 17 done (100.00%)
CPU Time: 0.0s total, 0 rows/s, 0B/s, 10% active
Per Node: 0.0 parallelism, 0 rows/s, 0B/s
Parallelism: 0.0
Peak Memory: 0B
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
The log file on the Presto coordinator node, server.log, contains the following exception:
java.lang.ExceptionInInitializerError
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2547)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2512)
at com.facebook.presto.hive.HiveUtil.getInputFormatClass(HiveUtil.java:314)
at com.facebook.presto.hive.HiveUtil.getInputFormat(HiveUtil.java:291)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadPartition(BackgroundHiveSplitLoader.java:372)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:331)
at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:101)
at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:236)
at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:47)
at com.facebook.presto.hive.util.ResumableTasks.access$000(ResumableTasks.java:20)
at com.facebook.presto.hive.util.ResumableTasks$1.run(ResumableTasks.java:35)
at com.facebook.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.1
at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.<clinit>(OrcInputFormat.java:116)
... 17 more
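Why does an IllegalArgumentException surface as ExceptionInInitializerError? It is thrown while OrcInputFormat's static initializer (<clinit>) runs, and the JVM wraps any exception that escapes a static initializer in ExceptionInInitializerError. The minimal sketch below reproduces that shape with hypothetical stand-ins (FakeOrcInputFormat and majorVersion are not the real Hive classes). It also shows that once initialization has failed, later attempts to use the same class throw NoClassDefFoundError instead, so later queries may log a different-looking error for the same root cause.

```java
// Sketch: a failing static initializer surfaces as ExceptionInInitializerError,
// the same shape as OrcInputFormat.<clinit> failing in the trace above.
public class ClinitFailureDemo {

    // Hypothetical stand-in for ShimLoader.getMajorVersion(): rejects major version 3.
    static String majorVersion(String vers) {
        int major = Integer.parseInt(vers.split("\\.")[0]);
        if (major == 1 || major == 2) {
            return "ok";
        }
        throw new IllegalArgumentException("Unrecognized Hadoop major version number: " + vers);
    }

    // Hypothetical stand-in for OrcInputFormat: its static field is computed in <clinit>.
    static class FakeOrcInputFormat {
        static final String SHIMS = majorVersion("3.1.1");
    }

    // Loads and initializes the class by name (as Configuration.getClassByName does via
    // Class.forName) and describes the Throwable raised, or returns "none".
    static String tryLoad() {
        try {
            Class.forName(FakeOrcInputFormat.class.getName(), true,
                    ClinitFailureDemo.class.getClassLoader());
            return "none";
        } catch (Throwable t) {
            Throwable cause = t.getCause();
            return t.getClass().getSimpleName()
                    + (cause == null ? "" : " caused by " + cause.getClass().getSimpleName());
        }
    }

    public static void main(String[] args) {
        // First attempt: ExceptionInInitializerError caused by IllegalArgumentException
        System.out.println(tryLoad());
        // Second attempt: starts with NoClassDefFoundError (class is now in an erroneous state)
        System.out.println(tryLoad());
    }
}
```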
II. Troubleshooting
1. Find out which jar the class comes from
[rhino@192-168-63-12 presto-server-0.230]$ grep -R "org.apache.hadoop.hive.shims.ShimLoader" *
Binary file plugin/hive-hadoop2/hive-apache-1.2.2-2.jar matches
Binary file plugin/raptor/hive-apache-1.2.2-2.jar matches
# Add -verbose:class to Presto's JVM options (etc/jvm.config) and restart, to confirm which jar the class is actually loaded from.
[Loaded org.apache.hadoop.hive.shims.ShimLoader from file:/tmp/presto_runtime/data/plugin/hive-hadoop2/hive-apache-1.2.2-2.jar]
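As an alternative to grepping binary files, the JVM can report where a class was loaded from through its CodeSource. The sketch below is a hypothetical helper (WhereIsClass) built only on standard JDK calls; it loads the class without initializing it, so a broken static initializer is not triggered. Presto loads each plugin through its own class loader, so a standalone run with the plugin jars on the classpath only approximates what the server sees; the -verbose:class output above remains the authoritative answer.

```java
// Sketch: ask the JVM which jar a class was loaded from.
public class WhereIsClass {

    // Returns the code source location of the named class, or a note if the class
    // is not found or comes from the bootstrap loader (which has no CodeSource).
    static String whereIs(String className) {
        try {
            // initialize = false: do not run the class's static initializer
            Class<?> c = Class.forName(className, false, WhereIsClass.class.getClassLoader());
            java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "bootstrap (no code source)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.hive.shims.ShimLoader";
        System.out.println(name + " -> " + whereIs(name));
    }
}
```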
2. Decompile the ShimLoader class
public static String getMajorVersion() {
// Get the Hadoop version; in the reproduction environment this is 3.1.1.
final String vers = VersionInfo.getVersion();
final String[] parts = vers.split("\\.");
if (parts.length < 2) {
throw new RuntimeException("Illegal Hadoop Version: " + vers + " (expected A.B.* format)");
}
// As this switch shows, any Hadoop 3.x version falls into the default branch and throws IllegalArgumentException.
switch (Integer.parseInt(parts[0])) {
case 1: {
return ShimLoader.HADOOP20SVERSIONNAME;
}
case 2: {
return ShimLoader.HADOOP23VERSIONNAME;
}
default: {
throw new IllegalArgumentException("Unrecognized Hadoop major version number: " + vers);
}
}
}
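According to the article in link 1, the Hive shims module exists to adapt Hive to different Hadoop versions, and the adapter interface is HadoopShims. Hive 1.2.2 provides two implementations:
- Hadoop20SShims, for Hadoop 1.x
- Hadoop23Shims, for Hadoop 2.x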
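The idea behind shims is a version key mapped to one implementation of a common interface, chosen at runtime. The sketch below is a simplified illustration, not Hive's code: HadoopShims here has a single made-up method, the version keys "0.20S" and "0.23" are assumed to mirror Hive's ShimLoader constants, and Hive itself instantiates the real classes by reflection from class names rather than through constructor references.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch of the shim pattern: one interface, one implementation per Hadoop line.
public class ShimPatternSketch {

    // Hypothetical, heavily reduced stand-in for Hive's HadoopShims interface.
    interface HadoopShims {
        String describe();
    }

    // Stand-ins for the two Hive 1.2.2 implementations.
    static class Hadoop20SShims implements HadoopShims {
        public String describe() { return "Hadoop 1.x shims"; }
    }

    static class Hadoop23Shims implements HadoopShims {
        public String describe() { return "Hadoop 2.x shims"; }
    }

    // Version key -> implementation factory (keys assumed to mirror Hive's constants).
    static final Map<String, Supplier<HadoopShims>> SHIMS = new HashMap<>();
    static {
        SHIMS.put("0.20S", Hadoop20SShims::new);
        SHIMS.put("0.23", Hadoop23Shims::new);
    }

    // Maps a Hadoop version string to a key; like Hive 1.2.2, rejects any other major version.
    static String majorVersionKey(String vers) {
        switch (Integer.parseInt(vers.split("\\.")[0])) {
            case 1:
                return "0.20S";
            case 2:
                return "0.23";
            default:
                throw new IllegalArgumentException("Unrecognized Hadoop major version number: " + vers);
        }
    }

    // Picks and instantiates the implementation for the given Hadoop version.
    static HadoopShims load(String hadoopVersion) {
        return SHIMS.get(majorVersionKey(hadoopVersion)).get();
    }

    public static void main(String[] args) {
        System.out.println(load("2.7.3").describe()); // Hadoop 2.x shims
        System.out.println(load("1.2.1").describe()); // Hadoop 1.x shims
    }
}
```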
III. Solutions
Option 1: Upgrade the hive-apache jar
In presto-0.230/pom.xml, the hive-apache dependency is:
<dependency>
<groupId>com.facebook.presto.hive</groupId>
<artifactId>hive-apache</artifactId>
<version>1.2.2-2</version>
</dependency>
Change it to:
<dependency>
<groupId>com.facebook.presto.hive</groupId>
<artifactId>hive-apache</artifactId>
<version>3.0.0-2</version>
</dependency>
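hive-apache is also used by presto-orc, presto-rcfile, presto-hive-metastore, presto-parquet and other modules. After bumping the version, many Java classes fail to compile (highlighted in red in the IDE). Fixing them one by one would be a huge amount of work, so this option was set aside for now.
Option 2: Upgrade hive.version
org.apache.hadoop.hive.shims.ShimLoader is a class from the Hive project. Could we bump the Hive version that hive-apache is built against to Hive 3.x, so that it supports Hadoop 3.x? The basis for this idea is the Hive website, which states: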
26 August 2019: release 3.1.2 available
This release works with Hadoop 3.x.y. You can look at the complete JIRA change log for this release.
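The build still failed: hive-apache 1.2.2-2 uses some classes that do not exist in Hive 3.1.1. So this option does not work either.
Option 3: Add a Hadoop 3.x case to the Hive 1.2.2 source
The Hive 3.1.1 ShimLoader source shows that Hive still uses Hadoop23Shims to support Hadoop 3.x: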
public static String getMajorVersion() {
String vers = VersionInfo.getVersion();
String[] parts = vers.split("\\.");
if (parts.length < 2) {
throw new RuntimeException("Illegal Hadoop Version: " + vers +
" (expected A.B.* format)");
}
switch (Integer.parseInt(parts[0])) {
case 2:
case 3:
return HADOOP23VERSIONNAME;
default:
throw new IllegalArgumentException("Unrecognized Hadoop major version number: " + vers);
}
}
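To compare the two switches without a Hadoop runtime, the sketch below takes the version string as a parameter instead of calling VersionInfo.getVersion(). hive122 mirrors the decompiled 1.2.2 switch; patched keeps the 1.2.2 branches and adds a case 3 that falls through to the 2.x shims, as Hive 3.1.1 does. The class name MajorVersionCompare is hypothetical, and the constant values "0.20S" and "0.23" are assumed to match Hive's ShimLoader.

```java
// Sketch: Hive 1.2.2's version switch versus a patched switch with a Hadoop 3.x case.
public class MajorVersionCompare {

    // Key names as used by Hive's ShimLoader constants (assumed values).
    static final String HADOOP20SVERSIONNAME = "0.20S";
    static final String HADOOP23VERSIONNAME = "0.23";

    // Shared format check, same message as the original code.
    static String[] checkedParts(String vers) {
        String[] parts = vers.split("\\.");
        if (parts.length < 2) {
            throw new RuntimeException("Illegal Hadoop Version: " + vers + " (expected A.B.* format)");
        }
        return parts;
    }

    // Hive 1.2.2 behaviour: only major versions 1 and 2 are recognized.
    static String hive122(String vers) {
        switch (Integer.parseInt(checkedParts(vers)[0])) {
            case 1: return HADOOP20SVERSIONNAME;
            case 2: return HADOOP23VERSIONNAME;
            default: throw new IllegalArgumentException("Unrecognized Hadoop major version number: " + vers);
        }
    }

    // Patched behaviour: major version 3 falls through to the 2.x shims.
    static String patched(String vers) {
        switch (Integer.parseInt(checkedParts(vers)[0])) {
            case 1: return HADOOP20SVERSIONNAME;
            case 2:
            case 3: return HADOOP23VERSIONNAME;
            default: throw new IllegalArgumentException("Unrecognized Hadoop major version number: " + vers);
        }
    }

    public static void main(String[] args) {
        System.out.println(patched("3.1.1")); // prints 0.23
        try {
            hive122("3.1.1");
        } catch (IllegalArgumentException e) {
            // prints: Unrecognized Hadoop major version number: 3.1.1
            System.out.println(e.getMessage());
        }
    }
}
```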
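Since Hive 3.1.1 simply routes major version 3 to Hadoop23Shims, the same can be done for 1.2.2: download the Hive 1.2.2 source, add a case 3 branch to ShimLoader.getMajorVersion(), compile it, and replace ShimLoader.class inside hive-apache-1.2.2-2.jar with the newly compiled class.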
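The jar swap itself can be done with the JDK jar tool: running jar uf hive-apache-1.2.2-2.jar org/apache/hadoop/hive/shims/ShimLoader.class from a directory that contains that relative path updates the entry in place. For a scripted alternative, here is a minimal sketch (ReplaceJarEntry is a hypothetical name) that writes a copy of a jar with one entry's bytes replaced. It assumes the jar is unsigned and that the recompiled class targets a Java release no newer than the JVM running Presto (the stack trace suggests Java 8, so the code avoids newer APIs).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Sketch: copy a jar, replacing the bytes of one entry. Assumes an unsigned jar.
public class ReplaceJarEntry {

    // Copies all bytes from in to out (Java 8 has no InputStream.transferTo).
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    // Writes outJar as a copy of inJar, with entryName's content replaced by newBytes.
    static void replaceEntry(Path inJar, Path outJar, String entryName, byte[] newBytes)
            throws IOException {
        boolean replaced = false;
        try (JarFile in = new JarFile(inJar.toFile());
             JarOutputStream out = new JarOutputStream(Files.newOutputStream(outJar))) {
            Enumeration<JarEntry> entries = in.entries();
            while (entries.hasMoreElements()) {
                JarEntry e = entries.nextElement();
                // Fresh entry: the old entry's size and CRC would not match new bytes.
                out.putNextEntry(new JarEntry(e.getName()));
                if (e.getName().equals(entryName)) {
                    out.write(newBytes);
                    replaced = true;
                } else {
                    try (InputStream is = in.getInputStream(e)) {
                        copy(is, out);
                    }
                }
                out.closeEntry();
            }
        }
        if (!replaced) {
            throw new IOException("Entry not found in " + inJar + ": " + entryName);
        }
    }

    // Reads one entry back, e.g. to verify the swap.
    static byte[] readEntry(Path jar, String entryName) throws IOException {
        try (JarFile jf = new JarFile(jar.toFile())) {
            JarEntry e = jf.getJarEntry(entryName);
            if (e == null) {
                throw new IOException("Entry not found: " + entryName);
            }
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (InputStream is = jf.getInputStream(e)) {
                copy(is, buf);
            }
            return buf.toByteArray();
        }
    }

    // Usage: java ReplaceJarEntry <in.jar> <out.jar> <entry name> <new .class file>
    public static void main(String[] args) throws IOException {
        replaceEntry(Paths.get(args[0]), Paths.get(args[1]), args[2],
                Files.readAllBytes(Paths.get(args[3])));
    }
}
```

For this fix, the entry name would be org/apache/hadoop/hive/shims/ShimLoader.class, and the patched copy would then take the place of the original jar under plugin/hive-hadoop2 before restarting.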
After replacing the class, restart Presto, log in to the Presto CLI again, and rerun the query:
presto:default> select * from bigdata;
id | age | school
---------+-----+--------
100014 | 32 | dh
100015 | 30 | cy
100016 | 35 | hy
(3 rows)
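IV. Summary
1. The Hive shims module handles compatibility between Hive and different Hadoop versions. Hive 1.2.2 only supports Hadoop 1.x and Hadoop 2.x, not Hadoop 3.x.
2. ShimLoader is originally Hive code. The hive-apache project uses the maven-shade-plugin to bundle its dependency jars into its own jar, which is why the class ends up inside hive-apache-1.2.2-2.jar.
V. Open Questions
1. What do the methods provided by the HadoopShims implementations do, and how do they achieve compatibility?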