Java Big Data Development (III) Hadoop (22): NLineInputFormat Example
NLineInputFormat Usage Example
1. Requirement
Count the occurrences of each word, with the number of splits determined by the number of lines in each input file. In this example, every three lines go into one split.
(1) Input data
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
xiaoming hive helloworld
(2) Expected output
Number of splits:4
2. Requirement Analysis
Unlike the default TextInputFormat, which splits by block size, NLineInputFormat puts a fixed number of lines into each split. The input file has 11 lines, and with three lines per split the framework produces ceil(11 / 3) = 4 splits: three full splits of three lines each, plus one final split holding the remaining two lines.
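The split arithmetic above can be sketched in plain Java with no Hadoop dependency (the class and method names here are illustrative, not part of the Hadoop API):

```java
public class SplitMath {

    // NLineInputFormat yields ceil(totalLines / linesPerSplit) splits per file.
    static int numSplits(int totalLines, int linesPerSplit) {
        return (totalLines + linesPerSplit - 1) / linesPerSplit; // ceiling division
    }

    public static void main(String[] args) {
        // The input file in this example has 11 lines, three lines per split.
        System.out.println(numSplits(11, 3)); // prints 4
    }
}
```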
3. Code
(1) The Mapper class

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NLineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    private Text k = new Text();
    private LongWritable v = new LongWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // 1. Get one line
        String line = value.toString();
        // 2. Split on spaces
        String[] splited = line.split(" ");
        // 3. Write out each word with a count of 1
        for (int i = 0; i < splited.length; i++) {
            k.set(splited[i]);
            context.write(k, v);
        }
    }
}
(2) The Reducer class

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class NLineReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    private LongWritable v = new LongWritable();

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0L;
        // 1. Sum the counts for this word
        for (LongWritable value : values) {
            sum += value.get();
        }
        v.set(sum);
        // 2. Write out the total
        context.write(key, v);
    }
}
(3) The Driver class

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NLineDriver {

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        // Adjust the input/output paths to match your own machine
        args = new String[] { "d:/input/inputword", "d:/output1" };

        // 1. Get the job object
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);

        // 2. Put three records into each InputSplit
        NLineInputFormat.setNumLinesPerSplit(job, 3);

        // 3. Use NLineInputFormat to create the splits
        job.setInputFormatClass(NLineInputFormat.class);

        // 4. Set the jar location and wire up the mapper and reducer
        job.setJarByClass(NLineDriver.class);
        job.setMapperClass(NLineMapper.class);
        job.setReducerClass(NLineReducer.class);

        // 5. Set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // 6. Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // 7. Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 8. Submit the job
        job.waitForCompletion(true);
    }
}
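The word counts the job should produce can be checked with a local sketch that mirrors the mapper's split and the reducer's sum without the Hadoop runtime (the class name LocalWordCount and the use of a TreeMap in place of the shuffle phase are illustrative assumptions):

```java
import java.util.Map;
import java.util.TreeMap;

public class LocalWordCount {

    // Mimics NLineMapper (split on spaces) plus NLineReducer (sum per word).
    static Map<String, Long> count(String[] lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The 11-line input file from the requirement section
        String[] lines = {
            "hadoop ni hao", "xiaoming hive helloworld",
            "hadoop ni hao", "xiaoming hive helloworld",
            "hadoop ni hao", "xiaoming hive helloworld",
            "hadoop ni hao", "xiaoming hive helloworld",
            "hadoop ni hao", "xiaoming hive helloworld",
            "xiaoming hive helloworld"
        };
        // Expected: hadoop=5, hao=5, helloworld=6, hive=6, ni=5, xiaoming=6
        System.out.println(count(lines));
    }
}
```

Note that the split count (4) only affects how many map tasks run; the final word counts are the same regardless of the InputFormat.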
4. Test
(1) Input data
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
hadoop ni hao
xiaoming hive helloworld
xiaoming hive helloworld
(2) Number of splits in the output
Number of splits:4
