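Programming Languages - Scala: Setting Up a Spark Environment
With a Scala programming environment already set up in IntelliJ IDEA, the next step is to build a Spark environment on top of it.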
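1. Downloading the Spark jar
Setting up Spark requires the Spark dependency package, which is a jar. On the Spark download page (https://spark.apache.org/downloads.html), select the version you need.
After downloading the archive, extract it. The file spark-assembly-1.2.0-hadoop2.4.0.jar in the lib folder is the one needed for the environment. Move the extracted folder into the directory where you install software.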
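2. Setting up the environment
First create a new Scala file. Once that is done, open the pom.xml file.
The project needs the Spark dependency, which is configured as follows: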
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>cn.jie_h</groupId>
    <artifactId>simpleSpark</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <spark.version>2.2.0</spark.version>
    </properties>
    <repositories>
        <repository>
            <id>nexus-aliyun</id>
            <name>Nexus aliyun</name>
            <url>http://maven.aliyun.com/nexus/content/groups/public</url>
        </repository>
    </repositories>
    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10 -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.3</version>
                <configuration>
                    <classifier>dist</classifier>
                    <appendAssemblyId>true</appendAssemblyId>
                    <descriptorRefs>
                        <descriptor>jar-with-dependencies</descriptor>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
3. Trying out a program
In the newly created Scala class, enter:
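import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object WordCount {
  def main(args: Array[String]): Unit = {
    // Path to the input text file; change this to a file on your machine
    val inputFile = "C:\\Users\\p's\\Desktop\\abc.txt"
    // Run Spark locally in a single JVM
    val conf = new SparkConf().setAppName("WordCount").setMaster("local")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(inputFile)
    // Split each line on spaces, pair each word with 1, then sum the counts per word
    val wordCount = textFile
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey((a, b) => a + b)
    wordCount.foreach(println)
    // Release Spark resources when done
    sc.stop()
  }
}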
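To check what the WordCount job should print for a small input, the same pipeline can be mimicked with plain Scala collections and no Spark dependency. This is only a rough sketch: the object name WordCountSketch and the helper countWords are hypothetical and not part of the original program, and groupBy followed by a sum stands in for Spark's reduceByKey.

```scala
// Hypothetical local sketch (not from the original post): word count using
// only the Scala standard library, mirroring flatMap -> map -> reduceByKey.
object WordCountSketch {
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))  // split each line on single spaces
      .map(word => (word, 1))            // pair every word with a count of 1
      .groupBy(_._1)                     // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum per word

  def main(args: Array[String]): Unit = {
    // Sample input chosen for illustration only
    val sample = Seq("hello spark", "hello scala")
    countWords(sample).foreach(println)
  }
}
```

Running this on a few lines copied from abc.txt gives a quick reference to compare against the Spark output. The order in which the pairs are printed is not guaranteed in either version.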
