
Programming Languages - Scala: Setting Up a Spark Environment

    With the Scala programming environment in IntelliJ IDEA already set up, the next step is to build a Spark environment on top of it.

1. Downloading the Spark jar

    Setting up the Spark environment requires downloading the Spark dependency jar. On the Spark download page (https://spark.apache.org/downloads.html), select the appropriate version.

    After downloading, extract the archive; the file spark-assembly-1.2.0-hadoop2.4.0.jar in the lib folder is the one needed for the setup. Then move the extracted folder into the directory where your software is installed.

2. Environment Setup

First create a new Scala file, then open the project's pom.xml file.

To pull in the Spark dependency, configure pom.xml as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>cn.jie_h</groupId>
    <artifactId>simpleSpark</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <spark.version>2.2.0</spark.version>
    </properties>

    <repositories>
        <repository>
            <id>nexus-aliyun</id>
            <name>Nexus aliyun</name>
            <url>http://maven.aliyun.com/nexus/content/groups/public</url>
        </repository>
    </repositories>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10 -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.3</version>
                <configuration>
                    <classifier>dist</classifier>
                    <appendAssemblyId>true</appendAssemblyId>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
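
After Maven resolves the dependency, a quick sanity check is to print the Spark version that actually landed on the classpath. The scratch object below is a minimal sketch of mine, not part of the original post; it relies on the SPARK_VERSION constant that spark-core exposes:

import org.apache.spark.SPARK_VERSION

// Minimal sanity check (illustrative, not from the original post):
// prints the Spark version Maven resolved, which should match
// the <spark.version> property in pom.xml.
object VersionCheck {
  def main(args: Array[String]): Unit = {
    println(s"Spark version on classpath: $SPARK_VERSION") // expect 2.2.0
  }
}

Since the pom also configures maven-assembly-plugin, running mvn package additionally produces a jar-with-dependencies under target/ that bundles the project with its dependencies.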

3. Running a Test Program

In the newly created Scala file, enter the following word-count program:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object WordCount {
  def main(args: Array[String]) {
    // Local text file to count words in
    val inputFile = "C:\\Users\\p's\\Desktop\\abc.txt"
    // Run Spark in local mode, inside a single JVM
    val conf = new SparkConf().setAppName("WordCount").setMaster("local")
    val sc = new SparkContext(conf)
    val textFile = sc.textFile(inputFile)
    // Split each line into words, pair each word with 1, then sum counts per word
    val wordCount = textFile.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey((a, b) => a + b)
    wordCount.foreach(println)
    sc.stop()
  }
}
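
For illustration (this sample input and output are mine, not from the original post), suppose abc.txt contains the two lines:

hello spark
hello scala

Running WordCount then prints one (word, count) pair per line, in no guaranteed order:

(hello,2)
(spark,1)
(scala,1)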