Setting up the development environment

This book aims to give real code examples that you can run and experiment with. It is therefore important that you can run any example provided here easily, without fighting with the code. We will do our best to keep the code tested and properly packaged, but you should also make sure that you have everything the examples need.

Installing Scala

Of course, you will need the Scala programming language. It evolves quickly, and the newest version can be found at http://www.scala-lang.org/download/. Tips on installing the language on your operating system are available at http://www.scala-lang.org/download/install.html.

Tip

Tips about installing Scala

You can always download multiple versions of Scala and experiment with them. I use Linux and my tips will be applicable to Mac OS users, too. Windows users can also do a similar setup. Here are the steps:

  • Install Scala under /opt/scala-{version}/ or any other path you prefer.
  • Create a symlink that points to the installed version using the following command:

sudo ln -s /opt/scala-{version} /opt/scala-current

  • Add the Scala bin folder to your path by putting the following lines in your .bashrc (or equivalent) file:

export SCALA_HOME=/opt/scala-current
export PATH=$PATH:$SCALA_HOME/bin

This allows us to quickly change versions of Scala by just redefining the symlink.

Another way to experiment with any Scala version is to install SBT (described later in this section). Then, simply run sbt in your console, type ++ 2.11.7 (or any version you want), and then issue the console command. Now you can test Scala features easily.

Using SBT or Maven or any other build tool will automatically download Scala for you. If you don't need to experiment with the console, you can skip the preceding steps.

With the preceding tips, we can start the Scala interpreter by just typing scala in the terminal, or we can follow the sbt installation process and experiment with different language features in the REPL.
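
As a quick check that the setup works, a few lines like the following can be pasted into the REPL. This is only a minimal sketch: the Point case class and the describe function are hypothetical names made up for this illustration and are not part of the book's code.

```scala
// Paste into a REPL started with `scala` or with the sbt `console` command.

// Confirm which Scala version the REPL is actually running.
val runningVersion = scala.util.Properties.versionNumberString
println(s"Running Scala $runningVersion")

// Try out a small language feature: a case class with pattern matching.
// Point and describe are hypothetical names used only for this example.
case class Point(x: Int, y: Int)

def describe(p: Point): String = p match {
  case Point(0, 0) => "origin"
  case Point(x, 0) => s"on the x axis at $x"
  case Point(0, y) => s"on the y axis at $y"
  case Point(x, y) => s"at ($x, $y)"
}

println(describe(Point(3, 0))) // prints "on the x axis at 3"
```

If the symlink or the ++ command has been used to switch versions, the first println should report the newly selected version.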

Scala IDEs

There are multiple IDEs out there that support development in Scala. There is absolutely no preference about which one to use to work with the code. Some of the most popular ones are as follows:

  • IntelliJ
  • Eclipse
  • NetBeans

They contain plugins to work with Scala, and downloading and using them should be straightforward.

Dependency management

Running most of the examples in this book will not require any additional libraries. In some cases, though, we might need to show how Scala code is unit tested, which requires a testing framework. Later, we might also present some real-life use cases that rely on an additional library. Nowadays, dependencies are handled with specialized tools. These tools are usually interchangeable, and which one to use is a personal choice. The most popular tool for Scala projects is SBT, but Maven is also an option, and there are many others as well.

Modern IDEs provide the functionality to generate the required build configuration files, but we will give some generic examples that could be useful not only here, but in future projects. Depending on the IDE you prefer, you might need to install some extra plugins to have things up and running, and a quick Google search should help.

SBT

SBT stands for Simple Build Tool. It uses Scala syntax to define how a project is built, how its dependencies are managed, and so on. It uses .sbt files for this purpose, and it also supports a setup based on Scala code in .scala files, as well as a mix of both.

To download SBT, go to http://www.scala-sbt.org/download.html.

The following screenshot shows the structure of a skeleton SBT project:

It is important to show the contents of the main .sbt files.

The version.sbt file looks as follows:

version in ThisBuild := "1.0.0-SNAPSHOT"

It contains the current version, which is automatically incremented when a release is made.

The assembly.sbt file has the following contents:

assemblyMergeStrategy in assembly := {
  case PathList("javax", "servlet", xs @ _*)         => MergeStrategy.first
  case PathList(ps @ _*) if ps.last endsWith ".html" => MergeStrategy.first
  case "application.conf" => MergeStrategy.concat
  case "unwanted.txt"     => MergeStrategy.discard
  case x =>
    // Fall back to the plugin's default strategy for everything else.
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

assemblyJarName in assembly := { s"${name.value}_${scalaVersion.value}-${version.value}-assembly.jar" }

artifact in (Compile, assembly) ~= { art =>
  art.copy(`classifier` = Some("assembly"))
}

addArtifact(artifact in (Compile, assembly), assembly)

It contains information about how to build the assembly JAR: the merge strategy, the final JAR name, and so on. It uses a plugin called sbt-assembly (https://github.com/sbt/sbt-assembly).

The build.sbt file contains the dependencies of the project, some extra information for the compiler, and metadata. The skeleton file looks as follows:

organization := "com.ivan.nikolov"

name := "skeleton-sbt"

scalaVersion := "2.11.7"

scalacOptions := Seq("-unchecked", "-deprecation", "-encoding", "utf8")

javacOptions ++= Seq("-source", "1.8", "-target", "1.8")

publishMavenStyle := true

libraryDependencies ++= {
  val sparkVersion = "1.2.2"
  Seq(
    "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
    "com.datastax.spark" %% "spark-cassandra-connector" % "1.2.1",
    "org.scalatest" %% "scalatest" % "2.1.3" % "test",
    "org.mockito" % "mockito-all" % "1.9.5" % "test" // mockito for tests
  )
}

As you can see, here we define the Java version against which we compile, some manifest information, and the library dependencies.

The dependencies for our project are defined in the libraryDependencies section of our SBT file. They have the following format:

"groupId" %[%] "artifactId" % "version" [% "scope"]

If we separate groupId and artifactId with %% instead of %, SBT will use scalaVersion to append the Scala binary version (_2.11 for Scala 2.11.*) to artifactId. This syntax is usually used for dependencies written in Scala, because the convention there is to include the Scala version in artifactId. We can, of course, append the Scala version to artifactId manually and use %.
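
To make the difference between % and %% concrete, here is a small build.sbt sketch. It assumes scalaVersion is set to "2.11.7" as in the skeleton above, and it reuses the ScalaTest coordinates from that file. The two lines are meant to be read as alternatives, not to be used together.

```scala
// build.sbt fragment (assumes scalaVersion := "2.11.7", as in the skeleton)

// With %%, sbt appends the Scala binary version suffix (_2.11) for us.
libraryDependencies += "org.scalatest" %% "scalatest" % "2.1.3" % "test"

// The same dependency with the suffix written by hand and a single %.
// Use only one of these two forms in a real build.
libraryDependencies += "org.scalatest" % "scalatest_2.11" % "2.1.3" % "test"
```

A Java library such as mockito-all has no Scala suffix in its artifactId, which is why the skeleton declares it with a single %.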

Note

The Spark and Datastax dependencies shown here will not be needed at any point in this book. They are here just for illustration purposes, and you can safely remove them.

When we work with .sbt files, SBT requires each statement to start on a new line and to be separated from the previous one by a blank line (sbt 0.13.7 and later no longer require the blank lines). When using .scala files, we just write code in Scala.

The %% syntax in the dependencies is syntactic sugar: using scalaVersion, it rewrites the name of the library so that, for example, spark-core becomes spark-core_2.11 in our case.

SBT allows the engineer to express the same things in different ways. One example is the preceding dependencies: instead of adding a sequence of dependencies, we can add them one by one. The final result will be the same. There is also a lot of flexibility in other parts of SBT. For more information on SBT, refer to its documentation.
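
As an illustration of adding dependencies one by one, the skeleton's libraryDependencies block could be rewritten roughly as follows. This is a sketch of one alternative style, with the same coordinates as in the skeleton file; it is not the form used in the book's build.

```scala
// build.sbt fragment: the same dependencies as the skeleton, added one at a time.
// A blank line separates the statements, as sbt 0.13.6 expects in .sbt files.
val sparkVersion = "1.2.2"

libraryDependencies += "org.apache.spark" %% "spark-core" % sparkVersion % "provided"

libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.2.1"

libraryDependencies += "org.scalatest" %% "scalatest" % "2.1.3" % "test"

libraryDependencies += "org.mockito" % "mockito-all" % "1.9.5" % "test"
```

Here, += appends a single dependency, whereas ++= in the skeleton appends a whole sequence. Both forms result in the same set of dependencies.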

The project/build.properties file defines the sbt version to be used when building and interacting with the application under sbt. It is as simple as the following:

sbt.version = 0.13.6

Finally, there is the project/plugins.sbt file, which defines the plugins used to get things up and running. We have already mentioned one of them, sbt-assembly:

logLevel := Level.Warn

addSbtPlugin("com.github.gseitz" % "sbt-release" % "1.0.0")

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")

There are many other plugins online that provide useful functionality. Here are some common sbt commands that can be run in the terminal from the root folder of this skeleton project:

Tip

Useful SBT commands

  • sbt: This opens the sbt console for the current project. All of the commands that will follow can be issued from here by omitting the sbt keyword.
  • sbt test: This runs the application unit tests.
  • sbt compile: This compiles the application.
  • sbt assembly: This creates an assembly of the application (a fat JAR) that can be run like any other Java JAR.

Maven

Maven holds its configuration in files named pom.xml. It supports multimodule projects easily: each module simply has its own child pom.xml file. With sbt, some extra work is needed to achieve the same.
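
To give an idea of the extra work on the sbt side, a multimodule build could be sketched in a root build.sbt as follows. The module names core and app, their directories, and the project names are hypothetical and chosen only for this illustration; the skeleton project in this book is a single module.

```scala
// build.sbt at the project root (hypothetical modules "core" and "app")

// A library module whose sources live under core/src/main/scala.
lazy val core = (project in file("core"))
  .settings(name := "skeleton-core")

// An application module that can use the classes defined in core.
lazy val app = (project in file("app"))
  .settings(name := "skeleton-app")
  .dependsOn(core)

// The root project aggregates the modules, so that a command such as
// `sbt test` issued at the root also runs in core and app.
lazy val root = (project in file("."))
  .aggregate(core, app)
```

The design difference is that sbt describes all modules as values in one build definition, while Maven spreads the same information across the parent and child pom.xml files.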

To download Maven, go to https://maven.apache.org/download.cgi.

The next screenshot shows the structure of a skeleton Maven project:

The main pom.xml file is much longer than the preceding sbt solution. Let's have a look at its parts separately.

The file usually begins with some metadata about the project and with properties that can be used throughout the POM files:

<modelVersion>4.0.0</modelVersion>
<groupId>com.ivan.nikolov</groupId>
<artifactId>skeleton-mvn</artifactId>
<version>1.0.0-SNAPSHOT</version>
<properties>
    <scala.version>2.11.7</scala.version>
    <scalatest.version>2.2.4</scalatest.version>
    <spark.version>1.2.2</spark.version>
</properties>

Then, there are the dependencies:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector_2.11</artifactId>
        <version>1.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest_2.11</artifactId>
        <version>${scalatest.version}</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.mockito</groupId>
        <artifactId>mockito-all</artifactId>
        <version>1.9.5</version>
        <scope>test</scope>
    </dependency>
</dependencies>

Finally, there are the build definitions. Here, we can use various plugins to do different things with our project and give hints to the compiler. The build definitions are enclosed in the <build> tags.

First, we specify some resources:

<sourceDirectory>src/main/scala</sourceDirectory>
<testSourceDirectory>src/test/scala</testSourceDirectory>
<resources>
    <resource>
        <directory>${basedir}/src/main/resources</directory>
    </resource>
</resources>

The first plugin we use is scala-maven-plugin, which compiles the Scala sources and tests in a Maven build:

<plugin>
    <groupId>net.alchim31.maven</groupId>
    <artifactId>scala-maven-plugin</artifactId>
    <version>3.2.1</version>
    <executions>
        <execution>
            <goals>
                <goal>compile</goal>
                <goal>testCompile</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <scalaVersion>${scala.version}</scalaVersion>
    </configuration>
</plugin>

Then, we use scalatest-maven-plugin to enable unit testing with Scala and Maven:

<plugin>
    <groupId>org.scalatest</groupId>
    <artifactId>scalatest-maven-plugin</artifactId>
    <version>1.0</version>
    <configuration>
        <reportsDirectory>${project.build.directory}/surefire-reports</reportsDirectory>
        <junitxml>.</junitxml>
        <filereports>WDF TestSuite.txt</filereports>
    </configuration>
    <executions>
        <execution>
            <id>test</id>
            <goals>
                <goal>test</goal>
            </goals>
        </execution>
    </executions>
</plugin>

Finally, we have maven-assembly-plugin, which is used for building the fat JAR of the application:

<plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <version>2.5.5</version>
    <configuration>
        <appendAssemblyId>false</appendAssemblyId>
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
    </configuration>
    <executions>
        <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>

The complete pom.xml file is equivalent to the preceding sbt files that we presented.

As before, the Spark and Datastax dependencies are here just for illustration purposes.

Tip

Useful Maven commands

  • mvn clean test: This runs the application unit tests.
  • mvn clean compile: This compiles the application.
  • mvn clean package: This creates an assembly of the application (a fat JAR) that can be run like any other Java JAR.

SBT versus Maven

In this book, we will be using Maven for dependency management and for creating our projects. It is interchangeable with SBT, and our source code will not depend on which build system we choose. You can easily translate the pom.xml files to .sbt files using the skeleton that we've provided. The only real difference will be the dependencies and how they are expressed.