Getting ready for machine learning

In this section, we will install Visual Studio, take a quick lap around F#, and install the major open source libraries that we will be using.

Setting up Visual Studio

To get going, you will need to download Visual Studio on a Microsoft Windows machine. As of this writing, the latest (free) version is Visual Studio 2015 Community. If you have a higher version already installed on your machine, you can skip this step. If you need a copy, head on over to the Visual Studio home page at https://www.visualstudio.com. Download the Visual Studio Community 2015 installer and execute it.

Now, you will get the following screen:

Select Custom installation and you will be taken to the following screen:

Make sure Visual F# has a check mark next to it. Once it is installed, you should see Visual Studio in your Windows Start menu.

Learning F#

One of the great features about F# is that you can accomplish a whole lot with very little code. It is a very terse language compared to C# and VB.NET, so picking up the syntax is a bit easier. Although this is not a comprehensive introduction, this is going to introduce you to the major language features that we will use in this book. I encourage you to check out http://www.tryfsharp.org/ or the tutorials at http://fsharpforfunandprofit.com/ if you want to get a deeper understanding of the language. With that in mind, let's create our 1st F# project:

  1. Start Visual Studio.
  2. Navigate to File | New | Project as shown in the following screenshot:
  3. When the New Project dialog box appears, navigate the tree view to Visual F# | Windows | Console Application. Have a look at the following screenshot:
  4. Give your project a name, hit OK, and the Visual Studio Template generator will create the following boilerplate:

    Although Visual Studio created a Program.fs file that creates a basic console .exe application for us, we will start learning about F# in a different way, so we are going to ignore it for now.

  5. Right-click in the Solution Explorer and navigate to Add | New Item.
  6. When the Add New Item dialog box appears, select Script File.

    The Script1.fsx file is then added to the project.

  7. Once Script1.fsx is created, open it up, and enter the following into the file:
    let x = "Hello World"
  8. Highlight that entire row of code, right-click and select Execute In Interactive (or press Alt + Enter):

    And the F# Interactive console will pop up and you will see this:

The F# Interactive is a type of REPL, which stands for Read-Evaluate-Print-Loop. If you are a .NET developer who has spent any time in SQL Server Management Studio, the F# Interactive will look very familiar to the Query Analyzer where you enter your code at the top and see how it executes at the bottom. Also, if you are a data scientist using R Studio, you are very familiar with the concept of a REPL. I have used the words REPL and FSI interchangeably in this book.

There are a couple of things to notice about this first line of F# code you wrote. First, it looks very similar to C#. In fact, consider changing the code to this:

It would be perfectly valid C#. Note that the red squiggly line, showing you that the F# compiler certainly does not think this is valid.

Going back to the correct code, notice that type of x is not explicitly defined. F# uses the concept of inferred typing so that you don't have to write the type of the values that you create. I used the term value deliberately because unlike variables, which can be assigned in C# and VB.NET, values are immutable; once bound, they can never change. Here, we are permanently binding the name x to its value, Hello World. This notion of immutability might seem constraining at first, but it has profound and positive implications, especially when writing machine learning models.

With our basic program idea proven out, let's move it over to a compliable assembly; in this case, an .exe that targets the console. Highlight the line that you just wrote, press Ctrl + C, and then open up Program.fs. Go into the code that was generated and paste it in:

[<EntryPoint>]
let main argv = 
    printfn "%A" argv
    let x = "Hello World"
    0 // return an integer exit code
Tip

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

  • Log in or register to our website using your e-mail address and password.
  • Hover the mouse pointer on the SUPPORT tab at the top.
  • Click on Code Downloads & Errata.
  • Enter the name of the book in the Search box.
  • Select the book for which you're looking to download the code files.
  • Choose from the drop-down menu where you purchased this book from.
  • Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR / 7-Zip for Windows
  • Zipeg / iZip / UnRarX for Mac
  • 7-Zip / PeaZip for Linux

Then, add the following lines of code around what you just added:

// Learn more about F# at http://fsharp.org
// See the 'F# Tutorial' project for more help.
open System

[<EntryPoint>]
let main argv = 
    printfn "%A" argv
    let x = "Hello World"
    Console.WriteLine(x)
    let y = Console.ReadKey()
    0 // return an integer exit code

Press the Start button (or hit F5) and you should see your program run:

You will notice that I had to bind the return value from Console.ReadKey() to y. In C# or VB.NET, you can get away with not handling the return value explicitly. In F#, you are not allowed to ignore the returned values. Although some might think this is a limitation, it is actually a strength of the language. It is much harder to make a mistake in F# because the language forces you to address execution paths explicitly versus accidentally sweeping them under the rug (or into a null, but we'll get to that later).

In any event, let's go back to our script file and enter in another line of code:

let ints = [|1;2;3;4;5;6|]

If you send that line of code to the REPL, you should see this:

val ints : int [] = [|1; 2; 3; 4; 5; 6|]

This is an array, as if you did this in C#:

var ints = new[] {1,2,3,4,5,6};

Notice that the separator is a semicolon in F# and not a comma. This differs from many other languages, including C#. The comma in F# is reserved for tuples, not for separating items in an array. We'll discuss tuples later.

Now, let's sum up the values in our array:

let summedValue = ints |> Array.sum

While sending that line to the REPL, you should see this:

val summedValue : int = 21

There are two things going on. We have the |> operator, which is a pipe forward operator. If you have experience with Linux or PowerShell, this should be familiar. However, if you have a background in C#, it might look unfamiliar. The pipe forward operator takes the result of the value on the left-hand side of the operator (in this case, ints) and pushes it into the function on the right-hand side (in this case, sum).

The other new language construct is Array.sum. Array is a module in the core F# libraries, which has a series of functions that you can apply to your data. The function sum, well, sums the values in the array, as you can probably guess by inspecting the result.

So, now, let's add a different function from the Array type:

let multiplied = ints |> Array.map (fun i -> i * 2)

If you send it to the REPL, you should see this:

val multiplied : int [] = [|2; 4; 6; 8; 10; 12|]

Array.map is an example of a high ordered function that is part of the Array type. Its parameter is another function. Effectively, we are passing a function into another function. In this case, we are creating an anonymous function that takes a parameter i and returns i * 2. You know it is an anonymous function because it starts with the keyword fun and the IDE makes it easy for us to understand that by making it blue. This anonymous function is also called a lambda expression, which has been in C# and VB.NET since .Net 3.5, so you might have run across it before. If you have a data science background using R, you are already quite familiar with lambdas.

Getting back to the higher-ordered function Array.map, you can see that it applies the lambda function against each item of the array and returns a new array with the new values.

We will be using Array.map (and its more generic kin Seq.map) a lot when we start implementing machine learning models as it is the best way to transform an array of data. Also, if you have been paying attention to the buzz words of map/reduce when describing big data applications such as Hadoop, the word map means exactly the same thing in this context. One final note is that because of immutability in F#, the original array is not altered, instead, multiplied is bound to a new array.

Let's stay in the script and add in another couple more lines of code:

let multiplyByTwo x =
    x * 2

If you send it to the REPL, you should see this:

val multiplyByTwo : x:int -> int

These two lines created a named function called multiplyByTwo. The function that takes a single parameter x and then returns the value of the parameter multiplied by 2. This is exactly the same as our anonymous function we created earlier in-line that we passed into the map function. The syntax might seem a bit strange because of the -> operator. You can read this as, "the function multiplyByTwo takes in a parameter called x of type int and returns an int."

Note three things here. Parameter x is inferred to be an int because it is used in the body of the function as multiplied to another int. If the function reads x * 2.0, the x would have been inferred as a float. This is a significant departure from C# and VB.NET but pretty familiar for people who use R. Also, there is no return statement for the function, instead, the final expression of any function is always returned as the result. The last thing to note is that whitespace is important so that the indentation is required. If the code was written like this:

let multiplyByTwo(x) =
x * 2

The compiler would complain:

Script1.fsx(8,1): warning FS0058: Possible incorrect indentation: this token is offside of context started at position (7:1).

Since F# does not use curly braces and semicolons (or the end keyword), such as C# or VB.NET, it needs to use something to separate code. That separation is whitespace. Since it is good coding practice to use whitespace judiciously, this should not be very alarming to people having a C# or VB.NET background. If you have a background in R or Python, this should seem natural to you.

Since multiplyByTwo is the functional equivalent of the lambda created in Array.map (fun i -> i * 2), we can do this if we want:

let multiplied' = ints |> Array.map (fun i -> multiplyByTwo i)

If you send it to the REPL, you should see this:

val multiplied' : int [] = [|2; 4; 6; 8; 10; 12|]

Typically, we will use named functions when we need to use that function in several places in our code and we use a lambda expression when we only need that function for a specific line of code.

There is another minor thing to note. I used the tick notation for the value multiplied when I wanted to create another value that was representing the same idea. This kind of notation is used frequently in the scientific community, but can get unwieldy if you attempt to use it for a third or even fourth (multiplied'''') representation.

Next, let's add another named function to the REPL:

let isEven x =
    match x % 2 = 0 with
    | true -> "even"
    | false -> "odd"
isEven 2
isEven 3

If you send it to the REPL, you should see this:

val isEven : x:int -> string

This is a function named isEven that takes a single parameter x. The body of the function uses a pattern-matching statement to determine whether the parameter is odd or even. When it is odd, then it returns the string odd. When it is even, it returns the string even.

There is one really interesting thing going on here. The match statement is a basic example of pattern matching and it is one of the coolest features of F#. For now, you can consider the match statement much like the switch statement that you may be familiar within R, Python, C#, or VB.NET, but we will see how it becomes much more powerful in the later chapters. I would have written the conditional logic like this:

let isEven' x =
    if x % 2 = 0 then "even" else "odd"

But I prefer to use pattern matching for this kind of conditional logic. In fact, I will attempt to go through this entire book without using an if…then statement.

With isEven written, I can now chain my functions together like this:

let multipliedAndIsEven = 
    ints
    |> Array.map (fun i -> multiplyByTwo i)
    |> Array.map (fun i -> isEven i)

If you send it to REPL, you should see this:

val multipliedAndIsEven : string [] =
 [|"even"; "even"; "even"; "even"; "even"; "even"|]

In this case, the resulting array from the first pipe Array.map (fun i -> multiplyByTwo i)) gets sent to the next function Array.map (fun i -> isEven i). This means we might have three arrays floating around in memory: ints which is passed into the first pipe, the result from the first pipe that is passed into the second pipe, and the result from the second pipe. From your mental model point of view, you can think about each array being passed from one function into the next. In this book, I will be chaining pipe forwards frequently as it is such a powerful construct and it perfectly matches the thought process when we are creating and using machine learning models.

You now know enough F# to get you up and running with the first machine learning models in this book. I will be introducing other F# language features as the book goes along, but this is a good start. As you will see, F# is truly a powerful language where a simple syntax can lead to very complex work.