The Titanic Machine Learning Problem in Elixir

The Problem

The “Titanic Problem” is the most popular contest on Kaggle, and it presents an easily digestible problem to allow newcomers to dip their toes into machine learning.

Based on a training data set that contains characteristics about a passenger such as their age, sex, and class of their ticket, we are tasked with predicting which passengers survived the shipwreck. There are thousands of submissions with some truly impressive feature engineering, but in order to really participate in the competition, you need to show your work in the form of a Jupyter notebook. In other words, you’re going to be writing Python or R.

Why Deviate?

Python is fine in my eyes. It’s approachable, readable, and the machine learning community has fully embraced and supported the Python ecosystem. The sheer breadth and depth of developer mindshare and third-party libraries had me questioning whether there was room for any viable alternative to exist.

Elixir has been my professional bread-and-butter for the past five years: I built (and sold) a company using it, and even if there is a relative dearth of third-party libraries, the developer experience has always been something I’ve admired. The Elixir community feels inherently optimistic. Functional programming has never been more approachable: I’m not a purist by any means, but writing good Elixir code feels like something you’d want to show off to your teammates (I frequently do to limited acclaim, I concede). I don’t have a better argument than that.

I knew that the nx project had been brewing for a while, and my interest in machine learning finally caught up to the point where I could put it to use. There’s also the excellent axon library, which is comparable to PyTorch, and explorer for exploring your dataset. While these libraries may not be as mature, their APIs are similar to their Python equivalents; there has been a genuine effort to build a fully-featured ML suite in Elixir.

Early Struggles

I’ll preface this project by saying that it took me a bit too long. I already had a decent grasp of neural network concepts, and there is plenty of example code out there in Python that would have enabled me to cobble something together and iterate more quickly.

Python has had a twenty-year head start to seed the internet with blog posts and tutorials, so make no mistake, you’re going to be reading a lot of documentation when doing a ML project in Elixir. My general feeling is that the documentation is of extremely high quality, but nothing makes a concept click like a working, usable example.

In order to really get the ball rolling, I’d need to set up an environment that allowed me to make some immediate progress and test my hypotheses in real time. Livebook has been a fantastic replacement for a Jupyter notebook, and it was a breeze to set up locally.

Back to the Problem

I ran Livebook locally using the recommended instructions, but I suppose using Docker would be just as easy:

MIX_ENV=prod mix phx.server
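If you do go the Docker route, the Livebook project publishes an official image; a minimal invocation, going from memory of the Livebook docs rather than my own setup, would look something like this:

docker run -p 8080:8080 -p 8081:8081 --pull always -v $(pwd):/data ghcr.io/livebook-dev/livebook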

Once Livebook was up and running, I installed nx, axon, explorer, and vega_lite for visualization. I also recognized that for this problem I’d need to do one-hot encoding, and I didn’t want to write that part myself. Thankfully, a benevolent dev who walked this path before me wrote analysis_prep. I forked it, made a minor tweak to get it to compile with Elixir 1.13, and was ready to begin.

To create a prediction, I’m going to build a simple neural network in Axon and do some basic feature engineering to massage the data into a state that’s usable by our model. My intent here is to prove that it’s viable to do serious machine learning in the Elixir ecosystem, not to solve this particular problem in the most sophisticated manner.

The Kaggle competition provides us a CSV with data we can use to train our model, so the first step is to load it into a DataFrame to be able to work with it more easily:

Livebook

I wrote this blog post in Livebook, and I’ve published the .livemd on GitHub. In each section below, I’ll also include all of my Livebook code entries, along with the results:

Mix.install([
  {:axon, "~> 0.1.0"},
  {:exla, "~> 0.2.2"},
  {:nx, "~> 0.2.1"},
  {:explorer, "~> 0.2.0"},
  {:analysis_prep, github: "ryancurtin/analysis_prep"},
  {:vega_lite, "~> 0.1.4"},
  {:kino_vega_lite, "~> 0.1.2"},
  {:jason, "~> 1.2"}
])

alias VegaLite, as: Vl
warning: String.strip/1 is deprecated. Use String.trim/1 instead
  mix.exs:7: AnalysisPrep.Mixfile.project/0
VegaLite
df = Explorer.DataFrame.from_csv!("data/train.csv")
#Explorer.DataFrame<
  Polars[891 x 12]
  PassengerId integer [1, 2, 3, 4, 5, ...]
  Survived integer [0, 1, 1, 1, 0, ...]
  Pclass integer [3, 1, 3, 1, 3, ...]
  Name string ["Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
   "Heikkinen, Miss. Laina", "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
   "Allen, Mr. William Henry", ...]
  Sex string ["male", "female", "female", "female", "male", ...]
  Age float [22.0, 38.0, 26.0, 35.0, 35.0, ...]
  SibSp integer [1, 1, 0, 1, 0, ...]
  Parch integer [0, 0, 0, 0, 0, ...]
  Ticket string ["A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", "373450", ...]
  Fare float [7.25, 71.2833, 7.925, 53.1, 8.05, ...]
  Cabin string [nil, "C85", nil, "C123", nil, ...]
  Embarked string ["S", "C", "S", "S", "S", ...]
>

Cleaning the Data

Some fields will not be relevant to predicting survival, so they can either be dropped or modified to give us clean input data for training and testing.

Since we already use Pclass as a proxy for the economic status of the passenger, we drop “Fare”, which would serve a similar purpose. I acknowledge that other columns like “Ticket”, “Cabin”, and “Embarked” could have some impact on whether or not a passenger survived, but I chose to omit them because I considered their predictive value to be negligible.

df_clean =
  Explorer.DataFrame.select(
    df,
    ["Name", "Cabin", "Embarked", "PassengerId", "Ticket", "Fare"],
    :drop
  )
#Explorer.DataFrame<
  Polars[891 x 6]
  Survived integer [0, 1, 1, 1, 0, ...]
  Pclass integer [3, 1, 3, 1, 3, ...]
  Sex string ["male", "female", "female", "female", "male", ...]
  Age float [22.0, 38.0, 26.0, 35.0, 35.0, ...]
  SibSp integer [1, 1, 0, 1, 0, ...]
  Parch integer [0, 0, 0, 0, 0, ...]
>

Elixir lacks a built-in way to normalize a DataFrame column, so I wrote my own min-max normalizer that ignores nil values:

defmodule Normalizer do
  # Min-max scales a list of numbers into the [0, 1] range, leaving nil entries untouched.
  def normalize(list) do
    filtered = Enum.filter(list, & &1)
    {min, max} = Enum.min_max(filtered)
    {new_min, new_max} = {0, 1}

    Enum.map(list, fn entry ->
      if entry do
        new_min + (entry - min) / (max - min) * (new_max - new_min)
      else
        nil
      end
    end)
  end
end
{:module, Normalizer, <<70, 79, 82, 49, 0, 0, 8, ...>>, {:normalize, 1}}

Normalizing Age Column

If the Age column is left on its original scale, its large values will dominate the other features and skew the predictions. We’ll normalize this data so that it is scaled between 0 and 1, which makes training less susceptible to an exploding or vanishing gradient caused by outsized weights.

age_series =
  Explorer.DataFrame.to_series(df)
  |> Map.get("Age")
  |> Explorer.Series.to_enum()

normalized_ages = Normalizer.normalize(age_series)

normalized_age_series = Explorer.Series.from_list(normalized_ages)
df_clean = Explorer.DataFrame.mutate(df_clean, Age: normalized_age_series)
#Explorer.DataFrame<
  Polars[891 x 6]
  Survived integer [0, 1, 1, 1, 0, ...]
  Pclass integer [3, 1, 3, 1, 3, ...]
  Sex string ["male", "female", "female", "female", "male", ...]
  Age float [0.2711736617240513, 0.4722292033174164, 0.32143754712239253, 0.43453128926866047,
   0.43453128926866047, ...]
  SibSp integer [1, 1, 0, 1, 0, ...]
  Parch integer [0, 0, 0, 0, 0, ...]
>

One-hot encoding Sex Column

Since a neural network operates on a series of matrices with weights as floats, processing “male” vs. “female” becomes much easier with a simple binary encoding of 1 for male and 0 for female. Also note that I’m dividing by 1 to convert each value to a float in the DataFrame, since Elixir’s / operator always returns a float.

[_classification_list | sex_one_hot] =
  Explorer.DataFrame.to_series(df)
  |> Map.get("Sex")
  |> Explorer.Series.to_enum()
  |> AnalysisPrep.one_hot()

sex_series =
  sex_one_hot
  |> Enum.map(&Enum.at(&1, 1))
  |> Enum.map(&(&1 / 1))
  |> Explorer.Series.from_list()

df_clean = Explorer.DataFrame.mutate(df_clean, Sex: sex_series)
#Explorer.DataFrame<
  Polars[891 x 6]
  Survived integer [0, 1, 1, 1, 0, ...]
  Pclass integer [3, 1, 3, 1, 3, ...]
  Sex float [1.0, 0.0, 0.0, 0.0, 1.0, ...]
  Age float [0.2711736617240513, 0.4722292033174164, 0.32143754712239253, 0.43453128926866047,
   0.43453128926866047, ...]
  SibSp integer [1, 1, 0, 1, 0, ...]
  Parch integer [0, 0, 0, 0, 0, ...]
>

Binary columns for each Pclass (One-hot)

Pclass can be 1, 2, or 3, indicating the passenger’s ticketed class. I won’t include any analysis of this value here, but first-class passengers were more likely to survive. As with the Sex column, we want to make sure we’re operating on binary values to prevent the weights from being skewed. The resulting DataFrame will have a column for Pclass1, Pclass2, and Pclass3, each holding a binary value (again as a float) representing the passenger’s ticketed class.

[_classification_list | pclass_one_hot] =
  Explorer.DataFrame.to_series(df)
  |> Map.get("Pclass")
  |> Explorer.Series.to_enum()
  |> AnalysisPrep.one_hot()

# Want to directly turn this list of lists into a Series and merge into the DataFrame
reduced_one_hot =
  Enum.reduce(pclass_one_hot, %{"Pclass1" => [], "Pclass2" => [], "Pclass3" => []}, fn row, acc ->
    Map.update!(acc, "Pclass1", &(&1 ++ [Enum.at(row, 0) / 1]))
    |> Map.update!("Pclass2", &(&1 ++ [Enum.at(row, 1) / 1]))
    |> Map.update!("Pclass3", &(&1 ++ [Enum.at(row, 2) / 1]))
  end)

df_clean = Explorer.DataFrame.mutate(df_clean, reduced_one_hot)
#Explorer.DataFrame<
  Polars[891 x 9]
  Survived integer [0, 1, 1, 1, 0, ...]
  Pclass integer [3, 1, 3, 1, 3, ...]
  Sex float [1.0, 0.0, 0.0, 0.0, 1.0, ...]
  Age float [0.2711736617240513, 0.4722292033174164, 0.32143754712239253, 0.43453128926866047,
   0.43453128926866047, ...]
  SibSp integer [1, 1, 0, 1, 0, ...]
  Parch integer [0, 0, 0, 0, 0, ...]
  Pclass1 float [0.0, 1.0, 0.0, 1.0, 0.0, ...]
  Pclass2 float [0.0, 0.0, 0.0, 0.0, 0.0, ...]
  Pclass3 float [1.0, 0.0, 1.0, 0.0, 1.0, ...]
>

Dealing with nil values

Nx tensors cannot contain nil values, and in many cases, neither can the algorithms we employ in our neural network. One slightly acceptable approach (at least for our purposes here) is to insert the mean or median value into the Age column.

A more sophisticated approach would likely do something like bucketing passengers with missing ages alongside similar passengers whose ages are known and “guessing” an age from the group. In order to let me finish this post in a reasonable amount of time, you’ll need to give me a break.
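Purely as an illustration of what that could look like (this AgeImputer module is mine, is only a sketch, and isn’t used anywhere else in this post), one option is to group passengers by Pclass and Sex and fill missing ages with each group’s median:

defmodule AgeImputer do
  alias Explorer.DataFrame, as: DF
  alias Explorer.Series

  # Fill nil ages with the median age of passengers sharing the same Pclass and Sex.
  def impute(df) do
    series = DF.to_series(df)
    ages = series |> Map.get("Age") |> Series.to_enum() |> Enum.to_list()

    groups =
      Enum.zip(
        series |> Map.get("Pclass") |> Series.to_enum(),
        series |> Map.get("Sex") |> Series.to_enum()
      )

    # Median of the known ages for each {Pclass, Sex} group
    medians =
      groups
      |> Enum.zip(ages)
      |> Enum.group_by(fn {group, _age} -> group end, fn {_group, age} -> age end)
      |> Map.new(fn {group, group_ages} ->
        {group, median(Enum.reject(group_ages, &is_nil/1))}
      end)

    updated_ages =
      groups
      |> Enum.zip(ages)
      |> Enum.map(fn {group, age} -> age || medians[group] end)
      |> Series.from_list()

    DF.mutate(df, Age: updated_ages)
  end

  defp median(values) do
    sorted = Enum.sort(values)
    mid = div(length(sorted), 2)

    if rem(length(sorted), 2) == 1 do
      Enum.at(sorted, mid)
    else
      (Enum.at(sorted, mid - 1) + Enum.at(sorted, mid)) / 2
    end
  end
end

You would call AgeImputer.impute(df) before normalizing the Age column, but again, that’s just a sketch of the idea.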

For this post, we’ll simply map over the Age series and insert the median value into any rows missing “Age”:

age_series = Explorer.DataFrame.to_series(df_clean) |> Map.get("Age")
age_series_median = age_series |> Explorer.Series.median()

age_series_updated =
  Explorer.Series.to_enum(age_series)
  |> Enum.map(&(&1 || age_series_median))
  |> Explorer.Series.from_list()

df_clean = Explorer.DataFrame.mutate(df_clean, %{Age: age_series_updated})
#Explorer.DataFrame<
  Polars[891 x 9]
  Survived integer [0, 1, 1, 1, 0, ...]
  Pclass integer [3, 1, 3, 1, 3, ...]
  Sex float [1.0, 0.0, 0.0, 0.0, 1.0, ...]
  Age float [0.2711736617240513, 0.4722292033174164, 0.32143754712239253, 0.43453128926866047,
   0.43453128926866047, ...]
  SibSp integer [1, 1, 0, 1, 0, ...]
  Parch integer [0, 0, 0, 0, 0, ...]
  Pclass1 float [0.0, 1.0, 0.0, 1.0, 0.0, ...]
  Pclass2 float [0.0, 0.0, 0.0, 0.0, 0.0, ...]
  Pclass3 float [1.0, 0.0, 1.0, 0.0, 1.0, ...]
>

Modularizing the Data Cleanup

Now that we’ve walked through the data cleanup steps one by one, it’s time to wrap them in a module and run it on our DataFrame before preparing the data for the model. I also wanted to put together some more idiomatic Elixir for anyone still following along at home:

defmodule DataCleanup do
  alias Explorer.DataFrame, as: DF
  alias Explorer.Series

  def build_relevant_dataframe(file_path) do
    csv_to_dataframe(file_path)
    |> normalize_age()
    |> one_hot_encode_sex()
    |> one_hot_encode_pclass()
    |> replace_nil_age_values()
    |> drop_irrelevant_columns()
  end

  defp csv_to_dataframe(file_path) do
    DF.from_csv!(file_path)
  end

  defp normalize_age(df) do
    age_series =
      DF.to_series(df)
      |> Map.get("Age")
      |> Explorer.Series.to_enum()

    normalized_age_series =
      Normalizer.normalize(age_series)
      |> Series.from_list()

    DF.mutate(df, Age: normalized_age_series)
  end

  defp one_hot_encode_sex(df) do
    [_classification_list | sex_one_hot] =
      DF.to_series(df)
      |> Map.get("Sex")
      |> Series.to_enum()
      |> AnalysisPrep.one_hot()

    sex_series =
      sex_one_hot
      |> Enum.map(&Enum.at(&1, 1))
      |> Enum.map(&(&1 / 1))
      |> Series.from_list()

    DF.mutate(df, Sex: sex_series)
  end

  defp one_hot_encode_pclass(df) do
    [_classification_list | pclass_one_hot] =
      DF.to_series(df)
      |> Map.get("Pclass")
      |> Series.to_enum()
      |> AnalysisPrep.one_hot()

    # Want to directly turn this list of lists into a Series and merge into the DataFrame
    reduced_one_hot =
      Enum.reduce(pclass_one_hot, %{"Pclass1" => [], "Pclass2" => [], "Pclass3" => []}, fn row,
                                                                                           acc ->
        Map.update!(acc, "Pclass1", &(&1 ++ [Enum.at(row, 0) / 1]))
        |> Map.update!("Pclass2", &(&1 ++ [Enum.at(row, 1) / 1]))
        |> Map.update!("Pclass3", &(&1 ++ [Enum.at(row, 2) / 1]))
      end)

    DF.mutate(df, reduced_one_hot)
  end

  defp replace_nil_age_values(df) do
    age_series = DF.to_series(df) |> Map.get("Age")
    age_series_median = age_series |> Series.median()

    age_series_updated =
      Series.to_enum(age_series)
      |> Enum.map(&(&1 || age_series_median))
      |> Series.from_list()

    DF.mutate(df, %{Age: age_series_updated})
  end

  defp drop_irrelevant_columns(df) do
    DF.select(df, ["Name", "Cabin", "Embarked", "PassengerId", "Ticket", "Pclass", "Fare"], :drop)
  end
end
{:module, DataCleanup, <<70, 79, 82, 49, 0, 0, 19, ...>>, {:drop_irrelevant_columns, 1}}

Splitting Training Data by Classification

We will need to split the training data into an “X” and “Y” DataFrame in order to start training our model. The “Y” DataFrame will contain the classification for each passenger (i.e. “Did they survive?”) and the “X” DataFrame contains the data from which we will derive the optimum weights for our predictive model.

One other thing to note is that there is no to_tensor function in explorer to convert a DataFrame to a tensor. There has been discussion on this issue, but for now, the API does not offer that feature. I borrowed some code from a recent DockYard post that shows how you can use nx and axon to detect a fraudulent credit card transaction. I’d recommend checking that one out as soon as you’re done here!

In any case, let’s split the data and convert it to a tensor so we can start training our model:

to_tensor = fn df ->
  df
  |> Explorer.DataFrame.names()
  |> Enum.map(&(Explorer.Series.to_tensor(df[&1]) |> Nx.new_axis(-1)))
  |> Nx.concatenate(axis: 1)
end

df_train_x =
  DataCleanup.build_relevant_dataframe("data/train.csv")
  |> Explorer.DataFrame.select(&(&1 == "Survived"), :drop)

df_train_y =
  DataCleanup.build_relevant_dataframe("data/train.csv")
  |> Explorer.DataFrame.select(&(&1 == "Survived"), :keep)

df_test_x = DataCleanup.build_relevant_dataframe("data/test.csv")
x_test = to_tensor.(df_test_x)

x_train = to_tensor.(df_train_x)
y_train = to_tensor.(df_train_y)
#Nx.Tensor<
  s64[891][1]
  [
    [0],
    [1],
    [1],
    [1],
    [0],
    [0],
    [0],
    [0],
    [1],
    [1],
    [1],
    [1],
    [0],
    [0],
    [0],
    [1],
    [0],
    [1],
    [0],
    [1],
    [0],
    [1],
    [1],
    [1],
    [0],
    [1],
    [0],
    [0],
    [1],
    [0],
    [0],
    [1],
    [1],
    [0],
    [0],
    [0],
    [1],
    [0],
    [0],
    [1],
    [0],
    [0],
    [0],
    [1],
    [1],
    [0],
    [0],
    [1],
    [0],
    [0],
    ...
  ]
>

Using a Neural Network with Axon

Axon is a really nice library that lets us build a reasonably complex neural network in a few lines of code. We’re going to train our model in batches in order to be more efficient with our calculations. If you’re running this in Livebook, you can see how much of an effect the batch size has on the time it takes to train the model and on its performance (even on our small network and dataset).

I have also chosen the popular Adam optimizer and a learning rate of 0.01, but both can be tweaked to eke out a bit of extra performance from the model.

batched_train_inputs = Nx.to_batched_list(x_train, 8)
batched_train_targets = Nx.to_batched_list(y_train, 8)
batched_train = Stream.zip(batched_train_inputs, batched_train_targets)

model =
  Axon.input({nil, 7}, "input")
  |> Axon.dense(32)
  |> Axon.relu()
  |> Axon.dense(8)
  |> Axon.dense(1, activation: :sigmoid)

loss = :mean_squared_error
learning_rate = 0.01
optimizer = Axon.Optimizers.adam(learning_rate)

model
---------------------------------------------------------------------------------------------------
                                               Model
===================================================================================================
 Layer                              Shape       Policy              Parameters   Parameters Memory
===================================================================================================
 input ( input )                    {nil, 7}    p=f32 c=f32 o=f32   0            0 bytes
 dense_0 ( dense["input"] )         {nil, 32}   p=f32 c=f32 o=f32   256          1024 bytes
 relu_0 ( relu["dense_0"] )         {nil, 32}   p=f32 c=f32 o=f32   0            0 bytes
 dense_1 ( dense["relu_0"] )        {nil, 8}    p=f32 c=f32 o=f32   264          1056 bytes
 dense_2 ( dense["dense_1"] )       {nil, 1}    p=f32 c=f32 o=f32   9            36 bytes
 sigmoid_0 ( sigmoid["dense_2"] )   {nil, 1}    p=f32 c=f32 o=f32   0            0 bytes
---------------------------------------------------------------------------------------------------
Total Parameters: 529
Total Parameters Memory: 2116 bytes
Inputs: %{"input" => {nil, 7}}
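As a quick sanity check on the summary above, the parameter counts fall directly out of the layer shapes: the first dense layer has 7 × 32 weights plus 32 biases (256), the second has 32 × 8 + 8 (264), and the final dense layer feeding the sigmoid has 8 × 1 + 1 (9), for 529 parameters in total.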

Training the Model

Now that we have defined the model’s hidden layers, the loss function, and the optimizer, we can start feeding our model some training data to hone its ability to make predictions.

Visualizing the results of the model is also going to be important, if only for the motivational effect of watching the result of the cost function inching closer toward zero on a chart. I chose to use a loop handler function mostly to showcase the functionality; in any case, it will store the loss value after each iteration (epoch). While Axon will return only the model’s trained weights by default, we can use an output transformation to include this metadata with the results of training our model.

Once we have our handler and output transformation ready to go, we can finally see this thing in action. If we can put aside the sneaking suspicion that we wrote all this code only to be as accurate as flipping a coin, let’s train our model:

epochs = 100

# Using output_transform to store the accumulated metadata, which will contain the loss value
# at each epoch.
trainer = model |> Axon.Loop.trainer(loss, optimizer)

output_transform = fn state ->
  %{model_state: state.step_state[:model_state], metadata: state.handler_metadata}
end

# Axon.Loop.Trainer is just a struct, need to update the pre-filled output_transform function
trainer = Map.put(trainer, :output_transform, output_transform)

# At the end of each epoch, store the loss value in handler_metadata
plot_loss = fn state ->
  loss_val = get_in(state.metrics, ["loss"]) |> Nx.to_flat_list() |> hd()
  loss_values = Map.get(state.handler_metadata, "loss_values", [])
  handler_metadata = Map.put(state.handler_metadata, "loss_values", loss_values ++ [loss_val])
  state = Map.put(state, :handler_metadata, handler_metadata)

  {:continue, state}
end

# output_transform dictates the shape of `Axon.Loop.Run`
%{model_state: trained_model, metadata: %{"loss_values" => loss_values}} =
  trainer
  |> Axon.Loop.handle(:epoch_completed, plot_loss)
  |> Axon.Loop.run(Stream.zip(batched_train_inputs, batched_train_targets), %{}, epochs: epochs)
Epoch: 0, Batch: 100, loss: 0.1640735
Epoch: 1, Batch: 100, loss: 0.1527200
Epoch: 2, Batch: 100, loss: 0.1477675
Epoch: 3, Batch: 100, loss: 0.1447048
Epoch: 4, Batch: 100, loss: 0.1425270
Epoch: 5, Batch: 100, loss: 0.1411263
Epoch: 6, Batch: 100, loss: 0.1398882
Epoch: 7, Batch: 100, loss: 0.1388006
Epoch: 8, Batch: 100, loss: 0.1379194
Epoch: 9, Batch: 100, loss: 0.1371443
Epoch: 10, Batch: 100, loss: 0.1364465
...
Epoch: 98, Batch: 100, loss: 0.1238978
Epoch: 99, Batch: 100, loss: 0.1238549
%{
  metadata: %{
    "loss_values" => [0.16193918883800507, 0.1519005447626114, 0.14717158675193787,
     0.14418663084506989, 0.14209724962711334, 0.1407088190317154, 0.1395207792520523,
     0.13844850659370422, 0.13761302828788757, 0.13685482740402222, 0.13617441058158875,
     0.13553673028945923, 0.13502216339111328, 0.13455359637737274, 0.1341123729944229,
     0.13369393348693848, 0.13333043456077576, 0.13296577334403992, 0.13263621926307678,
     0.13233985006809235, 0.13204225897789001, 0.13178651034832, 0.1315300315618515,
     0.1313217729330063, 0.13107798993587494, 0.13085655868053436, 0.13064654171466827,
     0.13046181201934814, 0.13026021420955658, 0.13005253672599792, 0.129860982298851,
     0.12965837121009827, 0.12945054471492767, 0.12925803661346436, 0.1290828287601471,
     0.12892134487628937, 0.12875521183013916, 0.12858687341213226, 0.12842632830142975,
     0.12827393412590027, 0.1281227171421051, 0.12797710299491882, 0.1278315782546997,
     0.12770028412342072, 0.1275707632303238, 0.12743964791297913, 0.1273147463798523,
     0.12719707190990448, ...]
  },
  model_state: %{
    "dense_0" => %{
      "bias" => #Nx.Tensor<
        f32[32]
        [0.48431119322776794, -0.3218774199485779, -0.3365810215473175, -0.7434893250465393, -0.059355683624744415, -0.5267767310142517, -0.09751468896865845, -1.2660173177719116, -0.5662182569503784, -0.052933838218450546, 0.011728767305612564, -0.20872630178928375, -0.25485992431640625, 0.3001868724822998, -0.48852279782295227, -0.5277023911476135, -1.46438729763031, -0.11637403070926666, -0.23592086136341095, 0.5312404632568359, -0.18907999992370605, -0.8861296772956848, -1.9921395778656006, -0.3276658058166504, -0.14148756861686707, -0.08799608796834946, -0.5120258927345276, -0.5496641993522644, -0.44760748744010925, 0.27319273352622986, -0.3149799406528473, 0.24053142964839935]
      >,
      "kernel" => #Nx.Tensor<
        f32[7][32]
        [
          [-2.112074613571167, 0.245676651597023, 1.359257698059082, 0.044966921210289, -0.06395534425973892, 0.2173847109079361, -0.13602691888809204, 0.8178601264953613, -1.257620930671692, -0.4693145155906677, 0.6687984466552734, 1.6996864080429077, -0.024476123973727226, -0.3558703064918518, -0.4285537600517273, 1.2962980270385742, 0.7204050421714783, 0.07872967422008514, -0.7387552857398987, -2.3958849906921387, 0.22929106652736664, -1.447434425354004, 0.9135518670082092, -0.3228147625923157, -0.0020771073177456856, -0.09033364057540894, -0.02768341824412346, 0.2156837433576584, -0.2429112195968628, -1.9384404420852661, 1.5690561532974243, -1.6603679656982422],
          [0.23627085983753204, -0.8921284675598145, 1.2259567975997925, 2.201821804046631, 0.07504372298717499, -0.08471934497356415, -0.03623810037970543, 1.6597213745117188, 0.6584843993186951, -0.432372123003006, -7.905939102172852, -0.6525036692619324, 0.09789866209030151, ...],
          ...
        ]
      >
    },
    "dense_1" => %{
      "bias" => #Nx.Tensor<
        f32[8]
        [-0.15212774276733398, -0.03979223221540451, -0.18317702412605286, 0.07438859343528748, 0.00686146505177021, 0.003242790699005127, -0.11101000010967255, 0.08618582785129547]
      >,
      "kernel" => #Nx.Tensor<
        f32[32][8]
        [
          [-0.44955071806907654, 0.16989654302597046, -0.4962971806526184, 0.10145474225282669, -0.13934916257858276, -0.003514248877763748, -0.2227456420660019, -0.11616845428943634],
          [-0.19957245886325836, 0.019206302240490913, 0.15599700808525085, -0.18888533115386963, -0.1668655425310135, -0.2813955843448639, 0.07639379799365997, -0.1949022114276886],
          [0.420330286026001, 0.046093907207250595, 0.6452733278274536, 0.04875735938549042, -0.050045158714056015, -0.04381776601076126, -0.19390226900577545, -0.16936610639095306],
          [0.5843852162361145, 0.23029372096061707, 0.1350916475057602, 0.07630657404661179, -0.05963528901338577, -0.07237391918897629, -0.002917686477303505, -0.30435481667518616],
          [-0.11707118898630142, 0.18852463364601135, 0.13916735351085663, -0.23189371824264526, -0.3102315664291382, -0.3165343701839447, 0.29969581961631775, -1.9986118422821164e-4],
          [-0.08958841115236282, 0.08741355687379837, 0.16904757916927338, -0.17817290127277374, ...],
          ...
        ]
      >
    },
    "dense_2" => %{
      "bias" => #Nx.Tensor<
        f32[1]
        [0.03484217822551727]
      >,
      "kernel" => #Nx.Tensor<
        f32[8][1]
        [
          [-0.4545503854751587],
          [-0.0426076278090477],
          [-0.5270304679870605],
          [-0.10788994282484055],
          [0.05954229459166527],
          [0.014364827424287796],
          [0.14906303584575653],
          [0.016167059540748596]
        ]
      >
    }
  }
}

Charting the Results

The missing piece of the Elixir ML suite to this point has been the ability to visualize data within a notebook. Luckily for us, Elixir has a nice VegaLite integration, and Kino displays the chart right inside Livebook.

You could put the chart rendering in the training loop and update it on completion of each epoch, but here I’m simply iterating over the loss values from our model’s training outputs. In the end, the result is the same: Our cost value is going down over time and all is right with the world.

# Plotting the loss function over time.  Can push values in the plot_loss function to show
# it rendering in real time.

chart =
  Vl.new(width: 400, height: 400)
  |> Vl.mark(:line, tooltip: false)
  |> Vl.encode_field(:x, "Epoch", type: :quantitative)
  |> Vl.encode_field(:y, "Loss", type: :quantitative)
  |> Kino.VegaLite.new()

chart |> Kino.render()

Enum.with_index(loss_values)
|> Enum.each(fn {lv, idx} ->
  Kino.VegaLite.push(chart, %{"Epoch" => idx + 1, "Loss" => lv})
  Process.sleep(10)
end)
:ok

Who Survived?

Now that we’ve trained our model, the only thing left to do is have it generate predictions. Axon lets us run predictions on the test data we previously loaded into a tensor; from there, we create a DataFrame with the results and write it out to a CSV. While you can’t enter your results into the Kaggle competition, you can take solace in the fact that you now know how to build a neural network in Elixir.

While there were some minor immaturities in the ecosystem that could have derailed us in the early stages, it’s clear that there is a bright future for machine learning development in Elixir. I’ll still begrudgingly write Python for most of my new projects, but I’m excited to report that viable alternatives exist!

results = Axon.predict(model, trained_model, x_test) |> Nx.round()
survived_series = Explorer.Series.from_tensor(results)

Explorer.DataFrame.from_csv!("data/test.csv")
|> Explorer.DataFrame.mutate(%{Survived: survived_series})
|> Explorer.DataFrame.select(["PassengerId", "Survived"], :keep)
|> Explorer.DataFrame.to_csv!("data/results.csv")
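As a final sanity check (this snippet is an addition of mine, not part of the original workflow), we could also see how the model fares against the training data it has already seen:

# Rough training-set accuracy: round the predictions and compare against the labels
train_predictions = Axon.predict(model, trained_model, x_train) |> Nx.round()

train_accuracy =
  train_predictions
  |> Nx.equal(y_train)
  |> Nx.mean()
  |> Nx.to_number()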