Performance Tips

Here we'll cover some easy performance wins.

Performance Tips
Photo by Chris Liverani / Unsplash

The Julia Language Docs have a great page on performance tips.  It's worth a read, but the writing is aimed more towards a computer scientist than a data scientist. Here we'll cover some easy performance wins.

We find the best way to code in Julia is to:

  1. Start with code that works (without worrying about performance).
  2. Find and replace bottlenecks.  

Speeding up bottlenecks is usually straightforward in Julia, often involving drop-in replacements (like using StaticArrays for small, fixed arrays).  There are, however, some performance gotchas you should avoid from the start so that you don't need to refactor code later on.

🏗️ Do your "heavy lifting" inside functions.

Any code that is performance critical or being benchmarked should be inside a function.  - Julia Docs
  • Try to compartmentalize the parts of the problem you are trying to solve into functions.  Resist the temptation to write long scripts.  

Avoid (non-constant) globals.

"Globals" are essentially any variable that isn't inside a function.  

  • If you are using a variable as a parameter inside a function, make it an argument, e.g.
# not great 
x = 1

f(y) = x + y

# much better 
f(x, y) = x + y
  • If the variable is a constant setting, declare it as const.  However, you won't be able to change its value later on.
const x = 1

f(y) = x + y

Concern yourself with "type stability".

Type stability is the idea that the return type of a function only depends on the types of the inputs, not the input values.  Here is an example of  a type-unstable function:

function f(x)
    x < 10 ? "string" : 1

f(0) == "string"

f(100) == 1

The issue with the function above is that Julia's compiler needs to handle multiple cases (output could be either a String or an Int).

Don't change the type of a variable inside a function

Try not to initialize a variable with a type that will change later because of some computation.  For example, starting with x = 1 but then performing division x = x/2 will change x from Int to Float64.

Avoid abstract types in containers and structs.

When you use abstract types (e.g. Real) for things like arrays and struct fields, Julia must reason about a set of types rather than a specific type.  

Side note: Here is an short snippet for printing the subtypes of an abstract type using AbstractTrees:

julia> using AbstractTrees

julia> AbstractTrees.children(x::Type) = subtypes(x)

julia> print_tree(Real)
├─ AbstractFloat
│  ├─ BigFloat
│  ├─ Float16
│  ├─ Float32
│  └─ Float64
├─ AbstractIrrational
│  └─ Irrational
├─ Integer
│  ├─ Bool
│  ├─ Signed
│  │  ├─ BigInt
│  │  ├─ Int128
│  │  ├─ Int16
│  │  ├─ Int32
│  │  ├─ Int64
│  │  └─ Int8
│  └─ Unsigned
│     ├─ UInt128
│     ├─ UInt16
│     ├─ UInt32
│     ├─ UInt64
│     └─ UInt8
└─ Rational
Real type tree.
  • If you are initializing an empty array, but know the type in advance, let Julia know about it!
x = []         # eltype(x) == Any 

x = Float64[]  # eltype(x) == Float64
  • For structs, you can make them parametric to avoid ambiguous field types.
# Ambiguous field type!
struct A 

# Unambiguous field type!
struct B{T <: Real}
  • If you are unsure about whether a type is abstract or concrete, you can check with isconcretetype.  You can think about abstract types as things that "don't exist", but instead they define a set of things that do exist.

⏱️ Use @time and watch your allocations.  

One of the easiest speed gains is to remove temporary variables from a computation since Julia needs to spend time cleaning these up (garbage collection).

  • NOTE!  The first time you call @time do_something() will also include some overhead from the JIT (Just-in-time) compiler.  You'll need to run it a second time for the most accurate results.  For the most accurate measurement, see the fantastic BenchmarkTools package and its @btime macro.

Take a look at this poorly written loop to add 100 to each element of an array:

julia> function add100!(x)
           for i in 1:100
               x[:] = x .+ 1
           return x
add100 (generic function with 1 method)

julia> data = randn(10^6);

julia> @time add100!(data);
  1.115484 seconds (295.91 k allocations: 780.613 MiB, 58.96% gc time, 18.67% compilation time)

julia> @time add100!(data);
  0.386620 seconds (200 allocations: 762.947 MiB, 21.76% gc time)

There are number of red flags in the last line above:

  • The number of allocations (200).  Lots of temporary vectors are being created!
  • The memory being allocated (762.947 mebibytes).  
  • The percentage of time spent in garbage collection (21.76%).

Let's try again and see that these metrics can be improved by a lot.

julia> function add100!(x)
           x .+= 100
add100 (generic function with 1 method)

julia> data = randn(10^6);

julia> @time add100!(data);
  0.040681 seconds (140.36 k allocations: 8.008 MiB, 98.36% compilation time)

julia> @time add100!(data);
  0.001023 seconds

If you are seeing unexpectedly poor @time results, try asking yourself some of the following questions:

Can I use broadcasting?

Are you creating temporary arrays as an intermediate step in a computation?  In Julia, you can fuse multiple element-wise computations with the dot syntax:

x = randn(100)

# No intermediate vectors here!  Thanks broadcast fusion!
y = sin.(abs.(x)) .+ 1 ./ (1:100)

Is there a mutating (!) version of a function I can use?

For example, sort!(x) will sort the items of an array x in-place whereas sort(x) will return a sorted copy of x.

Working with arrays?  

Can I use views?

  • Accessing sections of an array, e.g. x[:, 1:2], creates a copy!  Try using a view of the data instead, which does not copy.  You can do this with the view function or @views macro.
x = randn(3, 3)

# The view equivalent of x[:, 1:2] (all rows, first 2 cols)
view(x, :, 1:2)  

# These are all views
@views begin 
    x[1:2, 1]
    x[:, 1:2]
Creating Views
  • Most Julia package developers write functions in terms of abstract types, e.g. f(x::AbstractVector{<:Integer}) vs. f(x::Vector{<:Integer}) (there's no performance penalty because of Julia's JIT).  This means views can be used as easy drop-in replacements for array copies!

Am I accessing elements in order?

Julia's arrays are stored in column-major order.  If you are, for example, iterating through the elements of a Matrix, make sure the inner loop is iterating over rows, since column elements are stored next to each other in your computer's memory:

x = rand(3,3)

for j in 1:3      # column j
    for i in 1:3  # row i
        x_ij = x[i, j]

🚀 That's It!

You now know how to achieve some easy performance wins in Julia.  The Julia community is very concerned (some might say obsessed) with performance, so there is a long rabbit hole to go down if this has piqued your interest.  If you want to know more details and even more performance tips, see the link below to the Julia manual's Performance Tips section.

Enjoying Julia For Data Science?  Please share us with a friend and follow us on Twitter at @JuliaForDataSci.