First Steps #3: A Primer on Plots

This is post #3 in our First Steps series. Want to learn Julia but don't know where to start? Start here!

Visualizing data is an essential skill for a data scientist.  Unlike R, Julia does not ship with plotting functionality built-in.  If you search for ways to make plots in Julia, you'll discover a lot of options.  So what should you use?

📊 Plots.jl

We recommend the Plots package (especially for beginners).

Plots is a unified interface for creating visualizations with different backends (such as GR, Plotly.js, and UnicodePlots).  It works well for beginners and power users alike, and it's designed so that a lot of things you try will "just work".

💻 Install Plots

In the Julia REPL, add the Plots package if you haven't already done so.  Recall that you enter Pkg Mode by pressing ]:

(@v1.6) pkg> add Plots

📈 Create Your First Plot

Return to Julia mode by pressing backspace (the delete key on a Mac keyboard), then enter:

julia> using Plots

julia> plot(randn(10), title="My First Plot")

🎉 Congrats!  You made your first plot 📈!  You created it using:

  1. randn(10): A Vector of 10 random samples from a Normal(0,1) distribution.
  2. The GR backend (Plots' default).

✨ Core Principles

The main function you'll use, as you may have guessed, is

plot(args...; kw...)

Here args... means any number of positional arguments and kw... is any number of keyword arguments.  Look back at the first plot we created and notice that we provided data randn(10) as a positional argument and the title title="My First Plot" as a keyword argument.  Another function you'll use is

plot!(args...; kw...)

In Julia, ! is used as a convention to identify functions that mutate at least one of the arguments.  With Plots, this lets you make changes or additions to a plot.
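Both ideas can be tried without Plots.  The sketch below uses a made-up helper, describe_call (hypothetical, not part of Plots), that has the same shape as plot(args...; kw...), followed by Base's sort and sort! to show the ! convention:

```julia
# Hypothetical helper (not part of Plots) with the same shape as plot(args...; kw...)
function describe_call(args...; kw...)
    # args is a Tuple of the positional arguments; keys(kw) holds the keyword names
    return (npositional = length(args), keywords = collect(keys(kw)))
end

info = describe_call(randn(10); title = "My First Plot")
println(info.npositional)  # number of positional arguments
println(info.keywords)     # names of the keyword arguments, as Symbols

# The ! convention in Base: sort returns a new vector, sort! changes its argument
v = [3, 1, 2]
w = sort(v)   # v is left untouched, w is a sorted copy
sort!(v)      # v itself is now sorted
println(v)
```

In the same way, plot creates a new plot while plot! changes an existing one.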


Now that we know the functions we are using, let's look at the core principles:

Principle #1: Everything You Plot is a Series

When you give data to the plot function (like randn(10) above), the seriestype determines how Plots will interpret the data.  By default this is :path.

plot(1:10, seriestype = :path, label = "Series 1")

plot!(rand(1:10,10), seriestype = :scatter, label = "Series 2")

Principle #2: Plot Attributes have Aliases

Plot attributes are passed by keyword arguments.   Because of aliases, you can often guess at the name of an attribute and Plots will interpret it correctly.  For example, the following commands are equivalent:

plot(randn(10), seriestype = :scatter)

plot(randn(10), st = :scatter)

scatter(randn(10))
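Aliases exist for many other attributes too.  As a small sketch, here are the same plot settings written once with full names and once with the short aliases ms (markersize) and lab (label).  The variable names y, p_long, and p_short are just for illustration:

```julia
using Plots

y = randn(10)  # reuse the same data so only the keyword spelling differs

# Full attribute names
p_long = plot(y, seriestype = :scatter, markersize = 8, label = "points")

# The same attributes through their aliases
p_short = plot(y, st = :scatter, ms = 8, lab = "points")

# Show both side by side to compare
plot(p_long, p_short)
```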

Principle #3: Columns are Mapped to Series

For both data and attributes, the columns of matrices will be mapped to individual series.  In this example, we create two series by providing a 10 x 2 matrix.  Now look at the difference between p1 and p2.  If the st (seriestype) attribute is a vector (written with commas), the provided attributes will loop through the available series.  If the st attribute is a matrix (here a single row, written with spaces), the attributes in the i-th column will be mapped to the i-th series.  This gives you a succinct way of providing attributes to series.

x = randn(10, 2)

# Series 1 --> :scatter & :line
# Series 2 --> :scatter & :line
p1 = plot(x, st=[:scatter, :line])  

# Series 1 --> :scatter
# Series 2 --> :line
p2 = plot(x, st=[:scatter :line]) 

plot(p1, p2)
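The only difference between p1 and p2 is a comma, so it helps to see what Julia builds in each case.  This check uses only base Julia, with v and m as illustrative names:

```julia
v = [:scatter, :line]   # commas: a 2-element Vector
m = [:scatter :line]    # spaces: a 1×2 Matrix (one row, two columns)

println(size(v))  # (2,)
println(size(m))  # (1, 2)

# The data matrix has one column per series
x = randn(10, 2)
println(size(x, 2))  # 2
```

Because m has two columns and x has two columns, column i of the attribute lines up with column i of the data.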

Principle #4:  Some Attributes are Magic 🪄

Some attributes can be provided with multiple values all at once and Plots will figure out what to do with them.  For example, using m=(10, .5, "blue") will set the marker size to 10, the marker alpha (opacity) to 0.5, and the marker color to "blue".

plot(randn(10), m = (10, .5, "blue"))
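For comparison, here is a sketch of the same marker settings written out one attribute at a time, following the mapping described above (size, alpha, color).  The color is given as the Symbol :blue here, and the names y, p_magic, and p_explicit are just for illustration:

```julia
using Plots

y = randn(10)

# Magic argument: one tuple sets several marker attributes at once
p_magic = plot(y, m = (10, 0.5, "blue"))

# The same settings spelled out attribute by attribute
p_explicit = plot(y, markersize = 10, markeralpha = 0.5, markercolor = :blue)

# Show both side by side to compare
plot(p_magic, p_explicit)
```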

Principle #5: Many Types have Plot Recipes

This is best seen through example.  Let's add the RDatasets and OnlineStats packages via Pkg Mode in the REPL:

(@v1.6) pkg> add OnlineStats RDatasets

Now load the packages and retrieve the diamonds dataset that comes packaged with R's ggplot2.  The diamonds data is a collection of variables on diamond price and quality.

using RDatasets, OnlineStats

df = dataset("ggplot2", "diamonds")

Suppose the first thing we want to see is the distribution of the :Cut variable in our diamonds data.  We'll use OnlineStats.CountMap to count the number of occurrences for each unique value in the :Cut column.  

When we plot the CountMap, a recipe is invoked to turn it into data that Plots knows how to display.  What a recipe offers over, say, a dedicated plot_countmap function is the ability to hook into plot attributes just as if you were plotting raw numbers.

o = CountMap(String)

fit!(o, string.(df.Cut))

plot(o, title="Neat!")
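To get a feel for what a count map holds, here is a plain-Julia sketch that tallies values by hand with a Dict.  The sample values are made up to stand in for the :Cut column, and this is only an illustration of the idea, not how OnlineStats implements CountMap:

```julia
# Made-up sample values standing in for the :Cut column
cuts = ["Ideal", "Premium", "Ideal", "Good", "Ideal", "Premium"]

counts = Dict{String, Int}()
for c in cuts
    # get returns the current count, or 0 the first time a value appears
    counts[c] = get(counts, c, 0) + 1
end

println(counts["Ideal"])    # 3
println(counts["Premium"])  # 2
println(counts["Good"])     # 1
```

The recipe's job is to turn a tally like this into something Plots can draw, so you don't have to pull out the labels and counts yourself.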

Try This!

Use a Different Backend

The backends of Plots can be changed interactively.  Try typing

plotly()

to switch to the interactive JavaScript library Plotly.js.  Then rerun the above examples.

That's It!

Now you know Plots' core principles.  Time to try a few things on your own!

Enjoying Julia For Data Science?  Please share us with a friend and follow us on Twitter at @JuliaForDataSci.

Resources