First Steps #3: A Primer on Plots
This is post #3 in our First Steps series. Want to learn Julia but don't know where to start? Start here!
Visualizing data is an essential skill for a data scientist. Unlike R, Julia does not ship with plotting functionality built-in. If you search for ways to make plots in Julia, you'll discover a lot of options. So what should you use?
We recommend the Plots package (especially for beginners).
Plots is a unified interface for creating visualizations with different backends (such as GR, Plotly.js, and UnicodePlots). It's great for beginners and power users alike, and it's designed so that a lot of the things you try will "just work".
💻 Install Plots
In the Julia REPL, add the Plots package if you haven't already done so. Recall that you enter Pkg Mode by pressing ]:
(@v1.6) pkg> add Plots
📈 Create Your First Plot
Back in Julia mode (by pressing backspace, labeled delete on Mac keyboards), enter:
julia> using Plots
julia> plot(randn(10), title="My First Plot")
🎉 Congrats! You made your first plot 📈! You created it using:
- A Vector of 10 random samples from a Normal(0,1) distribution.
- The GR backend (Plots' default).
✨ Core Principles
The main function you'll use, as you may have guessed, is plot(args...; kw...). Here args... means any number of positional arguments and kw... is any number of keyword arguments. Look back at the first plot we created and notice that we provided the data randn(10) as a positional argument and the title title="My First Plot" as a keyword argument.
Another function you'll use is plot!(args...; kw...). In Julia, ! is used as a convention to identify functions that mutate at least one of their arguments. With Plots, this lets you make changes or additions to an existing plot.
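As a small sketch of the difference between the two functions (the variable name p and the labels are ours, chosen for illustration): plot creates a new plot object, while plot! adds to one that already exists.

```julia
using Plots

# plot creates a new plot object holding one series
p = plot(randn(10), label = "first")

# plot! mutates p in place by adding a second series to it
plot!(p, randn(10), label = "second")
```

If you leave out the first argument, plot! modifies the most recently created plot instead.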
Now that we know the functions we are using, let's look at the core principles:
Principle #1: Every Thing You Plot is a Series
When you give data to the plot function (like randn(10) above), the seriestype determines how Plots will interpret the data. By default this is :path.
plot(1:10, seriestype = :path, label = "Series 1")
plot!(rand(1:10,10), seriestype = :scatter, label = "Series 2")
Principle #2: Plot Attributes have Aliases
Plot attributes are passed by keyword arguments. Because of aliases, you can often guess at the name of an attribute and Plots will interpret it correctly. For example, the following commands are equivalent:
plot(randn(10), seriestype = :scatter)
plot(randn(10), st = :scatter)
scatter(randn(10))
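Aliases are not limited to seriestype. The sketch below shows a few more pairs we believe Plots accepts (lw for linewidth, ls for linestyle, lab for label); the variable names are ours.

```julia
using Plots

y = randn(10)

# Longer, descriptive attribute names
p_long = plot(y, linewidth = 3, linestyle = :dash, label = "data")

# The same attributes passed through their short aliases
p_short = plot(y, lw = 3, ls = :dash, lab = "data")
```

Both calls should produce the same plot, so you can use whichever spelling you find easier to read.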
Principle #3: Columns are Mapped to Series
For both data and attributes, the columns of matrices will be mapped to individual series. In this example, we create two series by providing a 10 x 2 matrix. Now look at the difference between p1 and p2. If the st (an alias for seriestype) attribute is a vector, the provided attributes will loop through the available series. If the st attribute is a matrix, the attributes in the i-th column will be mapped to the i-th series. This provides a very succinct way of providing attributes to series.
x = randn(10, 2)

# Series 1 --> :scatter & :line
# Series 2 --> :scatter & :line
p1 = plot(x, st=[:scatter, :line])

# Series 1 --> :scatter
# Series 2 --> :line
p2 = plot(x, st=[:scatter :line])

plot(p1, p2)
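The difference between p1 and p2 comes down to Julia's array syntax: commas build a Vector, while spaces build a one-row Matrix. Here is a minimal sketch of that distinction, followed by the same column mapping applied to labels and line widths (the names v, m, and data are ours).

```julia
using Plots

v = [:scatter, :line]   # commas: a 2-element Vector
m = [:scatter :line]    # spaces: a 1 x 2 Matrix, one column per series
size(v), size(m)

data = randn(10, 3)     # three columns --> three series

# One label and one line width per column of data
p = plot(data, label = ["a" "b" "c"], linewidth = [1 2 3])
```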
Principle #4: Some Attributes are Magic 🪄
Some attributes can be provided with multiple values all at once and Plots will figure out what to do with them. For example, using m=(10, .5, "blue") will set the marker size to 10, the marker alpha (opacity) to 0.5, and the marker color to "blue".
plot(randn(10), m = (10, .5, "blue"))
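For comparison, here is a sketch that spells out the individual attributes instead of using the magic m argument. We set markershape = :circle explicitly so the markers are visible on a line plot; whether this matches the shape Plots picks for the magic form is an assumption on our part.

```julia
using Plots

y = randn(10)

# Each marker property passed as its own keyword argument
p = plot(y,
    markershape = :circle,
    markersize  = 10,
    markeralpha = 0.5,
    markercolor = "blue",
)
```

The magic form is shorter, while the explicit form makes it obvious which value controls what.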
Principle #5: Many Types have Plot Recipes
This is best seen through example. Let's add the RDatasets and OnlineStats packages via Pkg Mode in the REPL:
(@v1.6) pkg> add OnlineStats RDatasets
Now load the packages and retrieve the diamonds dataset that comes packaged with R's ggplot2. The diamonds data is a collection of variables on diamond price and quality.
using RDatasets, OnlineStats
df = dataset("ggplot2", "diamonds")
Suppose the first thing we want to see is the distribution of the :Cut variable in our diamonds data. We'll use OnlineStats.CountMap to count the number of occurrences for each unique value in that variable. When we plot the CountMap, a recipe is invoked to turn it into data that Plots knows how to display. What recipes provide, compared to, say, a dedicated plot_countmap function, is the ability to hook into plot attributes just as if you were plotting raw numbers.
o = CountMap(String)
fit!(o, string.(df.Cut))
plot(o, title="Neat!")
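To get a feel for what a recipe does, here is a minimal sketch of one for a made-up type. RandomWalk is hypothetical and exists only for this illustration; the @recipe macro comes from RecipesBase, which Plots re-exports.

```julia
using Plots   # also brings the @recipe macro into scope

# Hypothetical type, for illustration only
struct RandomWalk
    steps::Vector{Float64}
end

# Tell Plots how to turn a RandomWalk into data it can display
@recipe function f(w::RandomWalk)
    seriestype --> :path   # a default the caller can override
    label --> "walk"       # another overridable default
    cumsum(w.steps)        # the y values that get plotted
end

w = RandomWalk(randn(100))

# Ordinary plot attributes still work, just as with raw numbers
p = plot(w, title = "Neat!", linewidth = 2)
```

Because the recipe only describes how to convert the type into data, keyword arguments such as title and linewidth pass through without any extra work.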
Use a Different Backend
The backends of Plots can be changed interactively: calling the lowercase function named after a backend makes it the active one for the plots that follow.
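A minimal sketch of selecting a backend, using gr() because GR is the default and needs no extra installation. Other backends have their own functions, such as plotly() and unicodeplots(), and may require additional packages to be installed first.

```julia
using Plots

gr()                    # activate the GR backend
Plots.backend_name()    # reports the active backend as a Symbol

p = plot(randn(10), title = "Drawn with GR")
```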
Now you know Plots' core principles. Time to try a few things on your own!
Enjoying Julia For Data Science? Please share us with a friend and follow us on Twitter at @JuliaForDataSci.