The Best Data Science Talks of JuliaCon 2021

JuliaCon 2021 has come and gone. I've finally finished getting through my backlog of talks to watch and wanted to share my favorites (that fall under the category of data science).

Enjoying Julia For Data Science?  Please share us with a friend and follow us on Twitter at @JuliaForDataSci.

Introduction

Note that this is not an exhaustive list of "good talks".  I'm merely highlighting the ones most applicable to the average data scientist, and I hope you check out many more than just the ones listed here!

Each talk below is listed (in alphabetical order) along with its abstract as well as some of my own notes.  I've also assigned a rating in terms of beginner-friendliness using the following scheme:

  • 🟢 – Beginner friendly!  Aimed at Julia beginners.  
  • 🟦 – Intermediate.  You're expected to know a bit of Julia beforehand.
  • ♦️ – Advanced.  Either advanced Julia or domain knowledge will help you understand the talk.

The Talks


♦️ Applied Measure Theory for Probabilistic Modeling

  • Speaker: Chad Scherrer.
  • Abstract: We'll give an overview of MeasureTheory.jl, describing some of the advantages relative to Distributions.jl and some applications in probabilistic modeling.
  • Josh's notes:  This talk only loosely falls under the category of data science, but I love using the mathematical statistics/measure theory part of my brain, so it gets included.  I really enjoyed this one.

🟢🟦 Bias Audit and Mitigation in Julia

  • Speaker: Ashrya Agrawal.
  • Abstract: This talk introduces Fairness.jl, a toolkit to audit and mitigate bias in ML decision support tools. We shall introduce the problem of fairness in ML systems, its sources, significance and challenges. Then we will demonstrate Fairness.jl structure and workflow.
  • Josh's notes:  This talk gives a great introduction to the issues of fairness/bias in machine learning as well as offers easy-to-follow examples using Fairness.jl.

🟢🟦 Clearing the Pipeline Jungle with FeatureTransforms.jl

  • Speaker: Glenn Moynihan.
  • Abstract: The prevalence of glue code in feature engineering pipelines poses many problems in conducting high-quality, scalable research. In worst-case scenarios, the technical debt racked up by overgrown “pipeline jungles” can preclude further development and grind promising projects to a halt. This talk will show how the FeatureTransforms.jl package can help make feature engineering a more sustainable practice for users without sacrificing the flexibility they desire.
  • Josh's notes:  This talk has a solid introduction to the real-world difficulties with feature engineering and moves on to clear examples using FeatureTransforms.jl.

🟢🟦 DataFrames.jl 1.2 Tutorial (workshop)

  • Speaker: Bogumił Kamiński.
  • Abstract: In this workshop an introduction to DataFrames.jl 1.2 will be presented. You will learn how to load, transform and visualize your data using the DataFrames.jl package. The tutorial assumes that you have some experience in working with data frames in e.g. R or Python.  All the materials used are available for download at https://github.com/bkamins/JuliaCon2021-DataFrames-Tutorial.
  • Josh's notes: A great way to learn DataFrames from its most active maintainer!

♦️ Easy, Featureful Parallelism with Dagger.jl

  • Speaker: Julian P Samaroo.
  • Abstract: Parallelizing codes with Distributed.jl is simple and can provide an appreciable speed-up; but for complicated problems or when scaling to large problem sizes, the APIs are somewhat lacking. Dagger.jl takes parallelism to the next level, with support for GPU execution, fault tolerance, and more. Dagger's scheduler exploits every bit of parallelism it can find, and uses all the resources you can give it. In this talk, I'll build an application with Dagger to highlight what Dagger can do for you!
  • Josh's notes: Working on a non-trivial distributed computing problem?  This is a low-level talk, but if you've struggled with Julia's distributed computing primitives, you may want to give Dagger a try.

🟦♦️ Introduction to Bayesian Data Analysis (workshop)

  • Speakers: Kusti Skytén, Chad Scherrer, & Tor Fjelde.
  • Abstract: This workshop will introduce the recommended workflow for applied Bayesian data analysis by working through an example analysis together. We will start with the simplest non-trivial model and use increasingly sophisticated models to explain the properties of our data set based on model diagnostics. We will also give an overview of the different probabilistic programming packages in Julia and show where we have advantages over other languages such as Stan and Python.
  • Josh's notes: This is an in-depth tutorial on the ecosystem of Bayesian data analysis in Julia.  You may want to start with the "Statistics with Julia from the Ground Up" workshop if you are newer to Julia.

🟢🟦 Pluto – One Year Later

  • Speaker: Fons van der Plas.
  • Abstract: Pluto.jl is a notebook IDE for Julia, with a focus on interactivity and education. In this talk, you'll learn about our work during the past year, and our future plans.
  • Josh's notes: Pluto got a lot of attention over the past year, and rightfully so.  If you often find yourself in a notebook environment (like Jupyter), you should give Pluto a try.  Watch this one!  If you want more of Pluto, also check out the "Open and Interactive Computational Thinking with Julia and Pluto" talk.

🟦 Rewriting Pieces of a Python Codebase in Julia

  • Speaker: Satvik Souza Beri.
  • Abstract: Many people looking at Julia are coming from Python, and already have a sizable codebase. Our fund started rewriting performance-critical parts of our Python codebase in Julia, getting 10x-30x speedups. I'll go over how to start migrating Python code to Julia using PyCall and PyJulia, some gotchas to avoid, and where you're likely to see the biggest benefits.
  • Josh's notes: This is not specific to data science, but I've included this talk because many people are in the same boat.

🟢 The State of DataFrames.jl

  • Speaker: Bogumił Kamiński.
  • Abstract: In this talk I discuss what has recently changed in DataFrames.jl, what is the current state of the package, and what are our plans for the future.
  • Josh's notes:  You won't see a lot of code in this talk (see the DataFrames.jl tutorial workshop above for that), but you'll learn about DataFrames' design philosophy as well as what the open source contributors to DataFrames.jl have achieved.

🟦 State of Julia

  • Speakers: Jeff Bezanson, Stefan Karpinski, Keno Fischer, & Viral Shah.
  • Abstract: Annual talk on the state of things by Julia's creators.
  • Josh's notes:  This is a staple of every JuliaCon and always worth a watch to see what people are working on under the hood.

🟢 Statistics with Julia from the Ground Up (workshop)

  • Speaker: Yoni Nazarathy.
  • Abstract: This workshop provides an introduction to the Julia language for data scientists and statisticians. No prior experience with Julia is assumed. The workshop starts with a few Julia basics and then progresses through basic probability and statistics examples, usage of dataframes, elementary statistical inference, regression, and more advanced methods. At the end of this workshop, attendees will have a solid entry point for using Julia as their preferred data analysis tool.
  • Josh's notes: This is the best workshop I watched.  It begins by introducing Julia at a comfortable pace and then goes into a tour of the entire statistics ecosystem, including doing basic statistics, plotting, working with data, etc.

🚀 That's It!

Did we miss any awesome talks?
