If you are working on a project with multiple authors (or writing an academic paper with others), using the most appropriate and clean way of data visualization to satisfy everyone’s need (and personal taste) might be challenging. Due to differences in background and experience, the way the topic of data visualization is approached, varies between the individuals. This caused some challenges with regards to non-reproducibility and accordingly, time-related issues during my PhD studies, which have motivated me to explore user-friendly (?) Python libraries for data visualization. Today we will check out Seaborn, which is an alternative frontend for matplotlib, the de-facto standard for creating statistical graphics in Python.
In my day to day workflow, I like to explore my data together with the code I am writing in the context Jupyter notebooks. Jupyter notebooks provide a great interactive programming environment, while at the same time allowing for reproducibility (if used correctly).
For installing required packages, I recommend using Pipenv:
$ pipenv install pandas, seaborn, pingouin, jupyter
As usual, we import the necessary libraries and packages in the first cell of our notebook:
import pandas as pd import seaborn as sns import matplotlib.pyplot as plt import pingouin as pg
As a first step, we should take care of the data format. If the data was collected and subsequently organized by different people, it might look like chaotic. In some articles and blog posts it is mentioned, that the biggest part of data analysis is spent cleaning and preparing the data for the further processing. For better data organization, tidy data format, in which each variable has its own column and each observation has its own row, is suggested. A nice side effect of the tidy data format is the fact, that it is well supported by most statistical libraries.
In our example, we will use the tidy data format as seen below (table is shortend):
df = pd.read_excel('dummy_data.xlsx') df
In our example, as experiment subjects, pigeons received 3 conditions for 6 days and we recorded certain behaviours occurring in the experiment sessions.
The next step is deciding about the best ways to represent the data.
If we are interested in seeing daily behavioural differences in conditions, we can use Seaborn’s
barplot with defining the day as the
Seaborn provides quite a lot of built-in colour palettes, for which you can select the most appropriate ones for characteristics of your data and your visualization goals.
I usually use the
colorblind palette because of my target group (e.g. Kevin 🥸).
Another parameter that can be configured is the way error bars are calculated and displayed.
If you use the default options, confidence intervals will be shown on the graphs.
In our example, we draw it using standart deviation and therefore we define the parameter as
I would highly recommend you to use the
order parameter to decide the order of the values displayed on the x-axis.
If you are not happy with the axis labels implicitly taken from the data, you can also set them explicitly.
One last thing, if you have huge error bars, the default legend location inside the graph might be problematic, therefore you can override the location using
barplot = sns.barplot(x="CONDITION", y="behaviors", hue='DAY',palette="colorblind",data=df,ci='sd', order = ['condition_1','condition_2','condition_3']) barplot.set(xlabel='conditions',ylabel='number of behaviors') plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
For a publication, we should decide on the graph type which communicates main information most effectively.
In our case, we see higher number of behaviours in
condition_3 compared to
To be sure about this tendency, we can first perform a two-way repeated-measures ANOVA to see the condition, day and their interaction effect and a subsequent pairwise Tukey-HSD post-hoc test as multiple comparison test to check the difference between the conditions.
aov = pg.rm_anova(dv='behaviors', within=['CONDITION','DAY'], subject='pigeon_id', data=df) display(aov)
|2||CONDITION * DAY||245.292||10||70||24.5292||0.981517||0.467437||0.415535||0.122974||0.272103|
In line with our former prediction, there is a significant difference between the conditions 3 and 2.
To highlight this difference explicitly, we can use Seaborn’s
Since some publications have certain requirements for the graphs, we can alter our figure size, the font type, label sizes using the global matplotlib parameter
In our example, instead of using a built-in colour palette, I defined the colours using hexadecimal colour codes and set the colour palette with these defined colours.
It is also possible to see outlier observations as a default if you do not specify the
In our example, we did not use
hue, as we did in barplot, but keep in mind that boxplot also allows the usage of the
Similar to the barplot, we can also define labels, and limits for the y-axis.
As a last thing, we can (manually) include the statistical annotation to highlight the significant difference between the conditions 2 and 3.
plt.rcParams['figure.figsize'] = [8, 8] plt.rcParams["font.family"] = "Times New Roman" plt.rcParams['ytick.labelsize'] = 15 plt.rcParams['xtick.labelsize'] = 20 plt.rcParams['axes.labelsize'] = 20 colors = ["#807fff", "#fd7f82", "#84a97e"] sns.set_palette(sns.color_palette(colors)) boxplot = sns.boxplot(x="CONDITION", y='behaviors', data=df, order=['condition_1', 'condition_2', 'condition_3'], fliersize=0) boxplot.set_xlabel("") boxplot.set_ylabel("distribution of self-oriented behaviours") boxplot.set_xticklabels(['condition_1', 'condition_2', 'condition_3']) boxplot.set_ylim([-1, 35]) x2,x3 = 1, 2 y, h, col = df['behaviors'].max() + 2, 2, 'k' plt.rcParams["font.family"] = "serif" plt.plot([x2+0.1, x2+0.1, x3, x3], [y, y+h, y+h, y], lw=1.5, c=col) plt.text((x2+x3)*.5, y+h, "*", fontsize=15, ha='center', va='bottom', color=col)
As you can see creating graphs using Python and Seaborn can be done even without having a background in software development or computer science. A lot of good content and examples can be found online and I would advise everyone getting started to simply copy-paste code, change it, and see what happens. Stay tuned for more content from my side 🐦.