Updated violin plot example as per suggestions in issue #7251#7360
Updated violin plot example as per suggestions in issue #7251#7360phobson merged 9 commits intomatplotlib:masterfrom
Conversation
| lav = vals[0] | ||
| if lav > q1: | ||
| lav = q1 | ||
| lav = np.clip(lav, q1, vals[0]) |
There was a problem hiding this comment.
I think you might have these two backwards.
There was a problem hiding this comment.
Could you please comment a bit more on what you mean by backwards?
There was a problem hiding this comment.
q1 is the maximum and vals[0] is the minimum.
There was a problem hiding this comment.
I am sorry, I do have them backwards. Changing that now
| # medians | ||
| med = [np.percentile(sarr, 50) for sarr in dat] | ||
| # inter-quartile ranges | ||
| iqr = [[np.percentile(sarr, 25), np.percentile(sarr, 75)] for sarr in dat] |
There was a problem hiding this comment.
I think you can use the axis parameter and multiple q to compute all three of these at the same time without the list comprehension.
There was a problem hiding this comment.
To compute all of those in one call (without list comprehensions), I could do something like:
for sarr in dat:
median, q1, q3 = np.percentile(sarr, [50, 25, 75])
med.append(median)
iqr.append([q1, q3])
avs.append(adjacent_values(sarr))Is that fine?
There was a problem hiding this comment.
I think you can pass the entire array and use the axis parameter to get it to work on the right dimension. avs would probably need to stay as the list comprehension.
There was a problem hiding this comment.
Got it.
I could replace
med = [np.percentile(sarr, 50) for sarr in dat]
with
med = np.percentile(dat, 50, axis=1)
Since iqr is a list of tuples (or rather, list of lists of size 2), I am unable to see how we could do all 3 percentiles with a single call. I could still replace
iqr = [[np.percentile(sarr, 25), np.percentile(sarr, 75)] for sarr in dat]
with
iqr = zip(*(np.percentile(dat, [25, 75], axis=1)))
if that is fine.
There was a problem hiding this comment.
No need for the zip; just use .T to swap the axes. You could do it in one call with a temporary variable, but I'm not sure it's any better looking:
tmp = np.percentile(dat, [25, 50, 75], axis=1))).T
med = tmp[:, 1]
iqr = tmp[:, [0, 2]]|
|
||
| q1, q3 = np.percentile(vals, [25, 75]) | ||
| # inter-quartile range iqr | ||
| iqr = q3 - q1 |
There was a problem hiding this comment.
What about removing this comment and instead changing to clearer variable names:
interquartile_range = q3 - q1
| # inter-quartile range iqr | ||
| iqr = q3 - q1 | ||
| # upper adjacent values | ||
| uav = q3 + iqr * 1.5 |
There was a problem hiding this comment.
same, change to
upper_adajcent_values = q3 - interquartile_range*1.5
| ax2.plot([i + 1, i + 1], avs[i], '-', color='black', linewidth=1) | ||
| # quartiles | ||
| ax2.plot([i + 1, i + 1], iqr[i], '-', color='black', linewidth=5) | ||
| # medians |
There was a problem hiding this comment.
doesn't need this comment 'cause clear variable name. Also, still in favor of vlines for this (like in #6814) because I think that's a clearer indication of what's being plotted.
There was a problem hiding this comment.
@story645 👍 for better variable names, I have made changes and pushed a commit. I also intend to include the change to vlines in a separate commit, if we go ahead with that.
| # inter-quartile ranges | ||
| iqr = [[np.percentile(sarr, 25), np.percentile(sarr, 75)] for sarr in dat] | ||
| # upper and lower adjacent values | ||
| avs = [adjacent_values(sarr) for sarr in dat] |
There was a problem hiding this comment.
since this is the whiskers, why not change the variable name to whiskers? and again, then removing the comment
|
|
||
| ax1.set_ylabel('Observed values') | ||
| medians = np.percentile(data, 50, axis=1) | ||
| inter_quartile_ranges = list(zip(*(np.percentile(data, [25, 75], axis=1)))) |
There was a problem hiding this comment.
Don't need the list(zip; use .T as noted before.
There was a problem hiding this comment.
Agreed 👍 , have made the change as suggested in latest commit
| pc.set_alpha(1) | ||
|
|
||
| ax1.set_ylabel('Observed values') | ||
| tmp = (np.percentile(data, [25, 50, 75], axis=1)).T |
There was a problem hiding this comment.
Do you actually need a transpose here or would untransposed be the same and then: median = quantiles[1], inter quartile range = qauntiles[[0,2]]
There was a problem hiding this comment.
Without using the transpose would work for the medians. For the inter quartile ranges we do need a transpose.
There was a problem hiding this comment.
That makes no sense. Either the rows are turned into columns or they aren't. I could be doing the selection wrong on the rows for inter quartile range though. What's the shape of quantiles before the transpose?
There was a problem hiding this comment.
The list inter_quartile_ranges is supposed to be a list of pairs.
Before the transpose, the shape of quantiles would be 3 x (some N). Let's say these three rows are numbered 0, 1, 2. Row 1 would be the medians, no problem there. If we do
inter_quartile_ranges = tmp[[0, 2]]
(without the initial transpose), it's shape would be 2 x (some N). We need it's shape to be (some N) x (2) for it to be a list of pairs.
Please correct me if I'm wrong.
There was a problem hiding this comment.
Hmm, then maybe save the transpose to inter_quartile_ranges = tmp[[0,2]].T? And I think tmp should be named qaurtiles since that's what they are. That being said, if you go the vlines approach, it simplifies to quartile1, median, quartile3 = np.percentile(data, [25, 50, 75], axis=1) where qaurtile1 is iqrmin, etc.
Also, currently iqr is getting computed both here and in adjacent values and that should probably be cleaned up. My first pass suggestion is change the function sig of adjacent_values to adjacent_vals(vals, q1, q3), drop the first two lines in adjacent_vals, and the call is
whiskers = [adjacent_values(sorted_array, q1, qr ) for sorted_array, q1, q3 in zip(data, quartile1, quartile3)]
There was a problem hiding this comment.
Right, makes sense. Firstly, I have changed the variable name to 'quartiles' and altered the transpose calculation, as suggested. I'm working on the vlines approach and the adjacent_values() function.
There was a problem hiding this comment.
@story645 I could do something like this, as per your code in the original issue and your suggestions above:
quartile1, medians, quartile3 = np.percentile(data, [25, 50, 75], axis=1)
inter_quartile_ranges = np.vstack([quartile1, quartile3]).T
whiskers = [
adjacent_values(sorted_array, q1, q3)
for sorted_array, q1, q3 in zip(data, quartile1, quartile3)]
whiskersMin, whiskersMax = list(zip(*whiskers))
# plot medians as points,
# whiskers as thin lines, quartiles as fat lines
inds = np.arange(1, len(medians) + 1)
ax2.scatter(inds, medians, marker='o', color='white', s=30, zorder=3)
ax2.vlines(inds, quartile1, quartile3, color='k', linestyle='-', lw=5)
ax2.vlines(inds, whiskersMin, whiskersMax, color='k', linestyle='-', lw=1)I think this looks good. Any comments?
There was a problem hiding this comment.
I committed the changes for now. If you have any suggestions, please put them in a review, thanks!
|
Considering you are working on an example, can you please add a docstring with a title + a description while you are at it? """
===================
Saving an animation
===================
This example showcases the same animations as `basic_example.py`, but instead
of displaying the animation to the user, it writes to files using a
MovieWriter instance.
"""``` |
|
Hi @NelleV , there's already a docstring in the example, written by the author. Is that fine? |
|
@DrNightmare yes, it is perfect… |
|
|
||
| ax1.set_ylabel('Observed values') | ||
| quartile1, medians, quartile3 = np.percentile(data, [25, 50, 75], axis=1) | ||
| inter_quartile_ranges = np.vstack([quartile1, quartile3]).T |
There was a problem hiding this comment.
Pretty sure this variable is no longer used and so can be removed
| adjacent_values(sorted_array, q1, q3) | ||
| for sorted_array, q1, q3 in zip(data, quartile1, quartile3)] | ||
| whiskersMin, whiskersMax = list(zip(*whiskers)) | ||
| # plot medians as points, |
There was a problem hiding this comment.
Can remove these comments, probably
There was a problem hiding this comment.
Hmm right, will remove these comments and the unused variable
|
|
||
| lower_adjacent_value = q1 - (q3 - q1) * 1.5 | ||
| lower_adjacent_value = np.clip(lower_adjacent_value, vals[0], q1) | ||
| return [lower_adjacent_value, upper_adjacent_value] |
There was a problem hiding this comment.
Probabky doesn't need to be bracketed [], wonder if it can be used to remove zip* stuff in the unpacking...
There was a problem hiding this comment.
To avoid the unpacking, I could also optionally make 'whiskers' an np.array and then do:
whiskersMin, whiskersMax = whiskers[:, 0], whiskers[:, 1]
Irrespective of whether I do this or not, the brackets can be removed
There was a problem hiding this comment.
Then I think it makes sense to do that.
There was a problem hiding this comment.
Yes, updated in latest commit
|
I'm happy and okay with merging this |
| ax2.plot([i + 1, i + 1], iqr[i], '-', color='black', linewidth=5) | ||
| ax2.plot(i + 1, med[i], 'o', color='white', | ||
| markersize=6, markeredgecolor='none') | ||
| # customized violin |
There was a problem hiding this comment.
Hmm, right. Guess the title we set for the axes are self explanatory.
Updated with comments removed @story645
| data, showmeans=False, showmedians=False, | ||
| showextrema=False) | ||
|
|
||
| # customize colors |
There was a problem hiding this comment.
probably also unnecessary comment
| ax2.vlines(inds, whiskersMin, whiskersMax, color='k', linestyle='-', lw=1) | ||
|
|
||
| # set style for the axes | ||
| labels = ['A', 'B', 'C', 'D'] # labels |
|
@DrNightmare Thanks! These improvements are much appreciated. |
Updated violin plot example as per suggestions in issue #7251
|
Backported to |
|
@DrNightmare Thanks, hopefully we will hear from you again! |
|
@tacaswell Sure :) |
|
🍾 |
|
@DrNightmare the cool thing about about contributing to the docs is that the benefits are (nearly) immediately available: |
As per issue #7251 , there were certain changes required in the customized violin plot example.
Here is a list of changes made:
Minor changes:
4. Moved the styling of axes to a function
5. Rearranged overall code and added comments to improve readability, made some code more pythonic, renamed certain variables