Updated violin plot example as per suggestions in issue #7251 by DrNightmare · Pull Request #7360 · matplotlib/matplotlib

DrNightmare · 2016-10-29T19:10:37Z

As per issue #7251 , there were certain changes required in the customized violin plot example.

Here is a list of changes made:

Replaced the defined percentile function with a call to np.percentile
Used np.clip as suggested
Used list comprehensions, this is more pythonic and more readable

Minor changes:
4. Moved the styling of axes to a function
5. Rearranged overall code and added comments to improve readability, made some code more pythonic, renamed certain variables

QuLogic · 2016-10-29T19:53:55Z

examples/statistics/customized_violin_demo.py

-        lav = vals[0]
-    if lav > q1:
-        lav = q1
+    lav = np.clip(lav, q1, vals[0])


I think you might have these two backwards.

Could you please comment a bit more on what you mean by backwards?

q1 is the maximum and vals[0] is the minimum.

I am sorry, I do have them backwards. Changing that now

QuLogic · 2016-10-29T19:55:57Z

examples/statistics/customized_violin_demo.py

+# medians
+med = [np.percentile(sarr, 50) for sarr in dat]
+# inter-quartile ranges
+iqr = [[np.percentile(sarr, 25), np.percentile(sarr, 75)] for sarr in dat]


I think you can use the axis parameter and multiple q to compute all three of these at the same time without the list comprehension.

To compute all of those in one call (without list comprehensions), I could do something like:

for sarr in dat: median, q1, q3 = np.percentile(sarr, [50, 25, 75]) med.append(median) iqr.append([q1, q3]) avs.append(adjacent_values(sarr))

Is that fine?

I think you can pass the entire array and use the axis parameter to get it to work on the right dimension. avs would probably need to stay as the list comprehension.

Got it.
I could replace
med = [np.percentile(sarr, 50) for sarr in dat]
with
med = np.percentile(dat, 50, axis=1)

Since iqr is a list of tuples (or rather, list of lists of size 2), I am unable to see how we could do all 3 percentiles with a single call. I could still replace
iqr = [[np.percentile(sarr, 25), np.percentile(sarr, 75)] for sarr in dat]
with
iqr = zip(*(np.percentile(dat, [25, 75], axis=1)))
if that is fine.

No need for the zip; just use .T to swap the axes. You could do it in one call with a temporary variable, but I'm not sure it's any better looking:

tmp = np.percentile(dat, [25, 50, 75], axis=1))).T med = tmp[:, 1] iqr = tmp[:, [0, 2]]

story645 · 2016-10-30T00:44:55Z

examples/statistics/customized_violin_demo.py

-
+    q1, q3 = np.percentile(vals, [25, 75])
+    # inter-quartile range iqr
+    iqr = q3 - q1


What about removing this comment and instead changing to clearer variable names:
interquartile_range = q3 - q1

story645 · 2016-10-30T00:45:33Z

examples/statistics/customized_violin_demo.py

+    # inter-quartile range iqr
+    iqr = q3 - q1
    # upper adjacent values
    uav = q3 + iqr * 1.5


same, change to
upper_adajcent_values = q3 - interquartile_range*1.5

story645 · 2016-10-30T00:49:35Z

examples/statistics/customized_violin_demo.py

+    ax2.plot([i + 1, i + 1], avs[i], '-', color='black', linewidth=1)
+    # quartiles
+    ax2.plot([i + 1, i + 1], iqr[i], '-', color='black', linewidth=5)
+    # medians


doesn't need this comment 'cause clear variable name. Also, still in favor of vlines for this (like in #6814) because I think that's a clearer indication of what's being plotted.

@story645 👍 for better variable names, I have made changes and pushed a commit. I also intend to include the change to vlines in a separate commit, if we go ahead with that.

story645 · 2016-10-30T00:50:07Z

examples/statistics/customized_violin_demo.py

+# inter-quartile ranges
+iqr = [[np.percentile(sarr, 25), np.percentile(sarr, 75)] for sarr in dat]
+# upper and lower adjacent values
+avs = [adjacent_values(sarr) for sarr in dat]


since this is the whiskers, why not change the variable name to whiskers? and again, then removing the comment

QuLogic · 2016-10-30T07:00:52Z

examples/statistics/customized_violin_demo.py


-ax1.set_ylabel('Observed values')
+medians = np.percentile(data, 50, axis=1)
+inter_quartile_ranges = list(zip(*(np.percentile(data, [25, 75], axis=1))))


Don't need the list(zip; use .T as noted before.

Agreed 👍 , have made the change as suggested in latest commit

story645 · 2016-10-30T08:52:51Z

examples/statistics/customized_violin_demo.py

    pc.set_alpha(1)

-ax1.set_ylabel('Observed values')
+tmp = (np.percentile(data, [25, 50, 75], axis=1)).T


Do you actually need a transpose here or would untransposed be the same and then: median = quantiles[1], inter quartile range = qauntiles[[0,2]]

Without using the transpose would work for the medians. For the inter quartile ranges we do need a transpose.

That makes no sense. Either the rows are turned into columns or they aren't. I could be doing the selection wrong on the rows for inter quartile range though. What's the shape of quantiles before the transpose?

The list inter_quartile_ranges is supposed to be a list of pairs.
Before the transpose, the shape of quantiles would be 3 x (some N). Let's say these three rows are numbered 0, 1, 2. Row 1 would be the medians, no problem there. If we do
inter_quartile_ranges = tmp[[0, 2]]
(without the initial transpose), it's shape would be 2 x (some N). We need it's shape to be (some N) x (2) for it to be a list of pairs.
Please correct me if I'm wrong.

Hmm, then maybe save the transpose to inter_quartile_ranges = tmp[[0,2]].T? And I think tmp should be named qaurtiles since that's what they are. That being said, if you go the vlines approach, it simplifies to quartile1, median, quartile3 = np.percentile(data, [25, 50, 75], axis=1) where qaurtile1 is iqrmin, etc.
Also, currently iqr is getting computed both here and in adjacent values and that should probably be cleaned up. My first pass suggestion is change the function sig of adjacent_values to adjacent_vals(vals, q1, q3), drop the first two lines in adjacent_vals, and the call is
whiskers = [adjacent_values(sorted_array, q1, qr ) for sorted_array, q1, q3 in zip(data, quartile1, quartile3)]

Right, makes sense. Firstly, I have changed the variable name to 'quartiles' and altered the transpose calculation, as suggested. I'm working on the vlines approach and the adjacent_values() function.

@story645 I could do something like this, as per your code in the original issue and your suggestions above:

quartile1, medians, quartile3 = np.percentile(data, [25, 50, 75], axis=1) inter_quartile_ranges = np.vstack([quartile1, quartile3]).T whiskers = [ adjacent_values(sorted_array, q1, q3) for sorted_array, q1, q3 in zip(data, quartile1, quartile3)] whiskersMin, whiskersMax = list(zip(*whiskers)) # plot medians as points, # whiskers as thin lines, quartiles as fat lines inds = np.arange(1, len(medians) + 1) ax2.scatter(inds, medians, marker='o', color='white', s=30, zorder=3) ax2.vlines(inds, quartile1, quartile3, color='k', linestyle='-', lw=5) ax2.vlines(inds, whiskersMin, whiskersMax, color='k', linestyle='-', lw=1)

I think this looks good. Any comments?

I committed the changes for now. If you have any suggestions, please put them in a review, thanks!

NelleV · 2016-10-30T15:57:40Z

Considering you are working on an example, can you please add a docstring with a title + a description while you are at it?
Here is an example of a docstring:

"""
 ===================
 Saving an animation
 ===================

 This example showcases the same animations as `basic_example.py`, but instead
 of displaying the animation to the user, it writes to files using a
 MovieWriter instance.
 """```

DrNightmare · 2016-10-30T16:04:18Z

Hi @NelleV , there's already a docstring in the example, written by the author. Is that fine?

NelleV · 2016-10-30T16:06:03Z

@DrNightmare yes, it is perfect…
Sorry, I should have looked more closely!

…calculation

story645 · 2016-10-30T23:35:37Z

examples/statistics/customized_violin_demo.py


-ax1.set_ylabel('Observed values')
+quartile1, medians, quartile3 = np.percentile(data, [25, 50, 75], axis=1)
+inter_quartile_ranges = np.vstack([quartile1, quartile3]).T


Pretty sure this variable is no longer used and so can be removed

story645 · 2016-10-30T23:37:45Z

examples/statistics/customized_violin_demo.py

+    adjacent_values(sorted_array, q1, q3)
+    for sorted_array, q1, q3 in zip(data, quartile1, quartile3)]
+whiskersMin, whiskersMax = list(zip(*whiskers))
+# plot medians as points,


Can remove these comments, probably

Hmm right, will remove these comments and the unused variable

story645 · 2016-10-30T23:44:01Z

examples/statistics/customized_violin_demo.py

+
+    lower_adjacent_value = q1 - (q3 - q1) * 1.5
+    lower_adjacent_value = np.clip(lower_adjacent_value, vals[0], q1)
+    return [lower_adjacent_value, upper_adjacent_value]


Probabky doesn't need to be bracketed [], wonder if it can be used to remove zip* stuff in the unpacking...

To avoid the unpacking, I could also optionally make 'whiskers' an np.array and then do:
whiskersMin, whiskersMax = whiskers[:, 0], whiskers[:, 1]

Irrespective of whether I do this or not, the brackets can be removed

Then I think it makes sense to do that.

Yes, updated in latest commit

story645 · 2016-10-31T14:27:56Z

I'm happy and okay with merging this

story645 · 2016-10-31T17:39:41Z

examples/statistics/customized_violin_demo.py

-    ax2.plot([i + 1, i + 1], iqr[i], '-', color='black', linewidth=5)
-    ax2.plot(i + 1, med[i], 'o', color='white',
-             markersize=6, markeredgecolor='none')
+# customized violin


stray redundant comment

Hmm, right. Guess the title we set for the axes are self explanatory.
Updated with comments removed @story645

story645 · 2016-10-31T17:40:15Z

examples/statistics/customized_violin_demo.py

+        data, showmeans=False, showmedians=False,
+        showextrema=False)

 # customize colors


probably also unnecessary comment

story645 · 2016-10-31T17:40:34Z

examples/statistics/customized_violin_demo.py

+ax2.vlines(inds, whiskersMin, whiskersMax, color='k', linestyle='-', lw=1)
+
+# set style for the axes
+labels = ['A', 'B', 'C', 'D']    # labels


variable name = comment

phobson · 2016-10-31T19:20:33Z

@DrNightmare Thanks! These improvements are much appreciated.

DrNightmare · 2016-10-31T19:27:02Z

@phobson You're welcome! I got to learn a lot, this is my first PR in Matplotlib :)
BTW I think this closes #7251

Updated violin plot example as per suggestions in issue #7251

phobson · 2016-10-31T19:30:55Z

Backported to v2.x as 1c6b59a

tacaswell · 2016-10-31T21:18:23Z

@DrNightmare Thanks, hopefully we will hear from you again!

DrNightmare · 2016-10-31T21:21:13Z

@tacaswell Sure :)

NelleV · 2016-10-31T21:32:41Z

🍾

phobson · 2016-10-31T21:34:24Z

@DrNightmare the cool thing about about contributing to the docs is that the benefits are (nearly) immediately available:
http://matplotlib.org/devdocs/examples/statistics/customized_violin_demo.html

Updated example as per suggestions in issue matplotlib#7251

08db92d

QuLogic reviewed Oct 29, 2016

View reviewed changes

Fixed ordering of parameters to np.clip()

becbfa4

story645 reviewed Oct 30, 2016

View reviewed changes

Made variable names more meaningful

14cd6cf

QuLogic reviewed Oct 30, 2016

View reviewed changes

Replaced use of zip with call to transpose

03946d4

story645 reviewed Oct 30, 2016

View reviewed changes

DrNightmare added 3 commits October 30, 2016 21:57

Changed variable 'tmp' to 'quartiles' and slightly altered transpose …

8044e9d

…calculation

Fixed recalculation of interquartile ranges

88a8c2e

Replaced plot() calls with scatter() and vlines() calls

62c1716

story645 reviewed Oct 30, 2016

View reviewed changes

Removed unused variable and comments

9f4bbdd

story645 changed the title ~~Updated violin plot example as per suggestions in issue #7251~~ Updated violin plot example as per suggestions in issue #7251 [MRG+1] Oct 31, 2016

story645 reviewed Oct 31, 2016

View reviewed changes

Removed unnecessary comments

b3ce73b

NelleV approved these changes Oct 31, 2016

View reviewed changes

NelleV changed the title ~~Updated violin plot example as per suggestions in issue #7251 [MRG+1]~~ [MRG+2] Updated violin plot example as per suggestions in issue #7251 Oct 31, 2016

phobson merged commit f5c1f00 into matplotlib:master Oct 31, 2016

phobson changed the title ~~[MRG+2] Updated violin plot example as per suggestions in issue #7251~~ Updated violin plot example as per suggestions in issue #7251 Oct 31, 2016

DrNightmare deleted the violinPlotExample branch October 31, 2016 19:18

phobson added this to the 2.0.1 (next bug fix release) milestone Oct 31, 2016

phobson self-assigned this Oct 31, 2016

phobson added Documentation MEP: MEP12 gallery and examples improvements labels Oct 31, 2016

phobson added a commit that referenced this pull request Oct 31, 2016

Merge pull request #7360 from DrNightmare/violinPlotExample

1c6b59a

Updated violin plot example as per suggestions in issue #7251

phobson mentioned this pull request Oct 31, 2016

Improve violin plot demo #7251

Closed

3 tasks

QuLogic modified the milestones: 2.0.1 (next bug fix release), 2.0 (style change major release) Dec 7, 2016

Uh oh!

Conversation

DrNightmare commented Oct 29, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

story645 Oct 30, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

NelleV commented Oct 30, 2016

Uh oh!

DrNightmare commented Oct 30, 2016

Uh oh!

NelleV commented Oct 30, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DrNightmare Oct 31, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

story645 commented Oct 31, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

story645 Oct 30, 2016 •

edited

Loading

DrNightmare Oct 31, 2016 •

edited

Loading

DrNightmare Oct 31, 2016 •

edited

Loading