Hide old docs from search engines via canonical link by JazzTap · Pull Request #24 · matplotlib/matplotlib.github.com

JazzTap · 2018-07-15T04:32:30Z

Project initiated with @JLegs to point search engines (and users, gently) at the current docs. The approach is deliberately simple: delete the version string from the URL, and use an absolute link to matplotlib.org to avoid baseurl shenanigans.
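A minimal sketch of that URL rewrite, assuming old pages live under a leading version directory such as `/2.0.2/` and that the current docs use the same path without it. `canonical_url`, `VERSION_DIR`, and `SITE_ROOT` are hypothetical names, not taken from `tools/docs_deprecator`.

```python
import re
from urllib.parse import urlsplit

# Assumption: old builds sit under a leading numeric version directory,
# e.g. "/2.0.2/api/pyplot_api.html" -> "/api/pyplot_api.html".
VERSION_DIR = re.compile(r"^/\d+(?:\.\d+)+(?=/)")
# Absolute site root, so the result never depends on any baseurl setting.
SITE_ROOT = "https://matplotlib.org"


def canonical_url(old_url: str) -> str:
    """Map an old, versioned docs URL to its current-docs counterpart."""
    path = urlsplit(old_url).path
    # Remove the version directory only when the path starts with one;
    # unversioned paths pass through unchanged.
    return SITE_ROOT + VERSION_DIR.sub("", path, count=1)
```

Pre-release directories such as `3.0.0rc2` do not match this pattern and are left untouched; whether that is desirable depends on how the old builds are laid out.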

All HTML is parsed through lxml by the 'tools/docs_deprecator' notebook or script. Besides whitespace and property ordering, the only expected changes are 1) a <link> at the bottom of <head> and 2) a <div> at the top of <body> (the bot-forwarder and the human-forwarder, respectively).
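For illustration, the two forwarders could look like the strings built below. This is a sketch under assumptions: the banner wording, the `old-docs-banner` class, and the `forwarding_snippets` helper are hypothetical, and only the `rel="canonical"` link form is the standard search-engine hint.

```python
from html import escape
from typing import Tuple


def forwarding_snippets(target: str) -> Tuple[str, str]:
    """Return (head_link, body_banner) markup pointing one old page at `target`."""
    href = escape(target, quote=True)  # keep "&" and quotes valid inside the attribute
    # Bot-forwarder: tells search engines which URL is the canonical copy.
    head_link = f'<link rel="canonical" href="{href}">'
    # Human-forwarder: a visible notice; wording and class name are hypothetical.
    body_banner = (
        '<div class="old-docs-banner">'
        f'You are viewing old documentation. See the <a href="{href}">current version</a>.'
        "</div>"
    )
    return head_link, body_banner
```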

Corresponding comment in issue tracker: matplotlib/matplotlib#10016 (comment)

Note that in an ideal world we'd forward each page using a database of pages and their descendants, replacements, and so on. Computing that database automatically is compute-heavy, as discussed above.
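For comparison, a minimal sketch of the lookup such a database would enable, assuming the version string has already been stripped: prefer a recorded replacement, then the same path if it still exists, then a fallback page. The function, the table contents, and the `/index.html` fallback are all hypothetical; building `renamed` and `current_pages` is the expensive part this PR sidesteps.

```python
from typing import Dict, Set


def forward_target(
    path: str,
    renamed: Dict[str, str],
    current_pages: Set[str],
    fallback: str = "/index.html",
) -> str:
    """Choose where an old page should forward, given its version-less path."""
    if path in renamed:
        return renamed[path]  # page was moved or replaced: use the successor
    if path in current_pages:
        return path           # page still exists under the same name
    return fallback           # page is gone: send readers to the docs index


# Hypothetical data for illustration only; a real table would have to be
# computed from the docs sources.
RENAMED = {"/api/old_module.html": "/api/new_module.html"}
CURRENT_PAGES = {"/api/new_module.html", "/api/pyplot_api.html", "/index.html"}
```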

QuLogic · 2018-07-15T04:37:47Z

There are a ton of extraneous changes here; is there a way to get it to only do the two things you mentioned? It's not just whitespace changes that are added.

JazzTap · 2018-07-15T14:19:43Z

That's lxml snapping all the docs to its grammar, but I didn't spot anything beyond property re-ordering. Are there semantic changes?
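One way to check mechanically is to compare each file before and after the lxml pass with all whitespace removed from every line, so only non-whitespace differences remain. A rough sketch using `difflib`; `non_whitespace_changes` is a hypothetical helper. Squashing whitespace also hides changes inside text (e.g. "a b" vs "ab"), and re-ordered properties still show up, so anything it reports needs a human look.

```python
import difflib
from typing import List


def _squash(text: str) -> List[str]:
    # Strip all whitespace from each line and drop lines that become empty,
    # so re-indentation and blank-line churn disappear from the comparison.
    squashed = ("".join(line.split()) for line in text.splitlines())
    return [line for line in squashed if line]


def non_whitespace_changes(before: str, after: str) -> List[str]:
    """Return '-old' / '+new' lines that differ once whitespace is ignored."""
    a, b = _squash(before), _squash(after)
    changes: List[str] = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes += ["-" + line for line in a[i1:i2]]
            changes += ["+" + line for line in b[j1:j2]]
    return changes
```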

As I understand it, all the HTML is (was) machine-generated from source in the first place. Instead of parsing, though, one could carefully regex for the </head> and <body> lines as the insertion points.
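A sketch of that regex alternative using only the standard library. It assumes one </head> and one <body ...> tag per page and takes the two snippets as plain strings; `insert_forwarders` is a hypothetical helper, not part of this PR.

```python
import re

# Assumption: each generated page has exactly one </head> and one <body ...> tag.
HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)
BODY_OPEN = re.compile(r"<body\b[^>]*>", re.IGNORECASE)


def insert_forwarders(page: str, head_link: str, body_banner: str) -> str:
    """Add head_link just before </head> and body_banner just after <body ...>.

    Every other byte of the page is left as-is, so the change stays confined
    to those two insertion points.
    """
    if 'rel="canonical"' in page:
        return page  # crude check that the page was already processed
    if not HEAD_CLOSE.search(page) or not BODY_OPEN.search(page):
        raise ValueError("no </head> or <body> tag found; refusing to guess")
    # Callables as replacements, so backslashes in the snippets are not
    # interpreted as regex group references.
    page = HEAD_CLOSE.sub(lambda m: head_link + "\n" + m.group(0), page, count=1)
    page = BODY_OPEN.sub(lambda m: m.group(0) + "\n" + body_banner, page, count=1)
    return page
```

Because nothing is re-serialized, this avoids the whitespace and property-ordering churn raised above, at the cost of trusting that the generated markup is regular.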

tacaswell · 2018-07-15T14:40:45Z

There appears to be a way to get git to not add whitespace-only changes (https://stackoverflow.com/questions/3515597/add-only-non-whitespace-changes).

We should hold off on worrying about the whitespace for now. @JazzTap and I are at the SciPy sprints and agreed in person to focus on using the cleanup.py script to also add these changes to the files at the top level of the domain first.

Carreau · 2019-07-29T20:19:36Z

See #39, which changes only a single line per file.

jklymak · 2021-02-02T23:48:24Z

I'll close this in favor of #49, which does almost the same thing. Thanks a lot for the work on this, though; it was very helpful.

JazzTap added 3 commits July 14, 2018 22:43

initial munge

3fb62e1

surface 'target' to which old page should forward.

c19392b

Deprecator prefers to munge one site-version at a time.

a0b0973

jklymak closed this Feb 2, 2021

JazzTap wants to merge 3 commits into matplotlib:master from JazzTap:rel-canonical

5 participants
