Fix diff patch parser for paths with unsafe chars#415
Merged
Byron merged 2 commits intogitpython-developers:masterfrom Apr 20, 2016
nvie:fix-for-unicode-paths
Merged
Fix diff patch parser for paths with unsafe chars#415Byron merged 2 commits intogitpython-developers:masterfrom nvie:fix-for-unicode-paths
Byron merged 2 commits intogitpython-developers:masterfrom
nvie:fix-for-unicode-paths
Conversation
This specifically covers the cases where unsafe chars occur in path
names, and git-diff -p will escape those.
From the git-diff-tree manpage:
> 3. TAB, LF, double quote and backslash characters in pathnames are
> represented as \t, \n, \" and \\, respectively. If there is need
> for such substitution then the whole pathname is put in double
> quotes.
This patch checks whether or not this has happened and will unescape
those paths accordingly.
One thing to note here is that, depending on the position in the patch
format, those paths may be prefixed with an a/ or b/. I've specifically
made sure to never interpret a path that actually starts with a/ or b/
incorrectly.
Example of that subtlety below. Here, the actual file path is
"b/normal". On the diff file that gets encoded as "b/b/normal".
diff --git a/b/normal b/b/normal
new file mode 100644
index 0000000000000000000000000000000000000000..eaf5f7510320b6a327fb308379de2f94d8859a54
--- /dev/null
+++ b/b/normal
@@ -0,0 +1 @@
+dummy content
Here, we prefer the "---" and "+++" lines' values. Note that these
paths start with a/ or b/. The only exception is the value "/dev/null",
which is handled as a special case.
Suppose now the file gets moved "b/moved", the output of that diff would
then be this:
diff --git a/b/normal b/b/moved
similarity index 100%
rename from b/normal
rename to b/moved
We prefer the "rename" lines' values in this case (the "diff" line is
always a last resort). Take note that those lines are not prefixed with
a/ or b/, but the ones in the "diff" line are (just like the ones in
"---" or "+++" lines).
Specifically "string_escape" does not exist as an encoding anymore.
Member
|
Thank @nvie , I particularly like your test ! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This specifically covers the cases where unsafe chars occur in path names, and git-diff -p will escape those.
From the git-diff-tree manpage:
This patch checks whether or not this has happened and will unescape those paths accordingly.
One thing to note here is that, depending on the position in the patch format, those paths may be prefixed with an
a/orb/. I've specifically made sure to never interpret a path that actually starts witha/orb/incorrectly.Example of that subtlety below. Here, the actual file path is
b/normal. On the diff file that gets encoded asb/b/normal.Here, we prefer the
---and+++lines' values. Note that these paths start witha/orb/. The only exception is the value/dev/null, which is handled as a special case.Suppose now the file gets moved
b/moved, the output of that diff would then be this:We prefer the
renamelines' values in this case (thediffline is always a last resort). Take note that those lines are not prefixed witha/orb/, but the ones in thediffline are (just like the ones in---or+++lines).