ConvertTo-Csv: Quote fields with quotes and newlines when using -UseQuotes AsNeeded#15765
Merged
TravisEz13 merged 5 commits intoPowerShell:masterfrom Aug 4, 2021
Merged
Conversation
Fix for issue PowerShell#9284 - Escape fields that contain quotes and newlines, not just those that contain the delimiter
iSazonov
reviewed
Jul 13, 2021
Collaborator
iSazonov
left a comment
There was a problem hiding this comment.
Please add a test with new line chars.
src/Microsoft.PowerShell.Commands.Utility/commands/utility/CsvCommands.cs
Outdated
Show resolved
Hide resolved
Added additional tests for different character scenarios that should be handled
Contributor
Author
|
Re: Testing - Added additional tests for detecting delimiters, newlines and quotes. I did put the test objects inside the |
Cast string as ReadOnlySpan to avoid allocation of char array
iSazonov
reviewed
Jul 14, 2021
test/powershell/Modules/Microsoft.PowerShell.Utility/ConvertTo-Csv.Tests.ps1
Show resolved
Hide resolved
test/powershell/Modules/Microsoft.PowerShell.Utility/ConvertTo-Csv.Tests.ps1
Show resolved
Hide resolved
test/powershell/Modules/Microsoft.PowerShell.Utility/ConvertTo-Csv.Tests.ps1
Show resolved
Hide resolved
Co-authored-by: Ilya <darpa@yandex.ru>
iSazonov
approved these changes
Jul 15, 2021
|
This pull request has been automatically marked as Review Needed because it has been there has not been any activity for 7 days. |
TravisEz13
approved these changes
Aug 4, 2021
ConvertTo-Csv: Quote fields with quotes and newlines when using -UseQuotes AsNeeded
|
🎉 Handy links: |
|
🎉 Handy links: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix for issue #9284 - Escape fields that contain quotes and newlines, not just those that contain the delimiter
PR Summary
This PR fixes the case of CSV fields that include double quotes or newlines in their values. If a Column value contains
",\r,\nor the delimiter then it will be marked as needing to be surrounded by quotes.This follows RFC-4180 rules as to what characters trigger a field to be quoted (except it allows for different delimiters than the comma). This also follows how
ConvertFrom-Csvparses input CSV files.PR Context
-UseQuotes AsNeededcurrently only looks for delimiters, which means that the fieldHello,Worldwill be surrounded in quotes, however the fieldHello"Worldwould not be surrounded in quotes, causing this kind of embarassing slip-up:Newlines are another risk, since they're interpreted as completely different rows.
Performance Caveat: This version is searching for the existence of 4 characters (using
string.IndexOfAny(chars)instead of the original 1 (string.Contains(char). Based on my benchmarks this change does have a miniscule increase in time for that single operation. However, in my naïve in-memory benchmarks against a 100MB CSV file, using-UseQuotes AsNeededwas slightly faster than-UseQuotes Alwaysor-QuoteFields. My guess is that I didn't need to allocate as much memory for those unneeded quotes.PR Checklist
.h,.cpp,.cs,.ps1and.psm1files have the correct copyright headerWIP:or[ WIP ]to the beginning of the title (theWIPbot will keep its status check atPendingwhile the prefix is present) and remove the prefix when the PR is ready.(which runs in a different PS Host).