X Tutup
Skip to content

[WIP] Add efficient ASCII to float/double parser#420

Merged
syoyo merged 32 commits intoreleasefrom
copilot/add-ascii-to-float-parser
Mar 10, 2026
Merged

[WIP] Add efficient ASCII to float/double parser#420
syoyo merged 32 commits intoreleasefrom
copilot/add-ascii-to-float-parser

Conversation

Copy link
Contributor

Copilot AI commented Mar 4, 2026

  • Re-establish plan for Mariana-trench review pass
  • Re-check latest branch state and changed files for regressions
  • Investigate latest CI/workflow runs and job logs
  • Run baseline tests before making any new change
  • Perform deepest targeted code review for safety/correctness edge cases
  • Apply minimal fix only if a concrete issue is confirmed
  • Re-run validation tests
  • Run code review and CodeQL checks
  • Reply to latest PR comment with result and commit hash

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI and others added 11 commits March 3, 2026 17:05
…oading

Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
Replace safeGetline + raw const char* token pointer scanning with a
StreamReader class that wraps (buf, len, idx) with bounds-checked
peek/advance/match operations. Every byte access is guarded by an EOF
check.

Key changes:
- Add StreamReader class for bounds-checked text parsing
- Add sr_* parse functions that use StreamReader instead of const char**
- Add LoadMtlInternal, LoadObjInternal, LoadObjWithCallbackInternal
  using StreamReader; public APIs wrap these
- mmap paths construct StreamReader directly from mapped memory
- Remove safeGetline (no longer needed)
- Keep old parse functions for backward compat with public
  ParseTextureNameAndOption API
- Keep IS_SPACE, IS_NEW_LINE, IS_DIGIT macros for old parse functions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ked parsing

- Add StreamReader class wrapping (buf, length) with EOF-guarded peek/advance/match
- Add sr_* parse functions using StreamReader instead of const char** pointers
- Add LoadMtlInternal, LoadObjInternal, LoadObjWithCallbackInternal taking StreamReader
- mmap paths construct StreamReader directly from mapped memory
- Remove safeGetline (no longer needed)
- Keep old parse functions + macros for ParseTextureNameAndOption backward compat

Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ent, BOM in callback, test fixes

- Add UTF8ToWchar/LongPathW helpers for Windows long path and Unicode support
- Add Windows long path support (#ifdef _WIN32) in non-mmap ifstream fallbacks
- Add vt_w vector and optional w component parsing in LoadObjInternal
- Fix texcoord_ws.swap(vt) -> texcoord_ws.swap(vt_w) corruption
- Add BOM stripping to LoadObjWithCallbackInternal
- Fix invalid-relative-texture-vertex-index.obj -> invalid-relative-texture-index.obj
- Add test models: texcoord-w.obj, texcoord-w-mixed.obj, utf8-path-test.obj/.mtl
- Add tests: texcoord_w, texcoord_w_mixed, utf8_path, long_path, callback_BOM

Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
…lease

Keep StreamReader-based parsing from the harden branch for all conflicting
regions: map_d parsing, mmap material loading, LoadObjInternal, texcoord
w-component, BOM handling, and main parse loop. Retain mmap/memory-buffer
tests in tester.cc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add column tracking (col_num_) to StreamReader, updated in all
  position-advancing methods (get, advance, skip_space, etc.)
- Add current_line_text(), format_error(filename, msg) for clang-style
  diagnostics (file:line:col with source line and caret)
- Add error stack to StreamReader (push_error, has_errors, get_errors)
- Add error-reporting sr_parse overloads that return bool for
  success/failure: sr_parseInt, sr_parseReal, sr_parseReal2/3,
  sr_parseVertexWithColor (returns -1 on error)
- Wire error-reporting overloads into LoadMtlInternal (Ka, Kd, Ks, Ni,
  Ns, illum, d, Tr, PBR properties) with early return on parse failure;
  change LoadMtlInternal from void to bool
- Wire error-reporting overloads into LoadObjInternal (v, vn, vt, vw)
  with early return on parse failure
- Upgrade existing error messages (face, line, points, skin weight) to
  use sr.format_error() for consistent clang-style output
- Pass filename through call chain: LoadObj -> LoadObjInternal,
  MaterialFileReader -> LoadMtlInternal
- Add warning_context.filename for fixIndex warnings
- Add 6 new tests: column tracking, format_error output, error stack,
  malformed vertex/mtl errors, backward compatibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move warning_context initialization (warn, filename) before the main
  parse loop in LoadObjInternal; only update line_number per iteration.
  Avoids a std::string copy of filename on every face/line/points line.
- Replace test_parse_error_backward_compat (was just TEST_CHECK(true))
  with an actual round-trip test: parse valid OBJ, verify no errors and
  correct vertex count.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@syoyo
Copy link
Collaborator

syoyo commented Mar 4, 2026

@copilot rerun

Copilot AI and others added 12 commits March 4, 2026 16:06
Port fast_float v8.0.2 (C++11 compatible) as the parsing backend and add
an OBJ-specific wrapper (obj_parse_float.h) that handles:
- Leading '+' sign (e.g. "+1.0", "+.5e3")
- Case-insensitive nan/inf/infinity parsing
- NumPy-style non-finite replacement (nan→0, inf→max, -inf→lowest)
  with user-configurable ParseOptions

Includes standalone test/benchmark in sandbox/parse_fp/ covering
correctness, accuracy vs strtod, and performance comparison.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add minimal shim (tinyobj_ff namespace) providing type traits, algorithms,
fixed-width integers, and error codes so the embedded fast_float requires
zero additional #include dependencies beyond what tinyobjloader already uses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update Makefiles to use -std=c++11, update README to reflect C++11
requirement, and simplify TINYOBJ_OVERRIDE macro.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three tests covering float parsing: edge cases (leading/trailing dots,
scientific notation, leading plus, leading zeros, negative zero, high
precision, near-overflow, subnormals), nan/inf variants, and stream-based
loading. All pass with both fast_float and legacy parsers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add null checks for s, s_end, and result parameters. Parse into a
local temporary so *result is only written on success, preventing
partial writes on overflow or invalid input.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Test overflow preserving default values, empty/whitespace input,
garbage input, and extreme precision digits. All pass with both
fast_float and legacy parsers under ASan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve conflicts by keeping both numeric edge case tests and
StreamReader/error reporting tests. Adapt overflow and garbage input
tests to match StreamReader's strict error detection behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…jloader/tinyobjloader into copilot/add-ascii-to-float-parser
Resolve conflicts by taking release's improved safety/error handling
(MappedFile RAII, stream size limits, overflow-safe integer parsing,
sr.has_errors() pattern) while preserving HEAD's unique additions
(embedded fast_float parser, numeric edge case tests, C++11 requirement).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… to callback API, add comprehensive tests

- Add size_to_int() helper to safely clamp size_t to INT_MAX, preventing
  undefined behavior when vertex arrays exceed ~715M elements
- Switch callback API (LoadObjWithCallback) to use error-reporting overload
  of sr_parseVertexWithColor, so malformed vertices are detected instead of
  silently using defaults
- Fix StreamReader underflow in match()/char_at()/peek_at() bounds checks
- Delete StreamReader copy/move constructors (raw pointer into owned buffer)
- Add #undef for all 38 FASTFLOAT_* macros to prevent namespace pollution
- Replace fragile integer type shims with standard <cstdint>/<cfloat>/<climits>
- Fix LoadMtlInternal: warn_ss append (= to +=), phantom empty material guard
- Fix MappedFile: reset size on POSIX mmap failure
- Add bounds-check validation in earcut/quad triangulation paths
- Remove duplicate #include merge artifact
- Add 33 new unit tests covering StreamReader, edge cases, MTL, callbacks,
  and parser robustness

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ndling

Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
Copilot finished work on behalf of syoyo March 7, 2026 23:20
@syoyo
Copy link
Collaborator

syoyo commented Mar 8, 2026

@copilot review. deeper

…seDouble, fix test assertion and CI workflow

Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 8, 2026

@copilot review. deeper

Deeper review found and fixed four issues in commit 0fbf7ac:

  1. Build artifacts committedTesting/Temporary/CTestCostData.txt and Testing/Temporary/LastTest.log were accidentally tracked in git (despite .gitignore covering them). Both files have been removed.

  2. Inconsistent allow_leading_plus usagetryParseDouble in tiny_obj_loader.h still had manual if (*parse_start == '+') ++parse_start code. Applied the same chars_format::allow_leading_plus fix as was done for sandbox/parse_fp/obj_parse_float.h.

  3. Incomplete test assertiontest_split_string_edge_cases verified tokens.size() == 1 for the trailing-backslash case but never checked the token value. Added TEST_CHECK(tokens[0] == "dir\\") to confirm the backslash is preserved as-is.

  4. CI workflow.github/workflows/unit.yml used make && ./tester instead of the canonical make check target defined in tests/Makefile.

Copilot finished work on behalf of syoyo March 8, 2026 15:06
@syoyo
Copy link
Collaborator

syoyo commented Mar 8, 2026

@copilot review. higher

…its), add C++11 error directive, document fast_float in README

Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 8, 2026

@copilot review. higher

Higher-level review found and fixed three architectural issues in commit 866b9bc:

  1. TINYOBJLOADER_DISABLE_FAST_FLOAT build break<climits> was only included inside #ifndef TINYOBJLOADER_DISABLE_FAST_FLOAT, but size_to_int() (which uses INT_MAX) lives outside that guard. Building with TINYOBJLOADER_DISABLE_FAST_FLOAT failed with 'INT_MAX' was not declared. Moved #include <climits> to the unconditional implementation block.

  2. C++11 not enforced at compile time — The comment "C++11 is now the minimum required standard" was never backed by a hard check. Added a #error for __cplusplus < 201103L so users on older standards get a clear diagnostic instead of confusing template errors.

  3. TINYOBJLOADER_DISABLE_FAST_FLOAT undocumented — The opt-out flag existed only as a comment in the header. Added a "High-performance float parsing (fast_float)" section to README explaining the feature, how to opt out, and when opting out is necessary (name collision with an external fast_float include).

Other architectural observations that do not require changes:

  • fast_float and tinyobj_ff namespaces live at global scope (outside namespace tinyobj), which is correct — they're accessed via fully-qualified names from within namespace tinyobj.
  • The macro #undef block after the fast_float embed correctly avoids polluting the user's preprocessor namespace.
  • The legacy fallback (DISABLE_FAST_FLOAT) duplicates parseReal/parseReal2/parseReal3 — this is the necessary cost of supporting the opt-out.

Copilot finished work on behalf of syoyo March 8, 2026 16:53
@syoyo
Copy link
Collaborator

syoyo commented Mar 8, 2026

@copilot review. ultrathink

…mment, add direct system_error include in sandbox

Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 8, 2026

@copilot review. ultrathink

Ultrathink review complete. Three additional fixes in commit 3eaa62d:

  1. <cstdint> now in unconditional implementation includes — Previously only inside #ifndef TINYOBJLOADER_DISABLE_FAST_FLOAT. The only current uint32_t usage in namespace tinyobj outside that block was guarded by #ifdef TINYOBJLOADER_USE_MAPBOX_EARCUT (which transitively includes <cstdint>), so it worked — but silently. Any future addition of fixed-width integer types outside those guards would break only the DISABLE_FAST_FLOAT build. Now explicit and robust.

  2. Stale Makefile comment removed — The commented-out -DTINYOBJLOADER_USE_FAST_FLOAT=1 -I../fast_float/include referenced the old external opt-in approach (before fast_float was bundled). That flag no longer exists; the new opt-out is TINYOBJLOADER_DISABLE_FAST_FLOAT. Replaced with a note explaining the current approach.

  3. Direct <system_error> include in sandbox/parse_fp/obj_parse_float.h — The file compares r.ec == std::errc() but included <system_error> only transitively through fast_float.h. Added the direct include to make the dependency explicit.

Additional findings examined and determined to be pre-existing (not introduced by this PR):

  • Three const char**-based parse functions (parseRawTriple, parseTagTriple, parseTriple) are now dead code since the SR-based equivalents took over — but they predate this PR
  • fixIndex doesn't bounds-check positive vertex indices (e.g. f 1 2 999 against a 3-vertex mesh stores index 998 without a warning) — pre-existing behavior, tested/documented by test_out_of_range_face_index
  • system() in RemoveTestDir test helper: acceptable for test code with hardcoded paths

Copilot finished work on behalf of syoyo March 8, 2026 17:12
@syoyo
Copy link
Collaborator

syoyo commented Mar 8, 2026

@copilot continue review. Ultra super think.

… write

Co-authored-by: syoyo <18676+syoyo@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 8, 2026

@copilot continue review. Ultra super think.

Ultra-super-think pass completed. I found and fixed one concrete memory-safety issue in commit 26ff0a8.

tests/tester.cc had a Windows-only helper bug in WcharToUTF8: it allocated len - 1 bytes but called WideCharToMultiByte(..., -1, ..., len), where len includes the terminating NUL. That could write one byte past the buffer. I changed it to allocate len, perform conversion, then resize(len - 1) to trim the terminator.

Also re-checked CI/workflow status and logs for this branch (Unit Tests latest run is passing), then re-ran local make check, plus code review and CodeQL scan (no new alerts).

Copilot finished work on behalf of syoyo March 8, 2026 22:19
@syoyo
Copy link
Collaborator

syoyo commented Mar 9, 2026

@copilot review, again. deepest like Mariana Trench

Copy link
Contributor Author

Copilot AI commented Mar 9, 2026

@copilot review, again. deepest like Mariana Trench

Completed another Mariana-trench-depth pass.

I re-checked latest CI/workflow state and logs, re-ran local tests/make check, and re-audited the recently touched parsing + Windows helper areas. I did not find any additional concrete defect requiring code changes beyond 26ff0a8.

Current latest fix remains 26ff0a8 (Windows UTF conversion buffer sizing safety fix).

Copilot finished work on behalf of syoyo March 9, 2026 15:07
@syoyo syoyo merged commit 966edce into release Mar 10, 2026
31 checks passed
@syoyo syoyo deleted the copilot/add-ascii-to-float-parser branch March 10, 2026 02:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

X Tutup