X Tutup
Skip to content
This repository was archived by the owner on Mar 23, 2026. It is now read-only.

fix(transcribe): allow exact 4h duration and handle invalid metadata#13561

Merged
purcell merged 3 commits intolocalstack:mainfrom
kelvinvelasquez-SDE:fix/transcribe-duration
Feb 9, 2026
Merged

fix(transcribe): allow exact 4h duration and handle invalid metadata#13561
purcell merged 3 commits intolocalstack:mainfrom
kelvinvelasquez-SDE:fix/transcribe-duration

Conversation

@kelvinvelasquez-SDE
Copy link
Contributor

Description

Fixes #12423

This PR addresses an issue where audio files of exactly 4 hours (14400 seconds) were rejected by the TranscribeService, despite AWS allowing files up to 4 hours. It also adds robustness to handle cases where ffprobe returns invalid duration metadata (e.g., N/A), preventing the service from crashing.

Changes

  • Modified provider.py: Changed duration check from >= to > to include 14400.0s as valid.
  • Added Error Handling: Wrapped the duration parsing logic in a try/except block to gracefully handle ValueError or TypeError, defaulting to assuming validity (or 0.0s) and logging a warning instead of raising a 500 Internal Server Error.
  • New Tests: Added tests/unit/services/transcribe/test_provider.py with regression tests covering:
    • Duration > 4h (Fails as expected)
    • Duration = 4h (Passes)
    • Duration = N/A (Handled without crash)

Verification

Ran the new unit tests locally and they pass. Verified that the service no longer throws unhandled exceptions for bad metadata.

@localstack-bot
Copy link
Contributor

localstack-bot commented Dec 22, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

Copy link
Contributor

@localstack-bot localstack-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welcome to LocalStack! Thanks for raising your first Pull Request and landing in your contributions. Our team will reach out with any reviews or feedbacks that we have shortly. We recommend joining our Slack Community and share your PR on the #community channel to share your contributions with us. Please make sure you are following our contributing guidelines and our Code of Conduct.

@kelvinvelasquez-SDE
Copy link
Contributor Author

I have read the CLA Document and I hereby sign the CLA

Copy link
Contributor

@purcell purcell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to analyse, write and submit this PR. 🙏

I'll defer to the maintainers of this service for a more thorough review, but my quick feedback is that while the small fix itself may be worthwhile, the tests added here are not consistent with those elsewhere in the codebase, nor would they be maintainable in the long term. The scenarios should be possible to recreate with real sample files (without them becoming enormous), so it would be better to extend the more concrete tests instead.

Comment on lines +325 to +332
# If duration cannot be parsed, we assume it's invalid or fallback to letting it run
# But for strictness let's fail or default?
# Best practice: if we can't determine it, we can't validate it.
# Use a safe fallback or log/raise.
# Given user report of "N/A", let's fail to be safe as per AWS strictness?
# Or maybe default to fail if we assume it's a stream?
# Let's log and re-raise or handle.
# Simplest fix: treat N/A as failure?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment contains lots of questions and doesn't explain anything clearly, so it adds little to the reader's understanding of the code.

@purcell purcell added semver: patch Non-breaking changes which can be included in patch releases docs: skip Pull request does not require documentation changes notes: skip Pull request does not have to be mentioned in the release notes labels Dec 22, 2025
@alexrashed alexrashed added this to the Playground milestone Jan 23, 2026
@purcell
Copy link
Contributor

purcell commented Feb 6, 2026

(Lint step apparently failing due to the issue resolved by #13710.)

@purcell purcell force-pushed the fix/transcribe-duration branch from c88068b to b50410a Compare February 6, 2026 17:54
@purcell purcell force-pushed the fix/transcribe-duration branch from b50410a to b447580 Compare February 9, 2026 11:34
Copy link
Contributor

@purcell purcell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is (now) just a small follow-up adjustment to #12450 by @brunodmartins last March, which already resolved #12423, and was merged by @k-a-il. This should be uncontroversial to merge now.

@kelvinvelasquez-SDE: the initial version of this PR was clearly written with the assistance of an LLM, and was quite messy, but you'll note that the end result here was a trivial two-line change, which would have been trivial to review and merge immediately. May I suggest in future emphasising minimal changes, asking maintainers' opinions, and iterating with them in conversation to come up with a good solution? This would be more satisfying for everyone. From an OSS maintainer's perspective, we want to connect with other humans and build things together, so communication is key.

@purcell purcell merged commit 52d8381 into localstack:main Feb 9, 2026
36 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

docs: skip Pull request does not require documentation changes notes: skip Pull request does not have to be mentioned in the release notes semver: patch Non-breaking changes which can be included in patch releases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: TranscribeService should fail for audio sizes exceeding 4 hours.

4 participants

X Tutup