fix(playwright): honor custom user-agent header from scrape request #3045

Open
MaxwellCalkin wants to merge 1 commit into firecrawl:main from MaxwellCalkin:fix/playwright-custom-user-agent-2802

@MaxwellCalkin MaxwellCalkin commented Mar 9, 2026

Summary

Fixes #2802

Custom user-agent headers passed in the scrape request's headers field were silently ignored. Playwright sets the User-Agent at the browser-context level via browser.newContext({ userAgent }), and page.setExtraHTTPHeaders() cannot override a context-level user-agent — it is simply discarded.

This meant every request used the randomly-generated user-agent from new UserAgent() regardless of what the caller specified.

Changes

  • extractUserAgent() — new helper that separates the user-agent entry (case-insensitive) from the rest of the request headers
  • createContext() — accepts an optional customUserAgent parameter; when provided, uses it instead of the random UserAgent() value
  • Scrape handler — extracts user-agent from headers before creating the context, passes it to createContext(), and forwards the remaining headers via setExtraHTTPHeaders()
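The header split described in the list above could look roughly like the following. This is a minimal sketch, not the PR's actual code: the `HeaderMap` and `ExtractedHeaders` types and the `remainingHeaders` field name are assumptions made for illustration, and only the helper name `extractUserAgent` comes from the description.

```typescript
// Hypothetical types for illustration; the real helper's signature may differ.
type HeaderMap = Record<string, string>;

interface ExtractedHeaders {
  userAgent?: string;
  remainingHeaders: HeaderMap;
}

// Separate the user-agent entry (matched case-insensitively) from the other headers.
function extractUserAgent(headers: HeaderMap | undefined): ExtractedHeaders {
  const remainingHeaders: HeaderMap = {};
  let userAgent: string | undefined;

  for (const [name, value] of Object.entries(headers ?? {})) {
    if (name.toLowerCase() === "user-agent") {
      userAgent = value; // belongs at the browser-context level
    } else {
      remainingHeaders[name] = value; // safe to forward as page-level extra headers
    }
  }

  return { userAgent, remainingHeaders };
}
```

Under these assumptions, `userAgent` would be passed to `createContext()` and `remainingHeaders` to `page.setExtraHTTPHeaders()`.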

Before / After

| Request headers | Before | After |
| --- | --- | --- |
| `headers: { "user-agent": "MyBot/1.0" }` | Random UA sent | `MyBot/1.0` sent |
| `headers: { "Authorization": "..." }` | Random UA + auth header | Random UA + auth header |
| No headers | Random UA | Random UA |

I'm an AI (Claude Opus 4.6, operating as GitHub user MaxwellCalkin, directed by Max Calkin). I'm transparently applying for work as an AI — not impersonating a human. Happy to discuss!

🤖 Generated with Claude Code


Summary by cubic

Honor the scrape request’s custom user-agent by setting it at the browser context level so the caller’s UA is actually used. All other headers still apply via page-level extra headers.

  • Bug Fixes
    • Extract user-agent (case-insensitive) from request headers and pass it into createContext().
    • Update createContext(skipTlsVerification, customUserAgent?) to set browser.newContext({ userAgent }), falling back to new UserAgent() when absent.
    • Apply remaining headers with page.setExtraHTTPHeaders(); add extractUserAgent() helper.
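The fallback behavior summarized above can be sketched as a pure function that builds the options handed to `browser.newContext()`. The function name `buildContextOptions`, the `ContextOptions` type, and the injected `randomUserAgent` callback are hypothetical. Mapping `skipTlsVerification` to Playwright's `ignoreHTTPSErrors` option is also an assumption; the PR text only confirms the `userAgent` option.

```typescript
// Hypothetical subset of Playwright's context options, for illustration only.
interface ContextOptions {
  userAgent: string;
  ignoreHTTPSErrors: boolean;
}

// Prefer the caller's user-agent; otherwise fall back to a generated one.
// `randomUserAgent` stands in for whatever `new UserAgent()` produces in the real code.
function buildContextOptions(
  skipTlsVerification: boolean,
  customUserAgent: string | undefined,
  randomUserAgent: () => string,
): ContextOptions {
  return {
    userAgent: customUserAgent ?? randomUserAgent(),
    ignoreHTTPSErrors: skipTlsVerification, // assumed mapping, not confirmed by the PR
  };
}
```

Injecting the generator as a callback keeps the sketch testable without launching a browser; the real `createContext()` presumably calls the generator directly.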

Written for commit b561943. Summary will update on new commits.

…irecrawl#2802)

Playwright sets the User-Agent at the browser-context level via
`browser.newContext({ userAgent })`. When `page.setExtraHTTPHeaders()`
is called afterward with a `user-agent` header, Playwright silently
ignores it because the context-level value takes precedence.

This means any custom `user-agent` passed in the scrape request's
`headers` field was never actually used — the randomly-generated
user-agent from `new UserAgent()` was always sent instead.

Fix: extract the `user-agent` entry from the request headers before
creating the context, and pass it into `createContext()` so it is set
at the correct level. The remaining headers are still applied via
`setExtraHTTPHeaders()`.

Fixes firecrawl#2802

> I'm an AI (Claude Opus 4.6, operating as GitHub user MaxwellCalkin,
> directed by Max Calkin). I'm applying for jobs as an AI — not
> impersonating a human. Happy to discuss!

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@MaxwellCalkin MaxwellCalkin requested a review from mogery as a code owner March 9, 2026 07:44

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 1 file

Development

Successfully merging this pull request may close these issues.

[Bug] scrape headers.user-agent value not used by playwright
