fix fastapi search.json discrepancies by RayBB · Pull Request #11517 · internetarchive/openlibrary

RayBB · 2025-11-26T07:45:41Z

When we first put out the FastAPI search.json endpoint, we thought passing along all the parameters was good enough for testing.

Turns out there were a few discrepancies with how we were handling lists and pagination.

This PR makes the final changes to have everything be a 1:1 match with the old search endpoint.

Technical

Testing

Lots of tests have been added.

Screenshot

Stakeholders

@cdrini

cdrini

Greeeeaaattt work!! This is becoming so much tidier 😊

openlibrary/tests/fastapi/test_search.py

openlibrary/fastapi/search.py

RayBB · 2025-11-27T04:51:38Z

For posterity's sake, here's the set of "tests" I wrote to track my progress while trying to get the Pydantic model working with list queries.

Spoiler

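# The tests here show that it's hard to get lists working for query params
from typing import Annotated

from fastapi import Depends, FastAPI, Query
from fastapi.testclient import TestClient
from pydantic import BaseModel, Field


def test_multi_key():  # noqa: PLR0915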

    app = FastAPI()

    # This doesn't work because it expects the author keys to be in the body
    @app.get("/search.json")
    async def search_works(
        author_key: list[str],
    ):
        return {'author_key': author_key}

    client = TestClient(app)
    response = client.get('/search.json?author_key=OL1A&author_key=OL2A')
    assert response.status_code == 422
    assert response.json() != {'author_key': ['OL1A', 'OL2A']}
    assert response.json()['detail'][0]['type'] == 'missing'
    assert response.json()['detail'][0]['loc'] == ['body']

    # This test does work because we're explicitly using Query but we want it moved into a Pydantic model
    app = FastAPI()

    @app.get("/search.json")
    async def search_works2(
        author_key: Annotated[list[str], Query()],
    ):
        return {'author_key': author_key}

    client = TestClient(app)
    response = client.get('/search.json?author_key=OL1A&author_key=OL2A')
    assert response.status_code == 200
    assert response.json() == {'author_key': ['OL1A', 'OL2A']}

    # This test also works because we're explicitly using Query, but we don't want the value to default to None
    app = FastAPI()

    @app.get("/search.json")
    async def search_works3(
        author_key: Annotated[list[str] | None, Query()] = None,
    ):
        return {'author_key': author_key}

    client = TestClient(app)
    response = client.get('/search.json?author_key=OL1A&author_key=OL2A')
    assert response.status_code == 200
    assert response.json() == {'author_key': ['OL1A', 'OL2A']}

    # With a bare Pydantic model, FastAPI again says the body is missing
    app = FastAPI()

    class SearchParams(BaseModel):
        author_key: list[str]

    @app.get("/search.json")
    async def search_works4(
        params: SearchParams,
    ):
        return {'author_key': params.author_key}

    client = TestClient(app)
    response = client.get('/search.json?author_key=OL1A&author_key=OL2A')
    assert response.status_code == 422
    assert response.json() != {'author_key': ['OL1A', 'OL2A']}
    assert response.json()['detail'][0]['type'] == 'missing'
    assert response.json()['detail'][0]['loc'] == ['body']

    # Ok so now this works. Yay!
    app = FastAPI()

    class SearchParams(BaseModel):
        author_key: list[str]

    @app.get("/search.json")
    async def search_works5(
        params: Annotated[SearchParams, Query()],
    ):
        return {'author_key': params.author_key}

    client = TestClient(app)
    response = client.get('/search.json?author_key=OL1A&author_key=OL2A')
    assert response.status_code == 200
    assert response.json() == {'author_key': ['OL1A', 'OL2A']}

    # But what if there are other params? Uh oh then they're missing...
    app = FastAPI()

    class SearchParams(BaseModel):
        author_key: list[str]

    @app.get("/search.json")
    async def search_works6(
        params: Annotated[SearchParams, Query()],
        q: str | None = None,
    ):
        return {'author_key': params.author_key}

    client = TestClient(app)
    response = client.get('/search.json?author_key=OL1A&author_key=OL2A')
    assert response.status_code == 422
    assert response.json()['detail'][0]['type'] == 'missing'
    assert response.json()['detail'][0]['loc'] == ['query', 'params']

    # So Gemini says it'll work if we use Depends instead of query! But then we get a body missing :(
    app = FastAPI()

    class SearchParams(BaseModel):
        author_key: list[str]

    @app.get("/search.json")
    async def search_works7(
        params: Annotated[SearchParams, Depends()],
        q: str | None = None,
    ):
        return {'author_key': params.author_key}

    client = TestClient(app)
    response = client.get('/search.json?author_key=OL1A&author_key=OL2A')
    assert response.status_code == 422
    # assert response.json() == {'author_key': ['OL1A', 'OL2A']}
    assert response.json()['detail'][0]['type'] == 'missing'
    assert response.json()['detail'][0]['loc'] == ['body']

    # So what if we make it clearer that it's a query param? Woah that works!
    """
    It seems to work because:
    1. Depends(): Tells FastAPI to expand the Pydantic model into individual arguments (dependency injection).
    2. Field(Query([])): Overrides the default behavior for lists. It forces FastAPI to look for ?author_key=...
       in the URL query string instead of expecting a JSON array in the request body.
    The Field part is needed because FastAPI's default guess for lists inside Pydantic models is wrong for our use case:
       it guesses "JSON body", and we have to manually correct it to "query string".
    """
    app = FastAPI()

    class SearchParams(BaseModel):
        author_key: list[str] = Field(Query([]))

    @app.get("/search.json")
    async def search_works8(
        params: Annotated[SearchParams, Depends()],
        q: str | None = None,
    ):
        return {'author_key': params.author_key}

    client = TestClient(app)
    response = client.get('/search.json?author_key=OL1A&author_key=OL2A')
    assert response.status_code == 200
    assert response.json() == {'author_key': ['OL1A', 'OL2A']}

    # A quick check to make sure it's ok with no params
    response = client.get('/search.json')
    assert response.status_code == 200
    assert response.json() == {'author_key': []}

    # But wait, I think Query([]) is not great because it puts a mutable object in the default.
    # However, the Pydantic docs say not to worry about it:
    # https://docs.pydantic.dev/latest/concepts/fields/#mutable-default-values
    # Let's try the "proper way" just in case.
    # It works great, but it's ugly, so let's not do it.

    app = FastAPI()

    class SearchParams(BaseModel):
        author_key: list[str] = Field(Query(default_factory=list))

    @app.get("/search.json")
    async def search_works9(
        params: Annotated[SearchParams, Depends()],
        q: str | None = None,
    ):
        return {'author_key': params.author_key}

    client = TestClient(app)
    response = client.get('/search.json?author_key=OL1A&author_key=OL2A')
    assert response.status_code == 200
    assert response.json() == {'author_key': ['OL1A', 'OL2A']}

    # A quick check to make sure it's ok with no params
    response = client.get('/search.json')
    assert response.status_code == 200
    assert response.json() == {'author_key': []}

    # But wait, AI says the modern standard is to use Annotated.
    # However, the full-stack FastAPI template (https://github.com/fastapi/full-stack-fastapi-template)
    # doesn't seem to do it, so we shouldn't have to either.
    # So in summary, I think we should use search_works8!

    app = FastAPI()

    class SearchParams(BaseModel):
        author_key: Annotated[list[str], Field(Query([]))]

    @app.get("/search.json")
    async def search_works10(
        params: Annotated[SearchParams, Depends()],
        q: str | None = None,
    ):
        return {'author_key': params.author_key}

    client = TestClient(app)
    response = client.get('/search.json?author_key=OL1A&author_key=OL2A')
    assert response.status_code == 200
    assert response.json() == {'author_key': ['OL1A', 'OL2A']}

    # A quick check to make sure it's ok with no params
    response = client.get('/search.json')
    assert response.status_code == 200
    assert response.json() == {'author_key': []}
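
As a framework-independent aside, the shape of the data these experiments deal with can be seen using only the standard library: a repeated key in a query string naturally parses to a list of values. This is a minimal sketch using `urllib.parse.parse_qs`. It is not how FastAPI or this PR parses parameters, and the extra `q=sherlock` parameter is added purely for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# Same repeated-key pattern as the tests above, plus an illustrative q parameter.
url = "/search.json?author_key=OL1A&author_key=OL2A&q=sherlock"
params = parse_qs(urlsplit(url).query)

# parse_qs maps every key to a list; repeated keys keep each occurrence in order.
author_keys = params.get("author_key", [])
# Single-valued parameters still arrive as one-element lists.
q = params.get("q", [None])[0]
```

The list-versus-scalar distinction is what the endpoint's parameter model has to declare explicitly for each field.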

openlibrary/fastapi/search.py

cdrini

This looks great! I tested on testing and everything seemed to work!

Facets worked: https://testing.openlibrary.org/search.json?q=sherlock+holmes&mode=everything&language=ger&fields=key,title,editions,language (editions are in german)
The query json parameter works
Multiple subjects work https://testing.openlibrary.org/search.json?q=sherlock+holmes&mode=everything&language=ger&subject_facet=English+Detective+and+mystery+stories&subject_facet=Fiction&fields=key,title,editions

None of the comments here are blockers; I'm mainly curious about the question on the unit tests. Otherwise lgtm, and I will merge tomorrow!

cdrini · 2025-11-28T00:05:30Z

openlibrary/fastapi/search.py

    """
+    query: dict[str, Any] = {}
+    if query_str:
+        query = json.loads(query_str)


In the previous version, sort/page/etc were also in the ?query={...} parameter

If you mean the previous version of this PR, then I think it was just a mistake.
If you mean the web.py version of this endpoint, then I'm not sure where you see that. Can you elaborate?

They certainly can specify those things in the ?query={...} param, but we don't really check or validate that as of now. It would be a good idea for the future, though.

Oh, I mean in the web.py version you can specify sort inside the query JSON object, and it will be used instead of the sort parameter. Here, you can only specify sort as a URL parameter, not in the query JSON object. But it's not super high priority, since this feature isn't really used.
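
The precedence described in this thread (a key such as `sort` inside the `query` JSON object winning over the same-named URL parameter) could be sketched roughly as follows. This is only an illustration under stated assumptions: `merge_query_params` is a hypothetical helper, not a function from Open Library, the sample values are made up, and neither the web.py nor the FastAPI endpoint is claimed to be implemented this way.

```python
from __future__ import annotations

import json
from typing import Any


def merge_query_params(url_params: dict[str, Any], query_str: str | None) -> dict[str, Any]:
    """Hypothetical helper: overlay keys from ?query={...} onto the URL parameters."""
    query: dict[str, Any] = {}
    if query_str:
        query = json.loads(query_str)
    # Later keys win in a dict merge, so values from the JSON object
    # override URL parameters with the same name.
    return {**url_params, **query}


# Illustrative values only: sort appears both as a URL parameter and inside the JSON object.
merged = merge_query_params(
    {"q": "sherlock holmes", "sort": "new"},
    '{"sort": "old", "page": 2}',
)
unchanged = merge_query_params({"q": "sherlock holmes", "sort": "new"}, None)
```

With this ordering, `merged["sort"]` comes from the JSON object, while a request without a `query` parameter keeps its URL parameters untouched.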

openlibrary/tests/fastapi/test_search.py

RayBB added 9 commits November 25, 2025 20:59

fix search bugs for params that accept multiple values

951a3b9

support query

a05f5e9

unify pagination

d1a9072

match default and also return offset

a4ea6b6

test: Add comprehensive tests for the FastAPI search endpoint and con…

5490a26

…ditionally skip legacy WSGI app initialization during pytest.

refactor: Consolidate multiple pagination test cases into a single pa…

b8ab1bd

…rameterized test.

refactor: simplify search endpoint test calls by introducing and usin…

af4d6ce

…g a new helper function.

feat: Dynamically allow and pass WorkSearchScheme fields as query p…

d774d7e

…arameters to the search endpoint, including a new test.

feat: Re-enable WorkSearchScheme field name mapping and update `All…

9ee557e

…AllowedParams` to handle underscore prefixes with aliases.

RayBB mentioned this pull request Nov 26, 2025

finalize fastapi search json endpoint #11482

Closed

refactor: remove commented-out facet query parameter

4a93c2b

RayBB changed the title ~~fastapi fix search bugs~~ fix fastapi search.json descrepancies Nov 26, 2025

RayBB added 4 commits November 25, 2025 23:55

use correct pydantic method

e2c1c8e

feat: Add test to ensure multiple author_key parameters are correctly…

55bb97d

… handled in search queries

improve comment

9077f5a

remove extra comment

e80bde7

cdrini self-assigned this Nov 26, 2025

RayBB changed the title ~~fix fastapi search.json descrepancies~~ fix fastapi search.json discrepancies Nov 26, 2025

cleanup pagination

733f91c

cdrini reviewed Nov 26, 2025

RayBB added 10 commits November 26, 2025 11:14

fix limit test

b535e63

sop allowing arbitrary params

5eeb1f5

add a test for limit 0

2ca9220

ensure all params are accepted and add test

ca547d2

move mostqueryparams to a model (not lists)

fb0bc91

Merge branch 'master' into fastapi_fix_search_bugs

9ad7639

add test for check_params

7a334ab

add three tests for lists

3340fef

commit tests that found solution for params issue

0b0b798

last test explorations to best solution

4b2f74b

RayBB added 6 commits November 26, 2025 17:37

a nice pydantic model with multiple value params

9963e2b

move q into publicqueryoptions

b8aebbb

reorder query update

404999e

refactor: explicitly cast default search fields to list and remove ty…

11518e5

…pe ignore for field splitting

fix: change public_scan_b parameter type from list[str] to bool | None

ddde7e9

chore: remove redundant comments in search API response handling.

774926d

RayBB added 2 commits November 26, 2025 20:55

add nice comment and remove excessive learning tests

2b3b166

use model_validator

ff16143

RayBB commented Nov 27, 2025

RayBB added 2 commits November 27, 2025 00:24

remove extra json response

9bd2e12

add tests for q parsing

27be8c6

cdrini approved these changes Nov 28, 2025

test: simplify search parameter validation by removing special handli…

d8640c7

…ng for author_key

cdrini merged commit d59c2f6 into master Nov 28, 2025
8 checks passed

RayBB mentioned this pull request Dec 4, 2025

Fastapi /search.json not handling multiple facet parameters correctly #11491

Closed

RayBB deleted the fastapi_fix_search_bugs branch February 12, 2026 02:36

cdrini merged 36 commits into master from fastapi_fix_search_bugs
