Switch to using new doc_ids IA search parameter; avoid errors! #11303
Conversation
Pull Request Overview
This PR switches to using a new doc_ids parameter for Internet Archive search requests instead of constructing long identifier:(foo OR bar OR ...) queries, which helps avoid hitting maximum query length limits. The change also modernizes the code by using Python's built-in itertools.batched instead of custom batching functions.
- Replaces custom `batch` and `batch_until_len` functions with `itertools.batched`
- Updates IA search to use the `doc_ids` parameter instead of complex query construction
- Removes unnecessary query parameters and adjusts batch sizing
```diff
 import re
 import typing
-from collections.abc import Iterable, Sized
+from collections.abc import Iterable, Sequence
```
Missing import for itertools module which is used on lines 212 and 220.
```diff
 logger.warning(f"Trying to cache invalid OCAIDs: {invalid_ocaids}")
 valid_ocaids = list(set(ocaids) - invalid_ocaids)
-batches = list(batch_until_len(valid_ocaids, 3000))
+batches = list(itertools.batched(valid_ocaids, 250))
```
The batch size has been significantly reduced from 3000 characters to 250 items without explanation. This could lead to many more API requests than necessary, potentially impacting performance. Consider documenting the rationale for this specific batch size or making it configurable.
Instead of a long `identifier:(foo OR bar OR ...)` query, @ximm is working on adding a new `doc_ids` parameter that will let us just specify the direct ids! This will fix an issue we've been having where we hit the maximum query length.

Technical
`itertools.batched`

Testing
Screenshot
Stakeholders