Remove some legacy IA-related solr fields

### Problem



`ia_box_id`, `ia_loaded_id`, `ia_collection`, and `ia_count` are all Solr-indexed fields, defined in https://github.com/internetarchive/openlibrary/blob/00b7654b092616ab10c5415923210473605d9635/conf/solr/conf/managed-schema.xml#L156-L159

It is not clear _why_.

I know what `ia_box_id` is, and it's not appropriate for OL to index due to separation of concerns. It used to be used for something, and that use has been refactored away. Solr is still indexing it.

I don't even know the history of `ia_loaded_id` and `ia_count` but they do not appear to be used.
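One way to back up "do not appear to be used" is to scan a checkout for whole-word mentions of each field name. The sketch below is only illustrative: the function name, the suffix list, and the idea of scanning a local checkout are my assumptions, not existing OL tooling. It matches on word boundaries so that `ia_count` is not reported for unrelated names such as `media_count`.

```python
import re
from pathlib import Path

# Field names come from the schema lines linked in this issue.
LEGACY_FIELDS = ("ia_box_id", "ia_loaded_id", "ia_count", "ia_collection")


def find_field_references(
    root: str,
    suffixes: tuple[str, ...] = (".py", ".html", ".js", ".xml"),  # assumed file types
) -> dict[str, list[str]]:
    """Map each legacy field name to the files under `root` that mention it."""
    patterns = {name: re.compile(rf"\b{re.escape(name)}\b") for name in LEGACY_FIELDS}
    hits: dict[str, list[str]] = {name: [] for name in LEGACY_FIELDS}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for name, pattern in patterns.items():
            if pattern.search(text):
                hits[name].append(path.relative_to(root).as_posix())
    return hits
```

A field whose only hit is the schema file itself would be a strong candidate for removal, although a scan like this cannot see names that are built dynamically.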

`ia_collection` is indexing a ridiculous amount of data for no purpose. The `fav_` collections were removed because they were obviously pointless, but the majority of the collections that are still listed are of no use either.

One random example:
https://openlibrary.org/search.json?q=cats+AND+ia:*&mode=everything&fields=title,ia,availability,ia_collection&limit=1

has many `-ol` entries and a `library-of-atlantis`.
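To put a number on this, the values returned in `ia_collection` could be tallied by naming pattern. This is a rough sketch: it assumes `ia_collection` comes back from `search.json` as a list of strings, the two patterns (`fav_` prefix, `-ol` suffix) are just the ones mentioned in this issue, and the helper names are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlencode

SEARCH_URL = "https://openlibrary.org/search.json"


def build_search_url(query: str, limit: int = 1) -> str:
    """Rebuild the example query above for an arbitrary search string."""
    params = {
        "q": query,
        "mode": "everything",
        "fields": "title,ia,availability,ia_collection",
        "limit": limit,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def tally_collections(docs: list[dict]) -> Counter:
    """Count `ia_collection` values by the naming patterns mentioned in this issue."""
    counts: Counter = Counter()
    for doc in docs:
        # Assumption: the field is a list of strings; docs without it are skipped.
        for name in doc.get("ia_collection", []):
            if name.startswith("fav_"):
                counts["fav_ list"] += 1
            elif name.endswith("-ol"):
                counts["-ol list"] += 1
            else:
                counts["other"] += 1
    return counts
```

Feeding it the `docs` array from a larger `limit` would show how much of the indexed data falls into each bucket.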

_I_ know what these are in terms of archive.org, but they have no purpose on OL.

They are a particular problem because they are not even archive.org 'collections' (they are 'simplelists', but that has zero relevance to OL). If there were a clear use case for _why_ we needed collections, we could adjust, but I don't believe there is one. The original use was to determine lending status for the 'borrow' buttons, which is no longer appropriate. @cdrini mentioned using collections to filter the book explorer, but that just happens to work in some cases as an undocumented feature, and the majority of these collections are actively inappropriate.

I'm concerned about all this because we are having ongoing performance issues with Solr and with connectivity between OL and archive.org. Requesting this data from archive.org on Solr re-index seems completely unnecessary. 

There may be some justification for fetching and indexing an item's _likely_ availability, but AFAICT we make requests to the availability API whenever the actual status is displayed anyway. As for search filtering, I can't make much sense of what the expectations are, nor of what the current system is achieving with the data it stores.

`ebook_access:[borrowable TO *]`  appears to be _exactly_ equivalent to `has_fulltext=true` but these are acquired and stored differently?
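Matching hit counts alone would not prove the two filters are equivalent; checking that each "A AND NOT B" difference is empty would. Here is a hedged sketch of that check. It assumes `search.json` reports the hit count under a `numFound` key and that `has_fulltext` can be queried as `has_fulltext:true`; the function names are hypothetical, and the fetcher is injectable so the logic can be exercised without network access.

```python
import json
from typing import Callable
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://openlibrary.org/search.json"


def count_url(query: str) -> str:
    """Build a search.json URL that asks for as little document data as possible."""
    return f"{SEARCH_URL}?{urlencode({'q': query, 'fields': 'key', 'limit': 1})}"


def fetch_num_found(url: str) -> int:
    """Fetch a search URL and return its hit count (assumes a `numFound` key)."""
    with urlopen(url, timeout=30) as response:
        return int(json.load(response)["numFound"])


def compare_queries(
    query_a: str,
    query_b: str,
    fetch: Callable[[str], int] = fetch_num_found,
) -> dict:
    """Compare two queries by counting the records matched by only one of them."""
    only_a = fetch(count_url(f"({query_a}) AND NOT ({query_b})"))
    only_b = fetch(count_url(f"({query_b}) AND NOT ({query_a})"))
    return {
        "a": fetch(count_url(query_a)),
        "b": fetch(count_url(query_b)),
        "only_a": only_a,
        "only_b": only_b,
        # The result sets are identical only if neither difference has any hits.
        "equivalent": only_a == 0 and only_b == 0,
    }
```

Calling `compare_queries("ebook_access:[borrowable TO *]", "has_fulltext:true")` against production would then either confirm the redundancy or show which records differ.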

`NOT public_scan_b:false` is used in multiple carousel queries, but I can only see 2 records that have `public_scan_b` populated: https://openlibrary.org/search.json?q=public_scan_b:true&fields=title,ia,availability,ia_collection,public_scan_b

The data fetched by Solr
https://github.com/internetarchive/openlibrary/blob/00b7654b092616ab10c5415923210473605d9635/openlibrary/book_providers.py#L151-L154
also includes `access_restricted_item`, which _might_ be useful, but I can't see where it is stored or used later. I think this is checked in real time via the availability API?

There is very little documentation around all this, and I'm not really sure how any of the current data models fit the expected use cases around item availability. The model is not clear on the differences between an Open Library book record and the individual borrowable, readable, etc. items.

Open Library has book pages, and book pages have a 'call-to-action' button. Most of the magic happens via archive.org's availability API, and I'm not sure where that is documented. Yet we have a _lot_ of extra stuff indexed in Solr, which is only updated on record edits.

I imagine there are two main use-cases:
  * Discoverability, search / browse, where a patron is trying to locate or browse using 'availability' as part of the criteria, so indexing relevant aspects would be appropriate
  * Utilisation of a particular item -- the patron has identified a record and wants accurate and current information on how to access an item linked with the record.

The first suggests that categorising records by some aspect would be good, but this should be driven by well-thought-out use cases.

The second is where API calls come in. I'm not sure whether availability API requests fire for every item listed in a multi-page search query, and I don't know where to look. I hope OL isn't doing that, but I can't be sure.

From what I see, we have multiple redundant layers of historical attempts to satisfy unspecified and conflated use cases using Solr, and in most cases we fall back to sending availability API requests because we know the Solr values can't be trusted.

That's almost the worst of both worlds; redundant indexing that is not useful, and expensive real-time API requests that may or may not be useful.

The current code is poorly documented, isn't traceable back to supporting core use cases, and has clearly out-of-place and out-of-date tech debt like `ia_box_id`, which makes the rest of the code suspect and much harder to follow.




### Reproducing the bug


1. Go to ...
2. Do ...

* Expected behavior:
* Actual behavior:


### Context

- Browser (Chrome, Safari, Firefox, etc):
- OS (Windows, Mac, etc):
- Logged in (Y/N):
- Environment (prod, dev, local): prod


### Breakdown



#### Requirements Checklist
* [ ]

#### Related files

*

#### Stakeholders

* @cdrini 
<hr>

#### Instructions for Contributors


- Please [run these commands](https://github.com/internetarchive/openlibrary/wiki/Git-Cheat-Sheet#working-on-your-branch) to ensure your repository is up to date **before** [creating a new branch](https://github.com/internetarchive/openlibrary/wiki/Git-Cheat-Sheet#making-changes-and-creating-a-pull-request) to work on this issue and **each time after** pushing code to GitHub, because the pre-commit bot may add commits to your PRs upstream.

	<field name="ia_box_id" type="string" multiValued="true"/>
	<field name="ia_loaded_id" type="string" multiValued="true"/>
	<field name="ia_count" type="pint"/>
	<field name="ia_collection" type="string" multiValued="true" />

	from typing import Literal, TypedDict

	class IALiteMetadata(TypedDict):
	    boxid: set[str]
	    collection: set[str]
	    access_restricted_item: Literal['true', 'false'] | None

Remove some legacy IA-related solr fields #11586
