arxiv

arxiv.py


Python wrapper for the arXiv API.

arXiv is a project by the Cornell University Library that provides open access to 1,000,000+ articles in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, and Statistics.

Usage

Install the package:

$ pip install arxiv   # Or `uv add arxiv` or similar.

In your Python code, include the line:

import arxiv

Examples

Fetching results

import arxiv

# Construct the default API client.
client = arxiv.Client()

# Search for the 10 most recent articles matching the keyword "quantum."
search = arxiv.Search(
  query = "quantum",
  max_results = 10,
  sort_by = arxiv.SortCriterion.SubmittedDate
)

results = client.results(search)

# `results` is a generator; you can iterate over its elements one by one...
for r in client.results(search):
  print(r.title)
# ...or exhaust it into a list. Careful: this is slow for large result sets.
all_results = list(results)
print([r.title for r in all_results])

# For advanced query syntax documentation, see the arXiv API User Manual:
# https://arxiv.org/help/api/user-manual#query_details
search = arxiv.Search(query = "au:del_maestro AND ti:checkerboard")
first_result = next(client.results(search))
print(first_result)

# Search for the paper with ID "1605.08386v1"
search_by_id = arxiv.Search(id_list=["1605.08386v1"])
# Reuse client to fetch the paper, then print its title.
first_result = next(client.results(search_by_id))
print(first_result.title)

[!TIP] [arxivql](https://pypi.org/project/arxivql/) may simplify constructing complex query strings.

Fetching results with a custom client

import arxiv

big_slow_client = arxiv.Client(
  page_size = 1000,
  delay_seconds = 10.0,
  num_retries = 5
)

# Prints 1000 titles before needing to make another request.
for result in big_slow_client.results(arxiv.Search(query="quantum")):
  print(result.title)
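
To make the meaning of `num_retries` concrete, here is a minimal, self-contained sketch of the retry pattern. It does not use the package: `TransientError`, `fetch_with_retries`, and `flaky` are hypothetical names introduced only for illustration. The one behavior it mirrors from the module source is that a failing request is retried until the try index reaches `num_retries`, so `num_retries = 5` allows up to six attempts in total.

```python
from typing import Callable, List


class TransientError(Exception):
    """Hypothetical stand-in for a failed or unexpectedly empty page."""


def fetch_with_retries(fetch: Callable[[int], str], num_retries: int = 3) -> str:
    """Call `fetch(try_index)` until it succeeds or retries are exhausted."""
    try_index = 0
    while True:
        try:
            return fetch(try_index)
        except TransientError:
            # Try 0 is the initial request; tries 1..num_retries are retries.
            if try_index >= num_retries:
                raise
            try_index += 1


attempts: List[int] = []


def flaky(try_index: int) -> str:
    """Illustrative fetcher that fails on its first two tries."""
    attempts.append(try_index)
    if try_index < 2:
        raise TransientError(f"try {try_index} failed")
    return "page contents"


page = fetch_with_retries(flaky, num_retries=5)
print(page, attempts)
```

The sketch uses a loop where the module source uses recursion; the attempt count is the same either way.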

Downloading a paper

import arxiv
from urllib.request import urlretrieve

paper = next(arxiv.Client().results(arxiv.Search(id_list=["1605.08386v1"])))

# Download the PDF.
urlretrieve(paper.pdf_url, "paper.pdf")

# Download the source tarball.
urlretrieve(paper.source_url(), "paper.tar.gz")

Logging

To inspect this package's network behavior and API logic, configure a DEBUG-level logger.

>>> import logging, arxiv
>>> logging.basicConfig(level=logging.DEBUG)
>>> client = arxiv.Client()
>>> paper = next(client.results(arxiv.Search(id_list=["1605.08386v1"])))
INFO:arxiv.arxiv:Requesting 100 results at offset 0
INFO:arxiv.arxiv:Requesting page (first: False, try: 0): https://export.arxiv.org/api/query?search_query=&id_list=1605.08386v1&sortBy=relevance&sortOrder=descending&start=0&max_results=100
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): export.arxiv.org:443
DEBUG:urllib3.connectionpool:https://export.arxiv.org:443 "GET /api/query?search_query=&id_list=1605.08386v1&sortBy=relevance&sortOrder=descending&start=0&max_results=100&user-agent=arxiv.py%2F1.4.8 HTTP/1.1" 200 979
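
Root-level DEBUG logging also surfaces messages from `urllib3`, as the transcript above shows. One option is to configure only this package's logger. The following is a minimal sketch, assuming the logger is named `arxiv` (the module source creates it with `logging.getLogger(__name__)`); the in-memory handler, the format string, and the simulated log record are illustrative choices, not part of the package.

```python
import io
import logging

# Assumption: the package logs under the name "arxiv".
arxiv_logger = logging.getLogger("arxiv")
arxiv_logger.setLevel(logging.DEBUG)

# Illustrative handler writing to an in-memory buffer; use
# logging.StreamHandler() with no argument to write to stderr instead.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
arxiv_logger.addHandler(handler)

# Keep these records away from any handlers attached to the root logger.
arxiv_logger.propagate = False

# Simulate a record shaped like the ones the client emits.
arxiv_logger.info(
    "Requesting page (first: %r, try: %d): %s", True, 0, "https://example.invalid"
)

# A record from another library's logger does not reach this handler.
logging.getLogger("urllib3.connectionpool").debug("Starting new HTTPS connection")

captured = buffer.getvalue()
print(captured, end="")
```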

Types

Client

A Client specifies a reusable strategy for fetching results from arXiv's API. For most use cases the default client should suffice.

Client configurations specify pagination and retry logic. Reusing a client allows successive API calls to use the same connection pool and ensures they abide by the rate limit you set.
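
The interaction between page size and rate limiting can be sketched without touching the network. The block below is a simplified illustration, not the package's implementation: `paginate`, `fake_fetch`, and the injected `sleep` callable are hypothetical names, and the sketch always waits the full `delay_seconds` between pages, whereas the module source sleeps only for whatever part of the delay has not yet elapsed since the previous request.

```python
import time
from typing import Callable, Iterator, List


def paginate(
    fetch_page: Callable[[int, int], List[str]],
    total_results: int,
    page_size: int = 100,
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield items one page at a time, pausing between page requests."""
    offset = 0
    first_page = True
    while offset < total_results:
        if not first_page:
            sleep(delay_seconds)  # rate limit applies between requests
        first_page = False
        page = fetch_page(offset, page_size)
        if not page:
            break  # nothing more to read
        yield from page
        offset += len(page)


# Illustrative in-memory "API" holding 250 records.
data = [f"paper-{i}" for i in range(250)]
requests_made: List[int] = []
sleeps: List[float] = []


def fake_fetch(start: int, size: int) -> List[str]:
    requests_made.append(start)
    return data[start:start + size]


titles = list(
    paginate(fake_fetch, len(data), page_size=100, delay_seconds=3.0, sleep=sleeps.append)
)
print(len(titles), requests_made, sleeps)
```

With 250 records and a page size of 100, three requests are made and two delays are incurred; a larger page size trades fewer round-trips for slower individual responses.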

Search

A Search specifies a search of arXiv's database. Use Client.results to get a generator yielding Results.
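
To show how a search's fields end up in a request, here is a standalone sketch that mirrors the parameter names used in the module source (`search_query`, `id_list`, `sortBy`, `sortOrder`, `start`, `max_results`). The function `build_query_url` is a hypothetical helper written for this example; it is not part of the package's public API.

```python
from typing import Sequence
from urllib.parse import urlencode

QUERY_URL_FORMAT = "https://export.arxiv.org/api/query?{}"


def build_query_url(
    query: str = "",
    id_list: Sequence[str] = (),
    sort_by: str = "relevance",
    sort_order: str = "descending",
    start: int = 0,
    page_size: int = 100,
) -> str:
    """Build one page's request URL from unencoded search fields."""
    args = {
        "search_query": query,
        "id_list": ",".join(id_list),
        "sortBy": sort_by,
        "sortOrder": sort_order,
        "start": str(start),
        "max_results": str(page_size),
    }
    # urlencode handles escaping, which is why queries are passed unencoded.
    return QUERY_URL_FORMAT.format(urlencode(args))


by_query = build_query_url(query="au:del_maestro AND ti:checkerboard")
by_id = build_query_url(id_list=["1605.08386v1"])
print(by_query)
print(by_id)
```

The second URL matches the one in the logging transcript in the Logging section, which suggests the sketch follows the same parameter order.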

Result

The Result objects yielded by Client.results include metadata about each paper.

The meaning of the underlying raw data is documented in the arXiv API User Manual: Details of Atom Results Returned.
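
Two pieces of derived metadata, the short ID and the source URL, are plain string operations on fields of a Result. The sketch below reproduces that logic as standalone functions so it can be run without the package; `short_id` and `derive_source_url` are hypothetical names, and the example PDF URL is an assumed shape rather than a value taken from a real API response.

```python
from typing import Optional


def short_id(entry_id: str) -> str:
    """Return the part of an abstract URL after `arxiv.org/abs/`."""
    return entry_id.split("arxiv.org/abs/")[-1]


def derive_source_url(pdf_url: Optional[str]) -> Optional[str]:
    """Swap the `/pdf/` path segment for `/src/`; None if there is no PDF URL."""
    if pdf_url is None:
        return None
    return pdf_url.replace("/pdf/", "/src/")


current = short_id("https://arxiv.org/abs/2107.05580v1")
legacy = short_id("https://arxiv.org/abs/quant-ph/0201082v1")
# Assumed PDF URL shape, for illustration only.
src = derive_source_url("https://arxiv.org/pdf/1605.08386v1")
print(current, legacy, src)
```

Because the source URL is derived from the PDF URL, a result with no PDF link has no source URL either; callers that download the tarball may want to check for `None` first.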

Development

This project uses UV for development, while maintaining compatibility with traditional pip installation for end users.

""".. include:: ../README.md"""

from __future__ import annotations

import logging
import time
import itertools
import requests

from importlib.metadata import PackageNotFoundError, version
from urllib.parse import urlencode
from datetime import datetime, timedelta, timezone
from calendar import timegm

from enum import Enum
from typing import Generator, Iterator

from . import _feed
from ._feed import ParsedFeed


logger = logging.getLogger(__name__)

try:
    __version__ = version("arxiv")
except PackageNotFoundError:
    __version__ = "0.0.0+unknown"

_USER_AGENT = f"arxiv.py/{__version__}"

_DEFAULT_TIME = datetime.min


class Result:
    """
    An entry in an arXiv query results feed.

    See [the arXiv API User's Manual: Details of Atom Results
    Returned](https://arxiv.org/help/api/user-manual#_details_of_atom_results_returned).
    """

    entry_id: str
    """A url of the form `https://arxiv.org/abs/{id}`."""
    updated: datetime
    """When the result was last updated."""
    published: datetime
    """When the result was originally published."""
    title: str
    """The title of the result."""
    authors: list[Result.Author]
    """The result's authors, including any `<arxiv:affiliation>` data."""
    summary: str
    """The result abstract."""
    comment: str | None
    """The authors' comment if present."""
    journal_ref: str | None
    """A journal reference if present."""
    doi: str | None
    """A URL for the resolved DOI to an external resource if present."""
    primary_category: str
    """
    The result's primary arXiv category. See [arXiv: Category
    Taxonomy](https://arxiv.org/category_taxonomy).
    """
    categories: list[str]
    """
    All of the result's categories. See [arXiv: Category
    Taxonomy](https://arxiv.org/category_taxonomy).
    """
    links: list[Result.Link]
    """Up to three URLs associated with this result."""
    pdf_url: str | None
    """The URL of a PDF version of this result if present among links."""

    def __init__(
        self,
        entry_id: str,
        updated: datetime = _DEFAULT_TIME,
        published: datetime = _DEFAULT_TIME,
        title: str = "",
        authors: list[Result.Author] | None = None,
        summary: str = "",
        comment: str = "",
        journal_ref: str = "",
        doi: str = "",
        primary_category: str = "",
        categories: list[str] | None = None,
        links: list[Result.Link] | None = None,
    ):
        """
        Constructs an arXiv search result item.

        In most cases, results are produced by `Client.results`, which parses
        API responses internally.
        """
        self.entry_id = entry_id
        self.updated = updated
        self.published = published
        self.title = title
        self.authors = authors or []
        self.summary = summary
        self.comment = comment
        self.journal_ref = journal_ref
        self.doi = doi
        self.primary_category = primary_category
        self.categories = categories or []
        self.links = links or []
        # Calculated members
        self.pdf_url = Result._get_pdf_url(self.links)

    def __str__(self) -> str:
        return self.entry_id

    def __repr__(self) -> str:
        return (
            "{}(entry_id={}, updated={}, published={}, title={}, authors={}, "
            "summary={}, comment={}, journal_ref={}, doi={}, "
            "primary_category={}, categories={}, links={})"
        ).format(
            _classname(self),
            repr(self.entry_id),
            repr(self.updated),
            repr(self.published),
            repr(self.title),
            repr(self.authors),
            repr(self.summary),
            repr(self.comment),
            repr(self.journal_ref),
            repr(self.doi),
            repr(self.primary_category),
            repr(self.categories),
            repr(self.links),
        )

    def __eq__(self, other: object) -> bool:
        if isinstance(other, Result):
            return self.entry_id == other.entry_id
        return False

    def get_short_id(self) -> str:
        """
        Returns the short ID for this result.

        + If the result URL is `"https://arxiv.org/abs/2107.05580v1"`,
        `result.get_short_id()` returns `"2107.05580v1"`.

        + If the result URL is `"https://arxiv.org/abs/quant-ph/0201082v1"`,
        `result.get_short_id()` returns `"quant-ph/0201082v1"` (the pre-March
        2007 arXiv identifier format).

        For an explanation of the difference between arXiv's legacy and current
        identifiers, see [Understanding the arXiv
        identifier](https://arxiv.org/help/arxiv_identifier).
        """
        return self.entry_id.split("arxiv.org/abs/")[-1]

    def source_url(self) -> str | None:
        """
        Derives a URL for the source tarfile for this result.
        """
        if self.pdf_url is None:
            return None
        return self.pdf_url.replace("/pdf/", "/src/")

    @staticmethod
    def _get_pdf_url(links: list[Result.Link]) -> str | None:
        """
        Finds the PDF link among a result's links and returns its URL.

        Should only be called once for a given `Result`, in its constructor.
        After construction, the URL should be available in `Result.pdf_url`.
        """
        pdf_urls = [link.href for link in links if link.title == "pdf"]
        if len(pdf_urls) == 0:
            return None
        elif len(pdf_urls) > 1:
            logger.warning("Result has multiple PDF links; using %s", pdf_urls[0])
        return pdf_urls[0]

    @staticmethod
    def _to_datetime(ts: time.struct_time) -> datetime:
        """
        Converts a UTC `time.struct_time` into a time-zone-aware `datetime`.

        Retained as a stable utility for callers that historically relied on
        feedparser's `*_parsed` time tuples; the internal Atom parser produces
        `datetime` objects directly.
        """
        return datetime.fromtimestamp(timegm(ts), tz=timezone.utc)

    class Author:
        """
        A light inner class for representing a result's authors.
        """

        name: str
        """The author's name."""
        affiliation: list[str]
        """
        Any `<arxiv:affiliation>` values associated with this author. Most
        results have no affiliation data and this is an empty list; some
        results have one or more affiliation strings per author.

        See https://github.com/lukasschwab/arxiv.py/issues/62.
        """

        def __init__(self, name: str, affiliation: list[str] | None = None):
            """
            Constructs an `Author` with the specified name and (optional)
            affiliations.
            """
            self.name = name
            self.affiliation = affiliation or []

        def __str__(self) -> str:
            return self.name

        def __repr__(self) -> str:
            if self.affiliation:
                return "{}({}, affiliation={})".format(
                    _classname(self), repr(self.name), repr(self.affiliation)
                )
            return "{}({})".format(_classname(self), repr(self.name))

        def __eq__(self, other: object) -> bool:
            if isinstance(other, Result.Author):
                return self.name == other.name
            return False

    class Link:
        """
        A light inner class for representing a result's links.
        """

        href: str
        """The link's `href` attribute."""
        title: str | None
        """The link's title."""
        rel: str
        """The link's relationship to the `Result`."""
        content_type: str | None
        """The link's HTTP content type."""

        def __init__(
            self,
            href: str,
            title: str | None = None,
            rel: str = "",
            content_type: str | None = None,
        ):
            """
            Constructs a `Link` with the specified link metadata.
            """
            self.href = href
            self.title = title
            self.rel = rel
            self.content_type = content_type

        def __str__(self) -> str:
            return self.href

        def __repr__(self) -> str:
            return "{}({}, title={}, rel={}, content_type={})".format(
                _classname(self),
                repr(self.href),
                repr(self.title),
                repr(self.rel),
                repr(self.content_type),
            )

        def __eq__(self, other: object) -> bool:
            if isinstance(other, Result.Link):
                return self.href == other.href
            return False

    class MissingFieldError(Exception):
        """
        An error indicating an entry is unparseable because it lacks required
        fields.
        """

        missing_field: str
        """The required field missing from the would-be entry."""
        message: str
        """Message describing what caused this error."""

        def __init__(self, missing_field: str):
            self.missing_field = missing_field
            self.message = "Entry from arXiv missing required info"

        def __repr__(self) -> str:
            return "{}({})".format(_classname(self), repr(self.missing_field))


class SortCriterion(Enum):
    """
    A SortCriterion identifies a property by which search results can be
    sorted.

    See [the arXiv API User's Manual: sort order for return
    results](https://arxiv.org/help/api/user-manual#sort).
    """

    Relevance = "relevance"
    LastUpdatedDate = "lastUpdatedDate"
    SubmittedDate = "submittedDate"


class SortOrder(Enum):
    """
    A SortOrder indicates the order in which search results are sorted
    according to the specified arxiv.SortCriterion.

    See [the arXiv API User's Manual: sort order for return
    results](https://arxiv.org/help/api/user-manual#sort).
    """

    Ascending = "ascending"
    Descending = "descending"


class Search:
    """
    A specification for a search of arXiv's database.

    To run a search, pass it to `Client.results`.
    """

    query: str
    """
    A query string.

    This should be unencoded. Use `au:del_maestro AND ti:checkerboard`, not
    `au:del_maestro+AND+ti:checkerboard`.

    See [the arXiv API User's Manual: Details of Query
    Construction](https://arxiv.org/help/api/user-manual#query_details).
    """
    id_list: list[str]
    """
    A list of arXiv article IDs to which to limit the search.

    See [the arXiv API User's
    Manual](https://arxiv.org/help/api/user-manual#search_query_and_id_list)
    for documentation of the interaction between `query` and `id_list`.
    """
    max_results: int | None
    """
    The maximum number of results to be returned in an execution of this
    search. To fetch every result available, set `max_results=None`.

    The API's limit is 300,000 results per query.
    """
    sort_by: SortCriterion
    """The sort criterion for results."""
    sort_order: SortOrder
    """The sort order for results."""

    def __init__(
        self,
        query: str = "",
        id_list: list[str] | None = None,
        max_results: int | None = 100,
        sort_by: SortCriterion = SortCriterion.Relevance,
        sort_order: SortOrder = SortOrder.Descending,
    ):
        """
        Constructs an arXiv API search with the specified criteria.
        """
        self.query = query
        self.id_list = id_list or []
        self.max_results = max_results
        self.sort_by = sort_by
        self.sort_order = sort_order

    def __str__(self) -> str:
        if self.query and self.id_list:
            return f"Search(query='{self.query}', id_list={len(self.id_list)} items)"
        elif self.query:
            return f"Search(query='{self.query}')"
        elif self.id_list:
            return f"Search(id_list={len(self.id_list)} items)"
        else:
            return "Search(empty)"

    def __repr__(self) -> str:
        return ("{}(query={}, id_list={}, max_results={}, sort_by={}, sort_order={})").format(
            _classname(self),
            repr(self.query),
            repr(self.id_list),
            repr(self.max_results),
            repr(self.sort_by),
            repr(self.sort_order),
        )

    def _url_args(self) -> dict[str, str]:
        """
        Returns a dict of search parameters that should be included in an API
        request for this search.
        """
        return {
            "search_query": self.query,
            "id_list": ",".join(self.id_list),
            "sortBy": self.sort_by.value,
            "sortOrder": self.sort_order.value,
        }


class Client:
    """
    Specifies a strategy for fetching results from arXiv's API.

    This class obscures pagination and retry logic, and exposes
    `Client.results`.
    """

    query_url_format = "https://export.arxiv.org/api/query?{}"
    """
    The arXiv query API endpoint format.
    """
    page_size: int
    """
    Maximum number of results fetched in a single API request. Smaller pages can
    be retrieved faster, but may require more round-trips.

    The API's limit is 2000 results per page.
    """
    delay_seconds: float
    """
    Number of seconds to wait between API requests.

    [arXiv's Terms of Use](https://arxiv.org/help/api/tou) ask that you "make no
    more than one request every three seconds."
    """
    num_retries: int
    """
    Number of times to retry a failing API request before raising an Exception.
    """

    _last_request_dt: datetime | None
    _session: requests.Session

    def __init__(self, page_size: int = 100, delay_seconds: float = 3.0, num_retries: int = 3):
        """
        Constructs an arXiv API client with the specified options.

        Note: the default parameters should provide a robust request strategy
        for most use cases. Extreme page sizes, delays, or retries risk
        violating the arXiv [API Terms of Use](https://arxiv.org/help/api/tou),
        brittle behavior, and inconsistent results.
        """
        self.page_size = page_size
        self.delay_seconds = delay_seconds
        self.num_retries = num_retries
        self._last_request_dt = None
        self._session = requests.Session()

    def __str__(self) -> str:
        return f"Client(page_size={self.page_size}, delay={self.delay_seconds}s, retries={self.num_retries})"

    def __repr__(self) -> str:
        return "{}(page_size={}, delay_seconds={}, num_retries={})".format(
            _classname(self),
            repr(self.page_size),
            repr(self.delay_seconds),
            repr(self.num_retries),
        )

    def results(self, search: Search, offset: int = 0) -> Iterator[Result]:
        """
        Uses this client configuration to fetch one page of the search results
        at a time, yielding the parsed `Result`s, until `max_results` results
        have been yielded or there are no more search results.

        If all tries fail, raises an `UnexpectedEmptyPageError` or `HTTPError`.

        Setting a nonzero `offset` discards leading records in the result set.
        When `offset` is greater than or equal to `search.max_results`, the full
        result set is discarded.

        For more on using generators, see
        [Generators](https://wiki.python.org/moin/Generators).
        """
        limit = search.max_results - offset if search.max_results else None
        if limit and limit < 0:
            return iter(())
        return itertools.islice(self._results(search, offset), limit)

    def _results(self, search: Search, offset: int = 0) -> Generator[Result, None, None]:
        page_url = self._format_url(search, offset, self.page_size)
        feed = self._parse_feed(page_url, first_page=True)
        if not feed.results:
            logger.info("Got empty first page; stopping generation")
            return
        total_results = feed.header.total_results
        logger.info(
            "Got first page: %d of %d total results",
            len(feed.results),
            total_results,
        )

        while feed.results:
            yield from feed.results
            offset += len(feed.results)
            if offset >= total_results:
                break
            page_url = self._format_url(search, offset, self.page_size)
            feed = self._parse_feed(page_url, first_page=False)

    def _format_url(self, search: Search, start: int, page_size: int) -> str:
        """
        Constructs a request URL for `search` that returns up to `page_size`
        results starting with the result at index `start`.
        """
        url_args = search._url_args()
        url_args.update(
            {
                "start": str(start),
                "max_results": str(page_size),
            }
        )
        return self.query_url_format.format(urlencode(url_args))

    def _parse_feed(self, url: str, first_page: bool = True, _try_index: int = 0) -> ParsedFeed:
        """
        Fetches the specified URL and parses it as an Atom feed.

        If a request fails or is unexpectedly empty, retries the request up to
        `self.num_retries` times.
        """
        try:
            return self.__try_parse_feed(url, first_page=first_page, try_index=_try_index)
        except (
            HTTPError,
            UnexpectedEmptyPageError,
            requests.exceptions.ConnectionError,
        ) as err:
            if _try_index < self.num_retries:
                logger.debug("Got error (try %d): %s", _try_index, err)
                return self._parse_feed(url, first_page=first_page, _try_index=_try_index + 1)
            logger.debug("Giving up (try %d): %s", _try_index, err)
            raise err

    def __try_parse_feed(
        self,
        url: str,
        first_page: bool,
        try_index: int,
    ) -> ParsedFeed:
        """
        Helper for the recursive `_parse_feed`. Enforces `self.delay_seconds`:
        if that number of seconds has not passed since the last request was
        made, sleeps until `delay_seconds` seconds have passed.
        """
        # If this call would violate the rate limit, sleep until it doesn't.
        if self._last_request_dt is not None:
            required = timedelta(seconds=self.delay_seconds)
            since_last_request = datetime.now() - self._last_request_dt
            if since_last_request < required:
                to_sleep = (required - since_last_request).total_seconds()
                logger.info("Sleeping: %f seconds", to_sleep)
                time.sleep(to_sleep)

        logger.info("Requesting page (first: %r, try: %d): %s", first_page, try_index, url)

        resp = self._session.get(url, headers={"user-agent": _USER_AGENT})
        self._last_request_dt = datetime.now()
        if resp.status_code != requests.codes.OK:
            raise HTTPError(url, try_index, resp.status_code)

        feed = _feed.parse(resp.content)
        if len(feed.results) == 0 and not first_page:
            raise UnexpectedEmptyPageError(url, try_index, feed)

        if feed.malformed:
            logger.warning("Malformed feed; consider handling: %s", feed.error)

        return feed


class ArxivError(Exception):
    """This package's base Exception class."""

    url: str
    """The feed URL that could not be fetched."""
    retry: int
    """
    The request try number which encountered this error; 0 for the initial try,
    1 for the first retry, and so on.
    """
    message: str
    """Message describing what caused this error."""

    def __init__(self, url: str, retry: int, message: str):
        """
        Constructs an `ArxivError` encountered while fetching the specified URL.
        """
        self.url = url
        self.retry = retry
        self.message = message
        super().__init__(self.message)

    def __reduce__(self) -> tuple:
        return (self.__class__, (self.url, self.retry, self.message))

    def __str__(self) -> str:
        return "{} ({})".format(self.message, self.url)


class UnexpectedEmptyPageError(ArxivError):
    """
    An error raised when a page of results that should be non-empty is empty.

    This should never happen in theory, but happens sporadically due to
    brittleness in the underlying arXiv API; usually resolved by retries.

    See `Client.results` for usage.
    """

    raw_feed: ParsedFeed
    """
    The raw parsed feed. Sometimes this contains useful diagnostic information,
    e.g. in `bozo_exception`.
    """

    def __init__(self, url: str, retry: int, raw_feed: ParsedFeed):
        """
        Constructs an `UnexpectedEmptyPageError` encountered for the specified
        API URL after `retry` tries.
        """
        self.url = url
        self.raw_feed = raw_feed
        super().__init__(url, retry, "Page of results was unexpectedly empty")

    def __reduce__(self) -> tuple:
        return (self.__class__, (self.url, self.retry, self.raw_feed))

    def __repr__(self) -> str:
        return "{}({}, {}, {})".format(
            _classname(self), repr(self.url), repr(self.retry), repr(self.raw_feed)
        )


class HTTPError(ArxivError):
    """
    A non-200 status encountered while fetching a page of results.

    See `Client.results` for usage.
    """

    status: int
    """The HTTP status reported by the underlying request."""

    def __init__(self, url: str, retry: int, status: int):
        """
        Constructs an `HTTPError` for the specified status code, encountered for
        the specified API URL after `retry` tries.
        """
        self.url = url
        self.status = status
        super().__init__(
            url,
            retry,
            "Page request resulted in HTTP {}".format(self.status),
        )

    def __reduce__(self) -> tuple:
        return (self.__class__, (self.url, self.retry, self.status))

    def __repr__(self) -> str:
        return "{}({}, {}, {})".format(
            _classname(self), repr(self.url), repr(self.retry), repr(self.status)
        )


def _classname(o: object) -> str:
    """A helper function for use in __repr__ methods: arxiv.Result.Link."""
    return "arxiv.{}".format(o.__class__.__qualname__)
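
The `offset` handling in `Client.results` is compact enough that a standalone sketch may help. The block below copies the limit arithmetic from the listing above into a hypothetical function, `limited`, and applies it to an illustrative integer stream; the stream stands in for the underlying results generator, which is assumed to already start at `offset`.

```python
import itertools
from typing import Iterator, Optional


def limited(
    results: Iterator[int], max_results: Optional[int], offset: int = 0
) -> Iterator[int]:
    """Cap a stream that already starts at `offset` so that `offset` counts
    against `max_results`."""
    limit = max_results - offset if max_results else None
    if limit and limit < 0:
        return iter(())
    return itertools.islice(results, limit)


# Records are numbered by their position in the full result set.
partial = list(limited(itertools.count(3), max_results=10, offset=3))
at_limit = list(limited(itertools.count(10), max_results=10, offset=10))
past_limit = list(limited(itertools.count(12), max_results=10, offset=12))
# With max_results=None there is no cap, so take a few items by hand.
unbounded = list(itertools.islice(limited(itertools.count(0), None), 3))
print(partial, at_limit, past_limit, unbounded)
```

Here `max_results` bounds the position in the full result set rather than the number of items yielded: with `max_results=10` and `offset=3`, seven records come back.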
192    class Author:
193        """
194        A light inner class for representing a result's authors.
195        """
196
197        name: str
198        """The author's name."""
199        affiliation: list[str]
200        """
201        Any `<arxiv:affiliation>` values associated with this author. Most
202        results have no affiliation data and this is an empty list; some
203        results have one or more affiliation strings per author.
204
205        See https://github.com/lukasschwab/arxiv.py/issues/62.
206        """
207
208        def __init__(self, name: str, affiliation: list[str] | None = None):
209            """
210            Constructs an `Author` with the specified name and (optional)
211            affiliations.
212            """
213            self.name = name
214            self.affiliation = affiliation or []
215
216        def __str__(self) -> str:
217            return self.name
218
219        def __repr__(self) -> str:
220            if self.affiliation:
221                return "{}({}, affiliation={})".format(
222                    _classname(self), repr(self.name), repr(self.affiliation)
223                )
224            return "{}({})".format(_classname(self), repr(self.name))
225
226        def __eq__(self, other: object) -> bool:
227            if isinstance(other, Result.Author):
228                return self.name == other.name
229            return False
230
231    class Link:
232        """
233        A light inner class for representing a result's links.
234        """
235
236        href: str
237        """The link's `href` attribute."""
238        title: str | None
239        """The link's title."""
240        rel: str
241        """The link's relationship to the `Result`."""
242        content_type: str | None
243        """The link's HTTP content type."""
244
245        def __init__(
246            self,
247            href: str,
248            title: str | None = None,
249            rel: str = "",
250            content_type: str | None = None,
251        ):
252            """
253            Constructs a `Link` with the specified link metadata.
254            """
255            self.href = href
256            self.title = title
257            self.rel = rel
258            self.content_type = content_type
259
260        def __str__(self) -> str:
261            return self.href
262
263        def __repr__(self) -> str:
264            return "{}({}, title={}, rel={}, content_type={})".format(
265                _classname(self),
266                repr(self.href),
267                repr(self.title),
268                repr(self.rel),
269                repr(self.content_type),
270            )
271
272        def __eq__(self, other: object) -> bool:
273            if isinstance(other, Result.Link):
274                return self.href == other.href
275            return False
276
277    class MissingFieldError(Exception):
278        """
279        An error indicating an entry is unparseable because it lacks required
280        fields.
281        """
282
283        missing_field: str
284        """The required field missing from the would-be entry."""
285        message: str
286        """Message describing what caused this error."""
287
288        def __init__(self, missing_field: str):
289            self.missing_field = missing_field
290            self.message = "Entry from arXiv missing required info"
291
292        def __repr__(self) -> str:
293            return "{}({})".format(_classname(self), repr(self.missing_field))

An entry in an arXiv query results feed.

See the arXiv API User's Manual: Details of Atom Results Returned.

Result( entry_id: str, updated: datetime.datetime = datetime.datetime(1, 1, 1, 0, 0), published: datetime.datetime = datetime.datetime(1, 1, 1, 0, 0), title: str = '', authors: list[Result.Author] | None = None, summary: str = '', comment: str = '', journal_ref: str = '', doi: str = '', primary_category: str = '', categories: list[str] | None = None, links: list[Result.Link] | None = None)

Constructs an arXiv search result item.

In most cases, results are produced by Client.results, which parses API responses internally.

entry_id: str

A URL of the form https://arxiv.org/abs/{id}.

updated: datetime.datetime

When the result was last updated.

published: datetime.datetime

When the result was originally published.

title: str

The title of the result.

authors: list[Result.Author]

The result's authors, including any <arxiv:affiliation> data.

summary: str

The result abstract.

comment: str | None

The authors' comment if present.

journal_ref: str | None

A journal reference if present.

doi: str | None

A URL for the resolved DOI to an external resource if present.

primary_category: str

The result's primary arXiv category. See arXiv: Category Taxonomy.

categories: list[str]

All of the result's categories. See arXiv: Category Taxonomy.

pdf_url: str | None

The URL of a PDF version of this result if present among links.
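The selection rule behind `pdf_url` can be sketched in isolation. This is a minimal sketch, assuming a hypothetical stand-in `Link` class with only `href` and `title`, and a hypothetical helper name `pick_pdf_url`; it mirrors the filter in `Result._get_pdf_url` but omits the warning logged when several PDF links are present.

```python
class Link:
    """Hypothetical stand-in for Result.Link; only the fields used here."""

    def __init__(self, href, title=None):
        self.href = href
        self.title = title


def pick_pdf_url(links):
    # Keep only links whose title is exactly "pdf"; the first one wins.
    pdf_urls = [link.href for link in links if link.title == "pdf"]
    return pdf_urls[0] if pdf_urls else None


# Example URLs chosen for illustration only.
links = [
    Link("https://arxiv.org/abs/1605.08386v1"),
    Link("https://arxiv.org/pdf/1605.08386v1", title="pdf"),
]
pdf_url = pick_pdf_url(links)
no_pdf = pick_pdf_url(links[:1])
print(pdf_url)  # https://arxiv.org/pdf/1605.08386v1
print(no_pdf)   # None
```

A result whose links carry no `"pdf"` title therefore ends up with `pdf_url` set to `None`.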

def get_short_id(self) -> str:

Returns the short ID for this result.

  • If the result URL is "https://arxiv.org/abs/2107.05580v1", result.get_short_id() returns "2107.05580v1".

  • If the result URL is "https://arxiv.org/abs/quant-ph/0201082v1", result.get_short_id() returns "quant-ph/0201082v1" (the pre-March 2007 arXiv identifier format).

For an explanation of the difference between arXiv's legacy and current identifiers, see Understanding the arXiv identifier.
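Both identifier formats fall out of the same string split. The following is a small sketch that reproduces the expression used by `Result.get_short_id` in a free function; the name `short_id` is hypothetical and not part of the package.

```python
def short_id(entry_id: str) -> str:
    # Everything after the last "arxiv.org/abs/" marker is the short ID.
    return entry_id.split("arxiv.org/abs/")[-1]


current = short_id("https://arxiv.org/abs/2107.05580v1")
legacy = short_id("https://arxiv.org/abs/quant-ph/0201082v1")
# A string that lacks the marker comes back unchanged.
unchanged = short_id("2107.05580v1")

print(current)    # 2107.05580v1
print(legacy)     # quant-ph/0201082v1
print(unchanged)  # 2107.05580v1
```

Note that the version suffix (`v1`) is kept; the split removes only the URL prefix.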

def source_url(self) -> str | None:

Derives a URL for the source tarfile for this result by replacing "/pdf/" with "/src/" in pdf_url. Returns None when the result has no PDF link.
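The derivation is a plain string substitution on the PDF URL. A minimal sketch, assuming a hypothetical free function `derive_source_url` and an example URL chosen for illustration:

```python
from typing import Optional


def derive_source_url(pdf_url: Optional[str]) -> Optional[str]:
    # No PDF link means there is nothing to derive a source URL from.
    if pdf_url is None:
        return None
    return pdf_url.replace("/pdf/", "/src/")


src = derive_source_url("https://arxiv.org/pdf/1605.08386v1")
missing = derive_source_url(None)
print(src)      # https://arxiv.org/src/1605.08386v1
print(missing)  # None
```

Because the URL is derived rather than fetched from the feed, callers may want to handle the case where no source tarfile exists at that address.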

class Result.Author:

A light inner class for representing a result's authors.

Result.Author(name: str, affiliation: list[str] | None = None)

Constructs an Author with the specified name and (optional) affiliations.

name: str

The author's name.

affiliation: list[str]

Any <arxiv:affiliation> values associated with this author. Most results have no affiliation data and this is an empty list; some results have one or more affiliation strings per author.

See https://github.com/lukasschwab/arxiv.py/issues/62.

class Result.MissingFieldError(builtins.Exception):

An error indicating an entry is unparseable because it lacks required fields.

Result.MissingFieldError(missing_field: str)
missing_field: str

The required field missing from the would-be entry.

message: str

Message describing what caused this error.

Inherited Members
builtins.BaseException: with_traceback, args

class SortCriterion(enum.Enum):
class SortCriterion(Enum):
    """
    A SortCriterion identifies a property by which search results can be
    sorted.

    See [the arXiv API User's Manual: sort order for return
    results](https://arxiv.org/help/api/user-manual#sort).
    """

    Relevance = "relevance"
    LastUpdatedDate = "lastUpdatedDate"
    SubmittedDate = "submittedDate"

A SortCriterion identifies a property by which search results can be sorted.

See the arXiv API User's Manual: sort order for return results.

Relevance = <SortCriterion.Relevance: 'relevance'>
LastUpdatedDate = <SortCriterion.LastUpdatedDate: 'lastUpdatedDate'>
SubmittedDate = <SortCriterion.SubmittedDate: 'submittedDate'>
Inherited Members
enum.Enum: name, value

class SortOrder(enum.Enum):
class SortOrder(Enum):
    """
    A SortOrder indicates the order in which search results are sorted
    according to the specified arxiv.SortCriterion.

    See [the arXiv API User's Manual: sort order for return
    results](https://arxiv.org/help/api/user-manual#sort).
    """

    Ascending = "ascending"
    Descending = "descending"

A SortOrder indicates the order in which search results are sorted according to the specified SortCriterion.

See the arXiv API User's Manual: sort order for return results.

Ascending = <SortOrder.Ascending: 'ascending'>
Descending = <SortOrder.Descending: 'descending'>
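The enum values are the literal strings the arXiv API accepts for its `sortBy` and `sortOrder` query parameters. The sketch below redefines both enums locally so it runs without the package, and uses a hypothetical helper `sort_args`; how the package itself assembles its query string is an assumption here, since that code is not shown in this section.

```python
from enum import Enum
from urllib.parse import urlencode


class SortCriterion(Enum):  # local copy of the values listed above
    Relevance = "relevance"
    LastUpdatedDate = "lastUpdatedDate"
    SubmittedDate = "submittedDate"


class SortOrder(Enum):  # local copy of the values listed above
    Ascending = "ascending"
    Descending = "descending"


def sort_args(criterion: SortCriterion, order: SortOrder) -> dict:
    # `sortBy` and `sortOrder` are the parameter names in the arXiv API manual.
    return {"sortBy": criterion.value, "sortOrder": order.value}


args = {"search_query": "quantum"}
args.update(sort_args(SortCriterion.SubmittedDate, SortOrder.Descending))
query_string = urlencode(args)
print(query_string)
# search_query=quantum&sortBy=submittedDate&sortOrder=descending
```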
Inherited Members
enum.Enum: name, value

class Client:
class Client:
    """
    Specifies a strategy for fetching results from arXiv's API.

    This class obscures pagination and retry logic, and exposes
    `Client.results`.
    """

    query_url_format = "https://export.arxiv.org/api/query?{}"
    """
    The arXiv query API endpoint format.
    """
    page_size: int
    """
    Maximum number of results fetched in a single API request. Smaller pages can
    be retrieved faster, but may require more round-trips.

    The API's limit is 2000 results per page.
    """
    delay_seconds: float
    """
    Number of seconds to wait between API requests.

    [arXiv's Terms of Use](https://arxiv.org/help/api/tou) ask that you "make no
    more than one request every three seconds."
    """
    num_retries: int
    """
    Number of times to retry a failing API request before raising an Exception.
    """

    _last_request_dt: datetime | None
    _session: requests.Session

    def __init__(self, page_size: int = 100, delay_seconds: float = 3.0, num_retries: int = 3):
        """
        Constructs an arXiv API client with the specified options.

        Note: the default parameters should provide a robust request strategy
        for most use cases. Extreme page sizes, delays, or retries risk
        violating the arXiv [API Terms of Use](https://arxiv.org/help/api/tou),
        brittle behavior, and inconsistent results.
        """
        self.page_size = page_size
        self.delay_seconds = delay_seconds
        self.num_retries = num_retries
        self._last_request_dt = None
        self._session = requests.Session()

    def __str__(self) -> str:
        return f"Client(page_size={self.page_size}, delay={self.delay_seconds}s, retries={self.num_retries})"

    def __repr__(self) -> str:
        return "{}(page_size={}, delay_seconds={}, num_retries={})".format(
            _classname(self),
            repr(self.page_size),
            repr(self.delay_seconds),
            repr(self.num_retries),
        )

    def results(self, search: Search, offset: int = 0) -> Iterator[Result]:
        """
        Uses this client configuration to fetch one page of the search results
        at a time, yielding the parsed `Result`s, until `max_results` results
        have been yielded or there are no more search results.

        If all tries fail, raises an `UnexpectedEmptyPageError` or `HTTPError`.

        Setting a nonzero `offset` discards leading records in the result set.
        When `offset` is greater than or equal to `search.max_results`, the full
        result set is discarded.

        For more on using generators, see
        [Generators](https://wiki.python.org/moin/Generators).
        """
        limit = search.max_results - offset if search.max_results else None
        if limit and limit < 0:
            return iter(())
        return itertools.islice(self._results(search, offset), limit)

    def _results(self, search: Search, offset: int = 0) -> Generator[Result, None, None]:
        page_url = self._format_url(search, offset, self.page_size)
        feed = self._parse_feed(page_url, first_page=True)
        if not feed.results:
            logger.info("Got empty first page; stopping generation")
            return
        total_results = feed.header.total_results
        logger.info(
            "Got first page: %d of %d total results",
            len(feed.results),
            total_results,
        )

        while feed.results:
            yield from feed.results
            offset += len(feed.results)
            if offset >= total_results:
                break
            page_url = self._format_url(search, offset, self.page_size)
            feed = self._parse_feed(page_url, first_page=False)

    def _format_url(self, search: Search, start: int, page_size: int) -> str:
        """
        Construct a request URL for search that returns up to `page_size`
        results starting with the result at index `start`.
        """
        url_args = search._url_args()
        url_args.update(
            {
                "start": str(start),
                "max_results": str(page_size),
            }
        )
        return self.query_url_format.format(urlencode(url_args))

    def _parse_feed(self, url: str, first_page: bool = True, _try_index: int = 0) -> ParsedFeed:
        """
        Fetches the specified URL and parses it as an Atom feed.

        If a request fails or is unexpectedly empty, retries the request up to
        `self.num_retries` times.
        """
        try:
            return self.__try_parse_feed(url, first_page=first_page, try_index=_try_index)
        except (
            HTTPError,
            UnexpectedEmptyPageError,
            requests.exceptions.ConnectionError,
        ) as err:
            if _try_index < self.num_retries:
                logger.debug("Got error (try %d): %s", _try_index, err)
                return self._parse_feed(url, first_page=first_page, _try_index=_try_index + 1)
            logger.debug("Giving up (try %d): %s", _try_index, err)
            raise err

    def __try_parse_feed(
        self,
        url: str,
        first_page: bool,
        try_index: int,
    ) -> ParsedFeed:
        """
        Recursive helper for _parse_feed. Enforces `self.delay_seconds`: if that
        number of seconds has not passed since `_parse_feed` was last called,
        sleeps until delay_seconds seconds have passed.
        """
        # If this call would violate the rate limit, sleep until it doesn't.
        if self._last_request_dt is not None:
            required = timedelta(seconds=self.delay_seconds)
            since_last_request = datetime.now() - self._last_request_dt
            if since_last_request < required:
                to_sleep = (required - since_last_request).total_seconds()
                logger.info("Sleeping: %f seconds", to_sleep)
                time.sleep(to_sleep)

        logger.info("Requesting page (first: %r, try: %d): %s", first_page, try_index, url)

        resp = self._session.get(url, headers={"user-agent": _USER_AGENT})
        self._last_request_dt = datetime.now()
        if resp.status_code != requests.codes.OK:
            raise HTTPError(url, try_index, resp.status_code)

        feed = _feed.parse(resp.content)
        if len(feed.results) == 0 and not first_page:
            raise UnexpectedEmptyPageError(url, try_index, feed)

        if feed.malformed:
            logger.warning("Malformed feed; consider handling: %s", feed.error)

        return feed

Specifies a strategy for fetching results from arXiv's API.

This class obscures pagination and retry logic, and exposes Client.results.

Client( page_size: int = 100, delay_seconds: float = 3.0, num_retries: int = 3)

Constructs an arXiv API client with the specified options.

Note: the default parameters should provide a robust request strategy for most use cases. Extreme page sizes, delays, or retries risk violating the arXiv API Terms of Use, brittle behavior, and inconsistent results.

query_url_format = 'https://export.arxiv.org/api/query?{}'

The arXiv query API endpoint format.

page_size: int

Maximum number of results fetched in a single API request. Smaller pages can be retrieved faster, but may require more round-trips.

The API's limit is 2000 results per page.
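The trade-off between page size and round-trips is simple arithmetic. As a rough sketch, assuming every page except possibly the last comes back full, and using a hypothetical helper name `requests_needed`:

```python
import math


def requests_needed(total_results: int, page_size: int) -> int:
    # Each request returns at most `page_size` results.
    return math.ceil(total_results / page_size)


default_pages = requests_needed(1000, 100)
single_page = requests_needed(1000, 1000)
at_api_limit = requests_needed(2500, 2000)
print(default_pages, single_page, at_api_limit)  # 10 1 2
```

With the default `page_size` of 100, 1000 results take ten requests; a page size of 1000 fetches the same results in one larger, slower request.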

delay_seconds: float

Number of seconds to wait between API requests.

arXiv's Terms of Use ask that you "make no more than one request every three seconds."
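The delay is enforced by comparing the time since the previous request against `delay_seconds` and sleeping for the remainder. The sketch below isolates that computation from the client source; the function name `seconds_to_sleep` is hypothetical, and the timestamps are passed in explicitly so the example does not actually sleep.

```python
from datetime import datetime, timedelta
from typing import Optional


def seconds_to_sleep(
    last_request: Optional[datetime], now: datetime, delay_seconds: float
) -> float:
    # The first request never waits.
    if last_request is None:
        return 0.0
    required = timedelta(seconds=delay_seconds)
    elapsed = now - last_request
    if elapsed < required:
        return (required - elapsed).total_seconds()
    return 0.0


last = datetime(2024, 1, 1, 12, 0, 0)
wait_soon = seconds_to_sleep(last, last + timedelta(seconds=1), delay_seconds=3.0)
wait_late = seconds_to_sleep(last, last + timedelta(seconds=5), delay_seconds=3.0)
wait_first = seconds_to_sleep(None, last, delay_seconds=3.0)
print(wait_soon, wait_late, wait_first)  # 2.0 0.0 0.0
```

The wait is tracked per client instance, so two `Client` objects used side by side do not share a rate limit.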

num_retries: int

Number of times to retry a failing API request before raising an Exception.
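Because the initial try is index 0 and a retry happens while the index is below `num_retries`, a request is attempted at most `num_retries + 1` times. The sketch below shows that counting with an iterative loop rather than the recursive calls used in the client source; `fetch_with_retries` and the `fetch` callables are hypothetical, and the built-in `ConnectionError` stands in for the errors the client retries on.

```python
def fetch_with_retries(fetch, num_retries: int):
    """Calls fetch(try_index) until it succeeds or the retries run out."""
    try_index = 0
    while True:
        try:
            return fetch(try_index)
        except ConnectionError:
            if try_index >= num_retries:
                raise  # Give up: the last error propagates to the caller.
            try_index += 1


attempts = []


def always_fails(try_index):
    attempts.append(try_index)
    raise ConnectionError("simulated failure")


try:
    fetch_with_retries(always_fails, num_retries=3)
except ConnectionError:
    pass

print(attempts)  # [0, 1, 2, 3]
```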

def results(self, search: Search, offset: int = 0) -> Iterator[Result]:

Uses this client configuration to fetch one page of the search results at a time, yielding the parsed Results, until max_results results have been yielded or there are no more search results.

If all tries fail, raises the last error encountered: an UnexpectedEmptyPageError, an HTTPError, or, for connection failures, a requests.exceptions.ConnectionError.

Setting a nonzero offset discards leading records in the result set. When offset is greater than or equal to search.max_results, the full result set is discarded.

For more on using generators, see Generators.
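The interaction between `offset` and `max_results` is easiest to see with a stand-in data source. This sketch copies the limit computation from `Client.results` but replaces the paginated fetch with a hypothetical generator, `fake_results`, over a ten-record result set.

```python
import itertools


def fake_results(offset: int):
    # Hypothetical stand-in for the paginated fetch: ten records, indexed 0-9.
    yield from range(offset, 10)


def results(max_results, offset: int = 0):
    # `max_results` bounds the position in the result set, so a larger
    # offset leaves fewer records to yield.
    limit = max_results - offset if max_results else None
    if limit and limit < 0:
        return iter(())
    return itertools.islice(fake_results(offset), limit)


first_five = list(results(5))
skip_two = list(results(5, offset=2))
at_limit = list(results(5, offset=5))
past_limit = list(results(5, offset=7))
unbounded = list(results(None, offset=8))
print(first_five, skip_two, at_limit, past_limit, unbounded)
# [0, 1, 2, 3, 4] [2, 3, 4] [] [] [8, 9]
```

In other words, `offset=2` with `max_results=5` yields three records, not five.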

class ArxivError(builtins.Exception):
class ArxivError(Exception):
    """This package's base Exception class."""

    url: str
    """The feed URL that could not be fetched."""
    retry: int
    """
    The request try number which encountered this error; 0 for the initial try,
    1 for the first retry, and so on.
    """
    message: str
    """Message describing what caused this error."""

    def __init__(self, url: str, retry: int, message: str):
        """
        Constructs an `ArxivError` encountered while fetching the specified URL.
        """
        self.url = url
        self.retry = retry
        self.message = message
        super().__init__(self.message)

    def __reduce__(self) -> tuple:
        return (self.__class__, (self.url, self.retry, self.message))

    def __str__(self) -> str:
        return "{} ({})".format(self.message, self.url)

This package's base Exception class.

ArxivError(url: str, retry: int, message: str)

Constructs an ArxivError encountered while fetching the specified URL.

url: str

The feed URL that could not be fetched.

retry: int

The request try number which encountered this error; 0 for the initial try, 1 for the first retry, and so on.

message: str

Message describing what caused this error.

Inherited Members
builtins.BaseException: with_traceback, args

class UnexpectedEmptyPageError(ArxivError):
class UnexpectedEmptyPageError(ArxivError):
    """
    An error raised when a page of results that should be non-empty is empty.

    This should never happen in theory, but happens sporadically due to
    brittleness in the underlying arXiv API; usually resolved by retries.

    See `Client.results` for usage.
    """

    raw_feed: ParsedFeed
    """
    The raw parsed feed. Sometimes this contains useful diagnostic information,
    e.g. in `bozo_exception`.
    """

    def __init__(self, url: str, retry: int, raw_feed: ParsedFeed):
        """
        Constructs an `UnexpectedEmptyPageError` encountered for the specified
        API URL after `retry` tries.
        """
        self.url = url
        self.raw_feed = raw_feed
        super().__init__(url, retry, "Page of results was unexpectedly empty")

    def __reduce__(self) -> tuple:
        return (self.__class__, (self.url, self.retry, self.raw_feed))

    def __repr__(self) -> str:
        return "{}({}, {}, {})".format(
            _classname(self), repr(self.url), repr(self.retry), repr(self.raw_feed)
        )

An error raised when a page of results that should be non-empty is empty.

This should never happen in theory, but happens sporadically due to brittleness in the underlying arXiv API; usually resolved by retries.

See Client.results for usage.

UnexpectedEmptyPageError(url: str, retry: int, raw_feed: arxiv._feed.ParsedFeed)

Constructs an UnexpectedEmptyPageError encountered for the specified API URL after retry tries.

raw_feed: arxiv._feed.ParsedFeed

The raw parsed feed. Sometimes this contains useful diagnostic information, e.g. in bozo_exception.

url: str

The feed URL that could not be fetched.

Inherited Members
ArxivError: retry, message
builtins.BaseException: with_traceback, args

class HTTPError(ArxivError):
class HTTPError(ArxivError):
    """
    A non-200 status encountered while fetching a page of results.

    See `Client.results` for usage.
    """

    status: int
    """The HTTP status reported by the underlying request."""

    def __init__(self, url: str, retry: int, status: int):
        """
        Constructs an `HTTPError` for the specified status code, encountered for
        the specified API URL after `retry` tries.
        """
        self.url = url
        self.status = status
        super().__init__(
            url,
            retry,
            "Page request resulted in HTTP {}".format(self.status),
        )

    def __reduce__(self) -> tuple:
        return (self.__class__, (self.url, self.retry, self.status))

    def __repr__(self) -> str:
        return "{}({}, {}, {})".format(
            _classname(self), repr(self.url), repr(self.retry), repr(self.status)
        )

A non-200 status encountered while fetching a page of results.

See Client.results for usage.

HTTPError(url: str, retry: int, status: int)

Constructs an HTTPError for the specified status code, encountered for the specified API URL after retry tries.

status: int

The HTTP status reported by the underlying request.
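Since `HTTPError` derives from `ArxivError`, one `except` clause on the base class handles it while still exposing `status`. The sketch below uses trimmed local stand-ins for both classes so it runs without the package or network access; the URL and the 503 status are made-up example values.

```python
class ArxivError(Exception):
    """Trimmed stand-in for the package's base exception."""

    def __init__(self, url: str, retry: int, message: str):
        self.url = url
        self.retry = retry
        self.message = message
        super().__init__(self.message)

    def __str__(self) -> str:
        return "{} ({})".format(self.message, self.url)


class HTTPError(ArxivError):
    """Trimmed stand-in for the package's HTTP error."""

    def __init__(self, url: str, retry: int, status: int):
        self.status = status
        super().__init__(url, retry, "Page request resulted in HTTP {}".format(status))


example_url = "https://export.arxiv.org/api/query?search_query=quantum"
try:
    # Simulate the error a client would raise after its final try.
    raise HTTPError(example_url, 3, 503)
except ArxivError as err:
    caught = err
    summary = "{} on try {}: {}".format(type(err).__name__, err.retry, err)

print(summary)
# HTTPError on try 3: Page request resulted in HTTP 503 (https://export.arxiv.org/api/query?search_query=quantum)
```

With the package installed, the same pattern would presumably catch `arxiv.ArxivError` around iteration over `Client.results`.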

url: str

The feed URL that could not be fetched.

Inherited Members
ArxivError: retry, message
builtins.BaseException: with_traceback, args