Usage¶

Basics¶

Basic usage involves initializing the main class, calling one of its methods to retrieve data, and accessing the response attributes. For example, to get a paper by its ID:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
paper = sch.get_paper('10.1093/mind/lix.236.433')
print(paper.title)

Typed responses¶

The library offers typed responses. This simplifies data extraction and enhances code readability. For example, to access the title of a paper:

paper = sch.get_paper('10.1093/mind/lix.236.433')
print(paper.title)

You can also access the API response in its original JSON format as a dictionary. To retrieve the raw JSON data, use the raw_data attribute of the response object:

paper = sch.get_paper('10.1093/mind/lix.236.433')
print(paper.raw_data)

To explore all available fields in the response, use the keys() method:

paper = sch.get_paper('10.1093/mind/lix.236.433')
print(paper.keys())

Asynchronous requests¶

The library provides both synchronous and asynchronous versions of its methods, allowing you to choose the approach that best suits your workflow.

You can use the asynchronous version with the AsyncSemanticScholar class:

import asyncio
from semanticscholar import AsyncSemanticScholar

def fetch_paper():
    async def get_paper():
        sch = AsyncSemanticScholar()
        return await sch.get_paper('10.1093/mind/lix.236.433')
    return asyncio.run(get_paper())

paper = fetch_paper()
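One reason to choose the asynchronous client is to issue several lookups concurrently. The sketch below is a minimal illustration and not part of the library: fetch_all is a hypothetical helper that accepts any coroutine function, and fake_get_paper is a stand-in so the example runs without network access.

```python
import asyncio

async def fetch_all(get_one, ids):
    """Await get_one(id) for every ID concurrently; results keep input order."""
    return await asyncio.gather(*(get_one(i) for i in ids))

async def fake_get_paper(paper_id):
    """Hypothetical stand-in for a real lookup so the sketch runs offline."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {'paperId': paper_id}

papers = asyncio.run(fetch_all(fake_get_paper, ['a', 'b', 'c']))
print([p['paperId'] for p in papers])
```

With the real client, passing the get_paper method of an AsyncSemanticScholar instance as get_one should work the same way, although concurrent calls still count against the API rate limit.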

Authenticated requests¶

If you have an API key, you can pass it as an argument to the main class. This will allow you to make authenticated requests.

from semanticscholar import SemanticScholar
sch = SemanticScholar(api_key='your_api_key_here')

Retry mode¶

The library provides an automatic retry mechanism to handle rate-limiting responses from the Semantic Scholar API.

By default, the retry mechanism is enabled (retry=True). When enabled, the library will automatically retry requests up to 10 times if it encounters an HTTP 429 status (Too Many Requests). Retries use exponential back-off, starting at 5 seconds and doubling up to a maximum of 60 seconds between attempts.

This feature is especially useful for handling temporary rate limits imposed by the Semantic Scholar API, ensuring your requests are eventually processed without manual intervention. If you prefer to manage retries yourself, you can disable this feature as shown below:

from semanticscholar import SemanticScholar
sch = SemanticScholar(retry=False)
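If you disable the built-in retries, the schedule described above (start at 5 seconds, double each time, cap at 60 seconds, up to 10 retries) can be reproduced by hand. This is a rough sketch under stated assumptions, not the library's implementation: backoff_delays and call_with_backoff are hypothetical helpers, and the exception to retry on is passed in as a parameter because this page does not name the exception raised for an HTTP 429.

```python
import time

def backoff_delays(retries=10, start=5, cap=60):
    """Return the wait in seconds before each retry: doubling, then capped."""
    return [min(start * 2 ** i, cap) for i in range(retries)]

def call_with_backoff(func, retry_on, delays, sleep=time.sleep):
    """Call func(); when retry_on is raised, wait for the next delay and retry."""
    for delay in delays:
        try:
            return func()
        except retry_on:
            sleep(delay)
    return func()  # final attempt; any error propagates to the caller

print(backoff_delays())
```

The sleep argument is injectable so that the retry logic can be exercised in tests without actually waiting.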

Response timeout¶

You can set how long to wait for a response. By default, requests to the API wait up to 30 seconds before a TimeoutException is raised. To change the default value, specify it when creating a SemanticScholar instance:

from semanticscholar import SemanticScholar
sch = SemanticScholar(timeout=5)

Alternatively, you can set the timeout property value:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
sch.timeout = 5

Paper and Author¶

Paper¶

To access paper data:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
paper = sch.get_paper('10.1093/mind/lix.236.433')

For details on supported ID types, refer to the official API documentation.

Autocomplete suggestions¶

Use the autocomplete feature to get suggestions for paper queries. For example:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
suggestions = sch.get_autocomplete('softw')

The response contains a list of suggestions based on the provided partial query. Each suggestion is represented by an Autocomplete object, which provides minimal information about the papers. Note that these are not full Paper objects with all attributes.

Snippet search¶

Search for text snippets matching a query. Snippets are excerpts of approximately 500 words drawn from a paper’s title, abstract, and body text:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_snippet('turing test', limit=5)
for item in results:
    print(item.paper.title)
    print(item.snippet.snippet_kind)
    print(item.text)

Each result is a Snippet object containing a relevance score, a paper with basic metadata (corpus ID, title, authors), and a snippet with the matched text, its kind (title, abstract, or body), section, offset, and annotations.

You can also filter results by paper IDs, authors, year, venue, fields of study, citation count, and publication date. See the method documentation for all available parameters.

Author¶

To access author data:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
author = sch.get_author(2262347)

Retrieve multiple items at once¶

You can fetch up to 1,000 distinct papers or authors in one API call. To do that, provide a list of IDs (a list of strings).

Get details for multiple papers:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
list_of_paper_ids = [
    'CorpusId:470667',
    '10.2139/ssrn.2250500',
    '0f40b1f08821e22e859c6050916cec3667778613'
]
results = sch.get_papers(list_of_paper_ids)

Get details for multiple authors:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
list_of_author_ids = ['3234559', '1726629', '1711844']
results = sch.get_authors(list_of_author_ids)
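When you have more IDs than fit in a single call, one option is to split them into batches first. The helper below is a hypothetical sketch (chunked is not part of the library); it assumes the 1,000-item limit stated above and removes duplicates because the limit applies to distinct IDs.

```python
def chunked(ids, size=1000):
    """Yield lists of at most `size` distinct IDs, preserving first-seen order."""
    unique = list(dict.fromkeys(ids))  # drop duplicates, keep order
    for start in range(0, len(unique), size):
        yield unique[start:start + size]

batches = list(chunked([str(n) for n in range(2500)]))
print([len(batch) for batch in batches])
```

Each batch could then be passed to get_papers() or get_authors() in turn.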

Search by keyword¶

To search for papers by keyword:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_paper('Computing Machinery and Intelligence')

Warning

From the official documentation: “Because of the subtleties of finding partial phrase matches in different parts of the document, be cautious about interpreting the total field as a count of documents containing any particular word in the query.”

To search for authors by name:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_author('Alan M. Turing')

Paper bulk retrieval¶

The bulk retrieval method allows fetching up to 1,000 basic paper records per request and up to 10,000,000 papers in total. This is useful for retrieving a large number of papers, since search_paper() is limited by default to 1,000 results in total.

from semanticscholar import SemanticScholar
sch = SemanticScholar()
response = sch.search_paper(query='deep learning', bulk=True)

The query supports advanced syntax for refined searches. For details about query syntax and additional parameters, refer to the official API documentation.

# Search for papers containing 'deep' or 'learning'
response = sch.search_paper(query='deep | learning', bulk=True)

Additionally, the sort parameter allows ordering results when using bulk=True. Use the format <field>:<order>, where:

- field: Can be paperId, publicationDate, or citationCount.
- order: Can be asc (ascending) or desc (descending).

By default, results are sorted by paperId:asc.

# Retrieve highly-cited papers first
response = sch.search_paper(query='deep learning', bulk=True, sort='citationCount:desc')
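Because the sort value is a plain string, a typo in the field or order is easy to miss. The following sketch builds the string from the values listed above and rejects anything else; build_sort is a hypothetical helper, not a library function.

```python
SORT_FIELDS = ('paperId', 'publicationDate', 'citationCount')
SORT_ORDERS = ('asc', 'desc')

def build_sort(field='paperId', order='asc'):
    """Return '<field>:<order>' after checking both parts against the allowed values."""
    if field not in SORT_FIELDS:
        raise ValueError(f'unknown sort field: {field!r}')
    if order not in SORT_ORDERS:
        raise ValueError(f'unknown sort order: {order!r}')
    return f'{field}:{order}'

print(build_sort('citationCount', 'desc'))
```

The result can be passed as the sort argument of search_paper() together with bulk=True.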

Search papers by title¶

Retrieve a single paper whose title best matches the given query.

from semanticscholar import SemanticScholar
sch = SemanticScholar()
paper = sch.search_paper(query='deep learning', match_title=True)

Note

The match_title parameter is not compatible with the bulk parameter.

Query parameters for search papers¶

`year: str`¶

Restrict results to a specific publication year or a range of years, following the patterns ‘{year}’ or ‘{start}-{end}’. You can also omit either the start or the end of the range. Examples: ‘2000’, ‘1991-2000’, ‘1991-’, ‘-2000’.

results = sch.search_paper('turing test', year='2000')
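The four year patterns described above can be produced from optional integer bounds. This is a small sketch with a hypothetical helper name (year_filter); it only formats the string and does not call the API.

```python
def year_filter(start=None, end=None):
    """Format a year or year range as '{year}', '{start}-{end}', '{start}-' or '-{end}'."""
    if start is None and end is None:
        raise ValueError('provide a start year, an end year, or both')
    if start is not None and end is not None and start > end:
        raise ValueError('start year must not be after end year')
    if start == end:
        return str(start)  # single year
    left = '' if start is None else str(start)
    right = '' if end is None else str(end)
    return f'{left}-{right}'

print(year_filter(1991, 2000))
```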

`publication_type: list`¶

Restrict results to a given list of publication types. Check official documentation for a list of available publication types.

results = sch.search_paper('turing test', publication_type=['Journal','Conference'])

`open_access_pdf: bool`¶

Restrict results to papers with open access PDFs. By default, this parameter is set to False.

results = sch.search_paper('turing test', open_access_pdf=True)

`venue: list`¶

Restrict results to a given list of venues.

results = sch.search_paper('turing test', venue=['ESEM','ICSE','ICSME'])

`fields_of_study: list`¶

Restrict results to a given list of fields of study. Check official documentation for a list of available fields.

results = sch.search_paper('turing test', fields_of_study=['Computer Science','Education'])

`publication_date_or_year: str`¶

Restrict results to the given range of publication date in the format <start_date>:<end_date>, where dates are in the format YYYY-MM-DD, YYYY-MM, or YYYY.

results = sch.search_paper('turing test', publication_date_or_year='2020-01-01:2021-12-31')
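If your bounds are already datetime.date objects, the range string can be assembled with the standard library. A minimal sketch, assuming both bounds are given in full YYYY-MM-DD form; date_range is a hypothetical helper.

```python
from datetime import date

def date_range(start, end):
    """Format two dates as '<start_date>:<end_date>' using YYYY-MM-DD."""
    if start > end:
        raise ValueError('start date must not be after end date')
    return f'{start.isoformat()}:{end.isoformat()}'

print(date_range(date(2020, 1, 1), date(2021, 12, 31)))
```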

`min_citation_count: int`¶

Restrict results to papers with at least the given number of citations.

results = sch.search_paper('turing test', min_citation_count=100)

Paginated results¶

Methods that return large amounts of data in chunks, such as searching for papers or authors, support pagination. These methods retrieve results up to a defined limit per page (default is 100). To access additional pages, you can fetch them individually or iterate through the entire set of results.

For example, iterating over all results for a paper search:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_paper('Computing Machinery and Intelligence')
all_results = [item for item in results]

Pagination is handled automatically when iterating, retrieving all available items. However, if only the first batch of results is needed, you can access them directly using the items property of the result object, avoiding extra API calls:

results = sch.search_paper('Computing Machinery and Intelligence')
first_page = results.items

To fetch the next page of results, use the next_page() method. This method appends the next batch of items to the current list, as shown in the example below:

results = sch.search_paper('Computing Machinery and Intelligence')
results.next_page()
first_two_pages = results.items
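When you want a fixed number of results rather than every page, stopping the iteration early avoids fetching pages you do not need. The sketch below shows the idea with itertools.islice and a stand-in generator; take and fake_results are hypothetical names, and the example assumes that the real result object fetches pages lazily while being iterated, as described above.

```python
from itertools import islice

def take(results, n):
    """Return at most n items, stopping iteration as soon as n are collected."""
    return list(islice(results, n))

def fake_results(total, page_size, fetched_pages):
    """Stand-in for a paginated result: yields items page by page, recording each fetch."""
    for start in range(0, total, page_size):
        fetched_pages.append(start // page_size)  # simulate one API call
        yield from range(start, min(start + page_size, total))

pages = []
items = take(fake_results(total=1000, page_size=100, fetched_pages=pages), 250)
print(len(items), pages)
```

Only the pages needed to reach the requested number of items are recorded as fetched.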

Recommended papers¶

To get recommended papers for a given paper:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.get_recommended_papers('10.2139/ssrn.2250500')

To get recommended papers based on a list of positive and negative paper examples:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
positive_paper_ids = ['10.1145/3544585.3544600']
negative_paper_ids = ['10.1145/301250.301271']
results = sch.get_recommended_papers_from_lists(positive_paper_ids, negative_paper_ids)

You can also omit the list of negative paper IDs, in which case the API returns recommended papers based on the list of positive paper IDs only.

Datasets API¶

The Datasets API includes several key concepts:

Releases: Datasets are organized into releases, which are snapshots of the data at specific points in time (e.g., ‘2023-12-01’).
Datasets: Each release contains multiple datasets, such as ‘papers’, ‘authors’, ‘publications’, etc.
Incremental Updates: For efficient updates, the API provides diffs between releases showing only the changes.

Available releases¶

To get a list of all available dataset releases, call get_available_releases(). The response contains a list of release identifiers (strings) that you can use to access specific releases:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
releases = sch.get_available_releases()
print(releases)

# Example output:
# ['2025-08-19', '2025-09-05', ...]

Get a specific release¶

To get detailed information about a specific release, including all available datasets:

from semanticscholar import SemanticScholar
sch = SemanticScholar()
release = sch.get_release('2025-08-19')

print(f"Release ID: {release.release_id}")
print(f"Number of datasets: {len(release.datasets)}")

# List all available datasets in this release
for dataset in release.datasets:
    print(f"- {dataset.name}: {dataset.description}")

Per the official documentation, you can also use ‘latest’ as the release identifier to get the most recent release.

Get dataset download links¶

This endpoint requires authentication with a valid API key. To get download links for a specific dataset in a release:

from semanticscholar import SemanticScholar
sch = SemanticScholar(api_key='your_api_key_here')
dataset = sch.get_dataset_download_links('2025-08-19', 'papers')

print(f"Dataset: {dataset.name}")
print(f"Description: {dataset.description}")
print(f"Number of files: {len(dataset.files)}")

# Print first few download URLs
for i, file_url in enumerate(dataset.files[:3]):
    print(f"File {i+1}: {file_url}")

Get dataset diffs¶

This endpoint requires authentication with a valid API key. To get file URLs for incremental updates between two releases for a specific dataset:

from semanticscholar import SemanticScholar
sch = SemanticScholar(api_key='your_api_key_here')
diffs = sch.get_dataset_diffs('papers', '2023-12-01', '2024-01-01')

print(f"Number of incremental updates: {len(diffs.diffs)}")

# Examine the first diff
first_diff = diffs.diffs[0]
print(f"First update: {first_diff.from_release} -> {first_diff.to_release}")
print(f"Update files: {len(first_diff.update_files)}")
print(f"Delete files: {len(first_diff.delete_files)}")

Applying the diffs to update your local dataset is not supported by this library. Please see the official documentation for an example using Spark.

Common query parameters¶

`fields: list`¶

The list of fields to be returned. By default, the response includes all fields. As explained in the official documentation, fields like papers (author lookup and search) may produce unusually large responses and affect performance, so consider reducing the list. Check the official documentation for a list of available fields.

from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_paper('software engineering', fields=['title','year'])

`limit: int`¶

This parameter sets the maximum number of results to return on each call to the API. According to the official documentation, setting a smaller limit reduces output size and latency. The default value is 100.

from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_paper('software engineering', limit=5)

Troubleshooting¶

If you encounter issues while using the semanticscholar library, enabling debug-level logging can provide valuable insights into the underlying HTTP requests and responses. This can help you identify the root cause of the problem and resolve it more efficiently.

Enabling debug logging¶

You can enable debug-level logging globally or just for the semanticscholar library.

Enable debug logging globally:

import logging
logging.getLogger().setLevel(logging.DEBUG)

This will enable debug-level logging for all loggers, including the semanticscholar library, its dependencies, and any other libraries you are using. While these messages may not be directly related, they can still provide valuable context for identifying related issues or understanding broader behavior.

Enable debug logging for the semanticscholar library only:

import logging
logging.getLogger('semanticscholar').setLevel(logging.DEBUG)

This restricts debug-level logging to the semanticscholar library.

In both cases, the output will include detailed information about HTTP requests, headers, payloads, and the equivalent curl command. For example:

DEBUG:semanticscholar:HTTP Request: POST https://api.semanticscholar.org/graph/v1/paper/batch?fields=title,year
DEBUG:semanticscholar:Headers: {'x-api-key': 'F@k3K3y'}
DEBUG:semanticscholar:Payload: {'ids': ['CorpusId:470667', '10.2139/ssrn.2250500', '0f40b1f08821e22e859c6050916cec3667778613']}
DEBUG:semanticscholar:cURL command: curl -X POST -H 'x-api-key: F@k3K3y' -d '{"ids": ["CorpusId:470667", "10.2139/ssrn.2250500", "0f40b1f08821e22e859c6050916cec3667778613"]}' https://api.semanticscholar.org/graph/v1/paper/batch?fields=title,year

Warning

Be cautious when enabling debug logging and sharing the output, as it may contain sensitive information like API keys.
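One way to reduce that risk is to mask the key before log lines are written. The sketch below uses a standard logging.Filter attached to a handler; RedactApiKey and the regular expression are assumptions based on the x-api-key format shown in the sample output above, not something the library provides.

```python
import logging
import re

# Matches the key value in both the headers dict and the curl command forms.
_API_KEY = re.compile(r"(x-api-key['\"]?\s*:\s*['\"]?)[^'\"\s}]+", re.IGNORECASE)

class RedactApiKey(logging.Filter):
    """Replace any x-api-key value in the formatted message with '***'."""
    def filter(self, record):
        record.msg = _API_KEY.sub(r"\1***", record.getMessage())
        record.args = None  # message is already fully formatted
        return True

handler = logging.StreamHandler()
handler.addFilter(RedactApiKey())
logging.getLogger('semanticscholar').addHandler(handler)
```

The filter only affects the handler it is attached to, so any other handler that receives the same records (for example one on the root logger) would need the filter as well.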

Debugging with the `curl` command¶

The semanticscholar library provides a curl command in its debug output. You can use this command to interact directly with the Semantic Scholar API and compare the results with those obtained through the library.

For example:

curl -X POST -H 'x-api-key: F@k3K3y' -d '{"ids": ["CorpusId:470667", "10.2139/ssrn.2250500", "0f40b1f08821e22e859c6050916cec3667778613"]}' https://api.semanticscholar.org/graph/v1/paper/batch?fields=title,year

You can also use any HTTP client of your choice (e.g., Postman) to replicate the request and validate the behavior.

By using debug logging and the provided curl command, you can isolate issues, verify API responses, and resolve problems effectively.
