Usage
Basics
Basic usage involves initializing the main class, calling one of its methods to retrieve data, and accessing the response attributes. For example, to get a paper by its ID:
from semanticscholar import SemanticScholar
sch = SemanticScholar()
paper = sch.get_paper('10.1093/mind/lix.236.433')
print(paper.title)
Typed responses
The library offers typed responses. This simplifies data extraction and enhances code readability. For example, to access the title of a paper:
paper = sch.get_paper('10.1093/mind/lix.236.433')
print(paper.title)
You can also access the API response in its original JSON format as a dictionary. To retrieve the raw JSON data, use the raw_data attribute of the response object:
paper = sch.get_paper('10.1093/mind/lix.236.433')
print(paper.raw_data)
To explore all available fields in the response, use the keys() method:
paper = sch.get_paper('10.1093/mind/lix.236.433')
print(paper.keys())
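Because raw_data is a plain dictionary, it can be stored or compared with the standard json module. The following is a minimal sketch; the helper name raw_to_json is hypothetical and not part of the library:

```python
import json


def raw_to_json(raw_data: dict) -> str:
    """Serialize a raw API response dictionary to pretty-printed JSON."""
    # sort_keys keeps the output stable, which helps when comparing responses
    return json.dumps(raw_data, indent=2, sort_keys=True, ensure_ascii=False)


# Usage with a response object from the examples above:
# print(raw_to_json(paper.raw_data))
```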
See also
Refer to the SemanticScholar Objects section for details on all available response types and their attributes.
Asynchronous requests
The library provides both synchronous and asynchronous versions of its methods, so you can choose the approach that best suits your workflow.
You can use the asynchronous version with the AsyncSemanticScholar class:
import asyncio
from semanticscholar import AsyncSemanticScholar
def fetch_paper():
    async def get_paper():
        sch = AsyncSemanticScholar()
        return await sch.get_paper('10.1093/mind/lix.236.433')
    return asyncio.run(get_paper())
paper = fetch_paper()
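The asynchronous client is most useful when several requests can be in flight at once. The sketch below assumes only what the example above shows, namely that get_paper can be awaited; the helper fetch_papers and the function main are hypothetical names. Concurrent requests may reach the API rate limit sooner than sequential ones.

```python
import asyncio


async def fetch_papers(client, paper_ids):
    """Request several papers concurrently.

    `client` is any object with an awaitable get_paper(paper_id) method,
    such as an AsyncSemanticScholar instance.
    """
    tasks = [client.get_paper(paper_id) for paper_id in paper_ids]
    # gather() runs the coroutines concurrently and preserves input order
    return await asyncio.gather(*tasks)


async def main():
    # Imported here so the helper above does not depend on the library
    from semanticscholar import AsyncSemanticScholar
    sch = AsyncSemanticScholar()
    ids = ['10.1093/mind/lix.236.433', '10.2139/ssrn.2250500']
    for paper in await fetch_papers(sch, ids):
        print(paper.title)


# To run against the live API:
# asyncio.run(main())
```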
Authenticated requests
If you have an API key, you can pass it as an argument to the main class. This will allow you to make authenticated requests.
from semanticscholar import SemanticScholar
sch = SemanticScholar(api_key='your_api_key_here')
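To avoid hard-coding the key in source files, it can be read from the environment. This is a sketch only: the variable name S2_API_KEY and the helper load_api_key are assumptions, not something the library defines.

```python
import os


def load_api_key(env_var: str = "S2_API_KEY"):
    """Return the API key stored in `env_var`, or None if it is unset or blank."""
    key = os.environ.get(env_var, "").strip()
    return key or None


# Usage (falls back to unauthenticated requests when no key is set):
# from semanticscholar import SemanticScholar
# sch = SemanticScholar(api_key=load_api_key())
```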
Retry mode
The library provides an automatic retry mechanism to handle rate-limiting responses from the Semantic Scholar API.
By default, the retry mechanism is enabled (retry=True). When enabled, the library will automatically retry requests up to 10 times if it encounters an HTTP 429 status (Too Many Requests). Retries use exponential back-off, starting at 5 seconds and doubling up to a maximum of 60 seconds between attempts.
This feature is especially useful for handling temporary rate limits imposed by the Semantic Scholar API, ensuring your requests are eventually processed without manual intervention. If you prefer to manage retries yourself, you can disable this feature as shown below:
from semanticscholar import SemanticScholar
sch = SemanticScholar(retry=False)
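If you manage retries yourself, it helps to see the wait schedule described above written out. The sketch below only reproduces that description (start at 5 seconds, double, cap at 60 seconds); it is not the library's internal code, and it assumes one wait before each of the 10 retries.

```python
def backoff_delays(max_retries=10, first_delay=5.0, max_delay=60.0):
    """Return the wait (in seconds) before each retry under exponential back-off."""
    delays = []
    delay = first_delay
    for _ in range(max_retries):
        delays.append(min(delay, max_delay))  # never wait longer than the cap
        delay *= 2                            # double for the next attempt
    return delays


schedule = backoff_delays()
print(schedule)
print(f"Worst-case total wait: {sum(schedule)} seconds")
```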
Response timeout
You can set how long to wait for a response. By default, a request to the API waits up to 30 seconds before a TimeoutException is raised. To change the default, pass a timeout value (in seconds) when creating a SemanticScholar instance:
from semanticscholar import SemanticScholar
sch = SemanticScholar(timeout=5)
Alternatively, you can set the timeout property value:
from semanticscholar import SemanticScholar
sch = SemanticScholar()
sch.timeout = 5
Recommended papers
To get recommended papers for a given paper:
from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.get_recommended_papers('10.2139/ssrn.2250500')
To get recommended papers based on a list of positive and negative paper examples:
from semanticscholar import SemanticScholar
sch = SemanticScholar()
positive_paper_ids = ['10.1145/3544585.3544600']
negative_paper_ids = ['10.1145/301250.301271']
results = sch.get_recommended_papers_from_lists(positive_paper_ids, negative_paper_ids)
You can also omit the list of negative paper IDs, in which case the API returns recommendations based on the positive paper IDs only.
Datasets API
The Datasets API includes several key concepts:
Releases: Datasets are organized into releases, which are snapshots of the data at specific points in time (e.g., ‘2023-12-01’).
Datasets: Each release contains multiple datasets, such as ‘papers’, ‘authors’, and ‘publications’.
Incremental Updates: For efficient updates, the API provides diffs between releases showing only the changes.
Available releases
To list all available dataset releases, call get_available_releases(). It returns a list of release identifiers (strings) that you can use to access specific releases:
from semanticscholar import SemanticScholar
sch = SemanticScholar()
releases = sch.get_available_releases()
print(releases)
# Example output:
# ['2025-08-19', '2025-09-05', ...]
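When keeping a local copy up to date, a common first step is finding which releases are newer than the one you already have. The sketch below assumes release identifiers are ISO dates, as in the example output; the helper newer_releases and the sample identifiers are hypothetical.

```python
from datetime import date


def newer_releases(releases, current):
    """Return the release IDs dated after `current`, oldest first."""
    current_date = date.fromisoformat(current)
    newer = [r for r in releases if date.fromisoformat(r) > current_date]
    return sorted(newer, key=date.fromisoformat)


# Hypothetical identifiers, for illustration only
releases = ['2025-09-05', '2025-08-19', '2025-08-26']
print(newer_releases(releases, '2025-08-19'))
```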
Get a specific release
To get detailed information about a specific release, including all available datasets:
from semanticscholar import SemanticScholar
sch = SemanticScholar()
release = sch.get_release('2025-08-19')
print(f"Release ID: {release.release_id}")
print(f"Number of datasets: {len(release.datasets)}")
# List all available datasets in this release
for dataset in release.datasets:
    print(f"- {dataset.name}: {dataset.description}")
Per the official documentation, you can also use ‘latest’ as the release identifier to get the most recent release.
Get dataset download links
This endpoint requires authentication with a valid API key. To get download links for a specific dataset in a release:
from semanticscholar import SemanticScholar
sch = SemanticScholar(api_key='your_api_key_here')  # this endpoint requires an API key
dataset = sch.get_dataset_download_links('2025-08-19', 'papers')
print(f"Dataset: {dataset.name}")
print(f"Description: {dataset.description}")
print(f"Number of files: {len(dataset.files)}")
# Print first few download URLs
for i, file_url in enumerate(dataset.files[:3]):
    print(f"File {i+1}: {file_url}")
Get dataset diffs
This endpoint requires authentication with a valid API key. To get file URLs for incremental updates between two releases of a specific dataset:
from semanticscholar import SemanticScholar
sch = SemanticScholar(api_key='your_api_key_here')  # this endpoint requires an API key
diffs = sch.get_dataset_diffs('papers', '2023-12-01', '2024-01-01')
print(f"Number of incremental updates: {len(diffs.diffs)}")
# Examine the first diff
first_diff = diffs.diffs[0]
print(f"First update: {first_diff.from_release} -> {first_diff.to_release}")
print(f"Update files: {len(first_diff.update_files)}")
print(f"Delete files: {len(first_diff.delete_files)}")
Applying the diffs to update your local dataset is not supported by this library. See the official documentation for an example using Spark.
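For intuition only, the idea behind applying a diff can be sketched in memory. This sketch assumes that update files hold records to insert or overwrite and delete files hold records to remove, both matched on a primary key; the key name corpusid, the helper apply_diff, and the sample records are assumptions. Downloading and parsing the files is left out, and the official documentation remains the authority on the real procedure.

```python
def apply_diff(dataset, update_records, delete_records, key="corpusid"):
    """Apply one diff to an in-memory dataset.

    dataset: dict mapping primary key -> record
    update_records: records to insert or overwrite
    delete_records: records whose primary key should be removed
    """
    for record in update_records:
        dataset[record[key]] = record       # upsert by primary key
    for record in delete_records:
        dataset.pop(record[key], None)      # ignore keys that are already absent
    return dataset


# Hypothetical records, for illustration only
local = {1: {"corpusid": 1, "title": "old"}, 2: {"corpusid": 2, "title": "gone"}}
apply_diff(
    local,
    update_records=[{"corpusid": 1, "title": "new"}, {"corpusid": 3, "title": "added"}],
    delete_records=[{"corpusid": 2}],
)
print(sorted(local))
```

When several diffs are returned, they would need to be applied in order, from the oldest release to the newest.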
Common query parameters
fields: list
The list of fields to be returned. By default, the response includes all fields. As explained in the official documentation, fields like papers (author lookup and search) can produce unusually large responses and hurt performance, so consider requesting only the fields you need. Check the official documentation for the list of available fields.
from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_paper('software engineering', fields=['title','year'])
limit: int
The maximum number of results to return on each call to the API. According to the official documentation, setting a smaller limit reduces output size and latency. The default value is 100.
from semanticscholar import SemanticScholar
sch = SemanticScholar()
results = sch.search_paper('software engineering', limit=5)
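Note that limit bounds the size of each API call, not the total number of items you consume. If the returned results object is iterated and fetches further pages on demand (an assumption here; check the SemanticScholar Objects section), the total can be capped with itertools.islice. The helper name take is hypothetical.

```python
from itertools import islice


def take(results, n):
    """Return at most n items from any iterable, consuming no more than n."""
    return list(islice(results, n))


# Usage with the search above:
# first_twenty = take(results, 20)
# for paper in first_twenty:
#     print(paper.title)
```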
Troubleshooting
If you encounter issues while using the semanticscholar library, enabling debug-level logging can provide valuable insights into the underlying HTTP requests and responses. This can help you identify the root cause of the problem and resolve it more efficiently.
Enabling debug logging
You can enable debug-level logging globally or just for the semanticscholar library.
Enable debug logging globally:
import logging
logging.getLogger().setLevel(logging.DEBUG)
This will enable debug-level logging for all loggers, including the semanticscholar library, its dependencies, and any other libraries you are using. While these messages may not be directly related, they can still provide valuable context for identifying related issues or understanding broader behavior.
Enable debug logging for the semanticscholar library only:
import logging
logging.getLogger('semanticscholar').setLevel(logging.DEBUG)
This restricts debug-level logging to the semanticscholar library.
In both cases, the output will include detailed information about HTTP requests, headers, payloads, and the equivalent curl command. For example:
DEBUG:semanticscholar:HTTP Request: POST https://api.semanticscholar.org/graph/v1/paper/batch?fields=title,year
DEBUG:semanticscholar:Headers: {'x-api-key': 'F@k3K3y'}
DEBUG:semanticscholar:Payload: {'ids': ['CorpusId:470667', '10.2139/ssrn.2250500', '0f40b1f08821e22e859c6050916cec3667778613']}
DEBUG:semanticscholar:cURL command: curl -X POST -H 'x-api-key: F@k3K3y' -d '{"ids": ["CorpusId:470667", "10.2139/ssrn.2250500", "0f40b1f08821e22e859c6050916cec3667778613"]}' https://api.semanticscholar.org/graph/v1/paper/batch?fields=title,year
Warning
Be cautious when enabling debug logging and sharing the output, as it may contain sensitive information like API keys.
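One way to reduce that risk is to mask the key before log records are written. The sketch below uses the standard logging.Filter mechanism; the class RedactApiKeyFilter, the helper redact_api_key, and the regular expression are assumptions based on the sample output above, so verify them against your own logs before relying on them.

```python
import logging
import re

# Matches "x-api-key" followed by its value, in both the dict-style
# header line and the curl command shown in the sample output.
_API_KEY_PATTERN = re.compile(
    r"(x-api-key['\"]?\s*:\s*['\"]?)([^'\"\s}]+)", re.IGNORECASE
)


def redact_api_key(text: str) -> str:
    """Replace the value of any x-api-key header in `text` with ***."""
    return _API_KEY_PATTERN.sub(r"\1***", text)


class RedactApiKeyFilter(logging.Filter):
    """Logging filter that masks API keys in formatted messages."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact_api_key(record.getMessage())
        record.args = ()  # the message is already formatted
        return True       # keep the record, only its text changes


# Attach the filter to a handler so it applies to every record it emits
handler = logging.StreamHandler()
handler.addFilter(RedactApiKeyFilter())
logger = logging.getLogger('semanticscholar')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
```

With this in place, only output written through this handler is redacted; other handlers, such as one configured on the root logger, would still receive whatever the library emits.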
Debugging with the curl command
The semanticscholar library provides a curl command in its debug output. You can use this command to interact directly with the Semantic Scholar API and compare the results with those obtained through the library.
For example:
curl -X POST -H 'x-api-key: F@k3K3y' -d '{"ids": ["CorpusId:470667", "10.2139/ssrn.2250500", "0f40b1f08821e22e859c6050916cec3667778613"]}' https://api.semanticscholar.org/graph/v1/paper/batch?fields=title,year
You can also use any HTTP client of your choice (e.g., Postman) to replicate the request and validate the behavior.
By using debug logging and the provided curl command, you can isolate issues, verify API responses, and resolve problems effectively.