Contributing Guidelines

Thank you for your interest in contributing to the project! This document will help you get started. There are many ways to contribute:

  • Participate in discussions

  • Report issues or suggest improvements

  • Contribute code to fix bugs or add features

  • Improve the documentation

Reporting issues

If you find a bug or have a suggestion for improvement, please open an issue. Before doing so, make sure to search for existing ones to avoid duplicates.

When opening a new issue, provide as much information as possible. The templates provided will guide you on what to include. In general, the more details you provide, the easier it will be to understand and address the problem.

Short, self-contained, correct examples are always appreciated. A code snippet that reproduces the issue is especially helpful.

Debug information is also important. See the troubleshooting section in the docs to learn how to gather it. Be sure to exclude any sensitive data.

Contributing with code

All code contributions must be made through Pull Requests.

When contributing, please:

  • Follow the PEP 8 style guide: Keep your code clean and readable.

  • Include tests: Write the necessary tests to verify your code works. Ensure all tests pass before submitting a PR.

  • Avoid breaking changes: If a breaking change is unavoidable, describe it clearly in the PR description.

  • Update documentation: Update or add documentation for new or changed features.

  • Use clear commit messages: Although not mandatory, we encourage following the Conventional Commits specification to maintain a clean and organized commit history.
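
As an illustration of the Conventional Commits header format, type(scope)!: description, the following Python sketch checks a commit header against a simplified pattern. The list of types is an assumed subset chosen for this example (the specification defines feat and fix and allows others), and the is_conventional helper is hypothetical; it is not part of this project or of any official tooling.

```python
import re

# Assumed subset of types for illustration; projects may allow others.
TYPES = ("feat", "fix", "docs", "test", "refactor", "chore")

# type, optional (scope), optional "!" for breaking changes, then ": description"
HEADER = re.compile(rf"^({'|'.join(TYPES)})(\([\w.-]+\))?!?: \S.*$")


def is_conventional(header: str) -> bool:
    """Return True if the first line of a commit message matches the pattern."""
    return HEADER.match(header) is not None


examples = [
    "fix: handle empty author list",
    "feat(api)!: drop deprecated parameter",
    "Fixed stuff",
]
for line in examples:
    print(f"{line!r}: {is_conventional(line)}")
```

The first two headers match the pattern; the third does not, because it has no type prefix.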

Architecture overview

Before contributing, it’s important to understand the project’s architecture:

  • Async-first: All API logic lives in AsyncSemanticScholar. The sync SemanticScholar class delegates every method to its async counterpart via _run_async().

  • Data models: All response objects inherit from SemanticScholarObject and follow the same pattern: a FIELDS class constant, attributes initialized to None in __init__, and a _init_attributes(data) method that populates them from a dict.

  • HTTP layer: ApiRequester is the single point for HTTP requests, retries, and error mapping. API methods should not make HTTP calls directly.
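
The async-first and HTTP layer points can be sketched roughly as follows. This is a simplified illustration, not the project's actual code: AsyncClient, SyncClient, and FakeRequester are hypothetical stand-ins for AsyncSemanticScholar, SemanticScholar, and ApiRequester, the get_data method name is assumed, and _run_async() is assumed here to wrap asyncio.run(), which may differ from the real implementation.

```python
import asyncio


class FakeRequester:
    """Hypothetical stand-in for the HTTP layer; returns canned data."""

    async def get_data(self, url: str) -> dict:
        # A real requester would handle the HTTP call, retries,
        # and error mapping here.
        return {"paperId": url.rsplit("/", 1)[-1], "title": "Example title"}


class AsyncClient:
    """Hypothetical async class that holds the API logic."""

    def __init__(self) -> None:
        self._requester = FakeRequester()

    async def get_paper(self, paper_id: str) -> dict:
        # API methods go through the requester instead of calling HTTP directly.
        return await self._requester.get_data(f"paper/{paper_id}")


class SyncClient:
    """Hypothetical sync facade: each method delegates to its async twin."""

    def __init__(self) -> None:
        self._async_client = AsyncClient()

    def _run_async(self, coro):
        # Simplified assumption: no event loop is already running in this thread.
        return asyncio.run(coro)

    def get_paper(self, paper_id: str) -> dict:
        return self._run_async(self._async_client.get_paper(paper_id))


paper = SyncClient().get_paper("abc123")
print(paper)
```

With this layout, a new API method is written once in the async class, and the sync class only needs a one-line wrapper.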

When adding new features, follow these patterns to keep the codebase consistent. If your contribution requires changes to the architecture, please discuss it in the PR description so it can be reviewed properly.
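
As one illustration of following the data model pattern, a new response object might look roughly like the sketch below. BaseObject is a hypothetical stand-in for SemanticScholarObject, whose real behavior may differ, and ExampleModel with its exampleId and name fields is invented purely for this example.

```python
class BaseObject:
    """Hypothetical stand-in for SemanticScholarObject."""

    FIELDS = []

    def __init__(self) -> None:
        self._data = {}

    def _init_attributes(self, data: dict) -> None:
        # Keep the raw payload; the real base class may behave differently.
        self._data = data


class ExampleModel(BaseObject):
    """Hypothetical model that follows the pattern described above."""

    # Class constant listing the fields this model understands.
    FIELDS = ["exampleId", "name"]

    def __init__(self, data: dict) -> None:
        super().__init__()
        # Every attribute starts as None.
        self._example_id = None
        self._name = None
        # The attributes are then populated from the response dict.
        self._init_attributes(data)

    @property
    def example_id(self):
        return self._example_id

    @property
    def name(self):
        return self._name

    def _init_attributes(self, data: dict) -> None:
        super()._init_attributes(data)
        if "exampleId" in data:
            self._example_id = data["exampleId"]
        if "name" in data:
            self._name = data["name"]


model = ExampleModel({"exampleId": "42"})
print(model.example_id, model.name)  # prints: 42 None
```

Fields missing from the response simply stay None, so callers can rely on every attribute existing.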

Running and writing tests

First, install the dependencies:

pip install -r test-requirements.txt

Then, run the tests:

  1. Run all tests:

    python -m unittest
    
  2. Run all tests in a specific class:

    python -m unittest tests.test_semanticscholar.TestClassName
    
  3. Run a specific test method:

    python -m unittest tests.test_semanticscholar.TestClassName.testMethod
    

Async tests

Async tests use IsolatedAsyncioTestCase and follow the same naming convention as sync tests, but with an _async suffix (e.g., test_get_paper_async). They reuse the same VCR cassettes as their sync counterparts via the use_shared_cassette decorator, which strips the _async suffix to find the matching cassette file.
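
The naming convention can be sketched as follows. The cassette_name_for helper is hypothetical and only mimics the suffix-stripping idea; it is not the project's use_shared_cassette decorator, and the test does not call the real API.

```python
import unittest


def cassette_name_for(test_name: str) -> str:
    """Hypothetical helper: strip the _async suffix to find the shared name."""
    suffix = "_async"
    if test_name.endswith(suffix):
        return test_name[: -len(suffix)]
    return test_name


class ExampleAsyncTest(unittest.IsolatedAsyncioTestCase):
    async def test_get_paper_async(self) -> None:
        # self.id() ends with the name of the running test method.
        method_name = self.id().rsplit(".", 1)[-1]
        self.assertEqual(cassette_name_for(method_name), "test_get_paper")


# Run the example test case programmatically.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleAsyncTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the sync test test_get_paper and the async test test_get_paper_async resolve to the same name, one recording can serve both.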

Recording API calls

This project uses VCR.py, a library designed to record HTTP interactions and replay them during tests, eliminating the need to make actual HTTP requests repeatedly.

When adding new tests, the first run will record the interactions and save them as a new file in the tests/data directory.
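
The record-then-replay idea can be sketched with the standard library alone. This is a conceptual illustration under simplifying assumptions, not VCR.py's API or file format: fetch_with_cassette and fake_fetch are hypothetical names, and a JSON file in a temporary directory stands in for a cassette.

```python
import json
import tempfile
from pathlib import Path


def fetch_with_cassette(cassette: Path, fetch):
    """Replay the saved response if the cassette exists; otherwise record it."""
    if cassette.exists():
        return json.loads(cassette.read_text()), "replayed"
    response = fetch()
    cassette.write_text(json.dumps(response))
    return response, "recorded"


calls = []


def fake_fetch() -> dict:
    # Stands in for a real HTTP request; counts how often it is invoked.
    calls.append(1)
    return {"title": "Example"}


with tempfile.TemporaryDirectory() as tmp:
    cassette = Path(tmp) / "test_example.json"
    first = fetch_with_cassette(cassette, fake_fetch)   # first run records
    second = fetch_with_cassette(cassette, fake_fetch)  # later runs replay

print(first[1], second[1], len(calls))  # prints: recorded replayed 1
```

The request function runs only once; the second call is served from the saved file, which is why recorded tests stay fast and do not depend on the network.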

Code coverage

When adding new code, make sure to write tests that cover it. The goal is to maintain high code coverage.

To collect coverage data, run:

python -m coverage run --source=semanticscholar/ -m unittest discover

To generate a report, run:

python -m coverage report

If you want to see the coverage in HTML format, run:

python -m coverage html

Contributing to documentation

The documentation is built using Sphinx and hosted on Read the Docs.

To get started, install the necessary dependencies:

pip install -r docs-requirements.txt

After adding new content, you can build the documentation locally:

cd docs && make html

Then, open the docs/build/html/index.html file in your browser to see the changes.

Please do not include auto-generated files, such as those created by docs/source/conf.py, in your PRs.

To contribute to the documentation, it’s important to be familiar with reStructuredText and Sphinx. For more details on how to work with them, refer to the official Sphinx documentation.