This repository was archived by the owner on Dec 10, 2024. It is now read-only.

ghurault1/python-analysis-template

Python analysis template

Code style: black pre-commit License: MIT

This repository serves as a personal template for data science projects.

File structure

  • Analysis scripts and notebooks are located in analysis/.
  • Reusable functions and modules are stored in the local package src/.
    • The package can then be installed in development mode with pip install -e . for easy prototyping.
    • src/config.py is used to store variables, constants and configurations.
    • The package version is extracted from git tags using setuptools_scm following semantic versioning.
  • Tests for functions in src/ should go in tests/ and follow the naming convention test_*.py.
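
As an illustration of the test_*.py convention, a test module might look like the sketch below. It assumes pytest as the test runner (this README does not name one), and both the file name tests/test_utils.py and the clip function are hypothetical; in a real project the function would be imported from src rather than defined inline.

```python
# Hypothetical file: tests/test_utils.py
# In a real project, `clip` would live in src/ and be imported, e.g.
# `from src.utils import clip` (module and function names are assumptions).


def clip(value: float, lower: float, upper: float) -> float:
    """Restrict `value` to the closed interval [lower, upper]."""
    return max(lower, min(value, upper))


def test_clip_keeps_values_inside_the_interval():
    assert clip(0.5, 0.0, 1.0) == 0.5


def test_clip_saturates_values_outside_the_interval():
    assert clip(-1.0, 0.0, 1.0) == 0.0
    assert clip(2.0, 0.0, 1.0) == 1.0
```

Under that assumption, running pytest from the repository root would discover both test functions.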

Moreover, I use the following directories, which are (usually) ignored by Git:

  • data/ to store data files.
  • results/ to store results/output files such as figures, output data, etc.
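
To make the role of src/config.py more concrete, here is a minimal sketch of how project paths could be centralised. The ProjectPaths class and its attribute names are hypothetical and are not the template's actual contents; only the data/ and results/ directory names come from the layout described above.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProjectPaths:
    """Hypothetical container for project directories (illustrative names)."""

    root: Path

    @property
    def data(self) -> Path:
        # Data files, usually ignored by Git.
        return self.root / "data"

    @property
    def results(self) -> Path:
        # Figures, output data, etc., usually ignored by Git.
        return self.root / "results"

    def ensure_dirs(self) -> None:
        """Create the Git-ignored directories if they do not exist yet."""
        for directory in (self.data, self.results):
            directory.mkdir(parents=True, exist_ok=True)


# Assumption: scripts are launched from the repository root.
PATHS = ProjectPaths(root=Path.cwd())
```

An analysis script could then call PATHS.ensure_dirs() once and write its outputs under PATHS.results.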

Development environment

I can set up the environment differently depending on the project. The irrelevant sections can be deleted when using the template.

Requirements

This section does not apply when requirements are managed with conda; see the Conda setup section below.

The requirements are specified in the following files:

  • requirements.in to specify direct dependencies.
  • requirements.txt to pin the dependencies (direct and indirect). This is the file used to recreate the environment from scratch using pip install -r requirements.txt.
  • pyproject.toml to store the direct dependencies of the src package.

The requirements.txt file should not be updated manually. Instead, I use pip-compile from pip-tools to generate requirements.txt.

Initial setup

  1. Start with an empty requirements.txt.
  2. Install pip-tools with pip install pip-tools.
  3. Compile requirements with pip-compile to generate a requirements.txt file.
  4. Install requirements with pip-sync (or pip install -r requirements.txt).

NB: the advantage of pip-sync over pip install -r requirements.txt is that pip-sync makes the environment match requirements.txt exactly: it also uninstalls packages that are present in the environment but absent from requirements.txt.
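
The difference can be pictured as a set comparison between what is installed and what is pinned. The sketch below is a conceptual illustration only, not pip-tools' actual implementation; the plan_sync function and the environment state are made up for the example.

```python
from typing import Dict, Set, Tuple


def plan_sync(
    installed: Dict[str, str], pinned: Dict[str, str]
) -> Tuple[Dict[str, str], Set[str]]:
    """Return (packages to install or change, packages to uninstall).

    Both arguments map package names to versions. Conceptual model only.
    """
    # Anything pinned that is missing or at a different version must be (re)installed.
    to_install = {
        name: version
        for name, version in pinned.items()
        if installed.get(name) != version
    }
    # Anything installed but not pinned is removed: this is the step that
    # `pip install -r requirements.txt` never performs.
    to_uninstall = set(installed) - set(pinned)
    return to_install, to_uninstall


# Made-up environment state for illustration.
installed = {"numpy": "1.26.0", "pandas": "2.1.0", "leftover": "0.3.1"}
pinned = {"numpy": "1.26.4", "pandas": "2.1.0"}

to_install, to_uninstall = plan_sync(installed, pinned)
print(to_install)    # {'numpy': '1.26.4'}
print(to_uninstall)  # {'leftover'}
```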

Update the environment

  • To upgrade packages, run pip-compile --upgrade.
  • To add new packages, add them to requirements.in and then recompile the requirements with pip-compile.

Then, the environment can be updated with pip-sync.

venv setup

To set up a Python virtual environment called .venv with venv, using the currently installed Python version, navigate to the repository directory and run the following in the command line:

$ python -m venv .venv
$ source .venv/Scripts/activate  # Windows (e.g. Git Bash); on Linux/macOS use: source .venv/bin/activate
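
The location of the activation script depends on the platform: venv places it under Scripts on Windows and under bin elsewhere. As a sketch, the same environment could also be created from Python with the standard-library venv module; the helper names below are hypothetical.

```python
import os
import venv
from pathlib import Path
from typing import Optional


def activate_script(env_dir: Path, windows: Optional[bool] = None) -> Path:
    """Return the path of the shell activation script inside a virtual environment."""
    if windows is None:
        # Default to the platform the code is running on.
        windows = os.name == "nt"
    # venv uses "Scripts" on Windows and "bin" on POSIX systems.
    return env_dir / ("Scripts" if windows else "bin") / "activate"


def create_env(env_dir: Path = Path(".venv")) -> Path:
    """Create the environment (like `python -m venv .venv`) and return its activate script."""
    venv.create(env_dir, with_pip=True)
    return activate_script(env_dir)


# Show both layouts without creating anything on disk.
print(activate_script(Path(".venv"), windows=True))
print(activate_script(Path(".venv"), windows=False))
```

Activation itself still has to happen in the shell, since it modifies the current shell session.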

Conda setup

To set up the environment with conda (assuming it is already installed), navigate to the repository directory and run the following in the command line (specify the Python version and environment name as appropriate):

$ conda create -n myenv python=3.11
$ conda activate myenv
$ pip install -r requirements.in
$ pip install -e .

Then pin the requirements with:

$ conda env export > environment.yml

Finally, the environment can be recreated with:

$ conda env create -n myenv -f environment.yml

VS Code Dev Containers (Docker)

A Docker container can be used as a development environment. In VS Code, this can be achieved using Dev Containers, which are configured in the .devcontainer directory. The environment is automatically built as follows:

  1. A Docker image of Python is created with packages installed from requirements.txt (except local packages). The Python version can be edited in the Dockerfile.
  2. The image is run in a container and the current directory is mounted.
  3. The local packages are installed in the container, along with some VS Code extensions.

To set up the dev container:

  1. Install and launch Docker.
  2. Open the container by using the command palette in VS Code (Ctrl + Shift + P) to search for "Dev Containers: Open Folder in Container...".

If needed, the container can be rebuilt by searching for "Dev Containers: Rebuild Container...".

Set up Git pre-commit hooks

Pre-commit hooks are configured using the pre-commit tool. Currently, the hooks consist of formatting with Black. When this repository is first initialised, the hooks need to be installed with pre-commit install.

Using the template

This section can be deleted when using the template.

Getting started

  1. Initialise your GitHub repository with this template. Alternatively, fork (or copy the content of) this repository.
  2. Update
    • the repository name
    • project information in pyproject.toml
    • the README
    • the license
  3. Set up your preferred development environment.
  4. Add a git tag for the initial version with git tag -a "v0.1.0" -m "Initial setup", and push it with git push origin --tags.
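
Once the tag exists and the package has been (re)installed, the version that setuptools_scm derived from it can be read back at runtime. The sketch below uses importlib.metadata from the standard library; the get_version helper is hypothetical, and the distribution name passed to it is an assumption that should match the name declared in pyproject.toml.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(dist_name: str) -> str:
    """Return the installed version of a distribution, or "unknown" if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Happens when the package was never installed (e.g. `pip install -e .` was skipped).
        return "unknown"


# "src" is a placeholder: use the project name from pyproject.toml.
print(get_version("src"))
```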

VS Code

I usually work with Visual Studio Code, for which various settings are already predefined. In particular, I use the following extensions for Python development.

Possible extensions

The src/ package could contain the following modules or sub-packages depending on the project:

  • utils for utility functions.
  • data_processing for data processing functions (this could be imported as dp).
  • features for extracting features.
  • models for defining models.
  • evaluation for evaluating performance.
  • plots for plotting functions.

The repository structure could be extended with:

  • docs/ to store documentation, for example:
    • Simple API documentation of the src package could be generated using pdoc.
    • Full project documentation could be generated using mkdocs or quartodoc.
  • subfolders in data/ such as data/raw/ for storing raw data.
  • models/ to store model files.

Related

This template is inspired by the concept of a research compendium and similar projects I created for R projects (e.g. reproducible-workflow).

This template is relatively simple and tailored to my needs. More sophisticated templates are available elsewhere.

Unlike other templates, this one focuses on experimentation rather than on sharing a single final product.
