This repository serves as a personal template for data science projects.
- Analysis scripts and notebooks are located in `analysis/`.
- Reusable functions and modules are stored in the local package `src/`. The package can then be installed in development mode with `pip install -e .` for easy prototyping. `src/config.py` is used to store variables, constants, and configurations.
- The package version is extracted from git tags using `setuptools_scm`, following semantic versioning.
- Tests for functions in `src/` should go in `tests/` and follow the convention `test_*.py`.
Moreover, I use the following directories, which are (usually) ignored by Git:
- `data/` to store data files.
- `results/` to store results/output files such as figures, output data, etc.
I can set up the environment differently depending on the project. The irrelevant sections can be deleted when using the template.
The following does not apply when managing requirements with conda; see the conda section below.
The requirements are specified in the following files:
- `requirements.in` to specify direct dependencies.
- `requirements.txt` to pin the dependencies (direct and indirect). This is the file used to recreate the environment from scratch using `pip install -r requirements.txt`.
- `pyproject.toml` to store the direct dependencies of the `src` package.
The `requirements.txt` file should not be updated manually.
Instead, I use `pip-compile` from pip-tools to generate `requirements.txt`.
- Start with an empty `requirements.txt`.
- Install pip-tools with `pip install pip-tools`.
- Compile requirements with `pip-compile` to generate a `requirements.txt` file.
- Install requirements with `pip-sync` (or `pip install -r requirements.txt`).
NB: the advantage of `pip-sync` over `pip install -r requirements.txt` is that `pip-sync` makes the environment match `requirements.txt` exactly, removing packages that are installed but not listed there, if required.
- To upgrade packages, run `pip-compile --upgrade`.
- To add new packages, add them to `requirements.in` and then compile requirements with `pip-compile`.

Then, the environment can be updated with `pip-sync`.
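Putting the steps above together, a typical pip-tools session might look like the following sketch (the package names written to `requirements.in` are just examples):

```shell
# Example pip-tools session; the listed packages are illustrative.
printf 'pandas\nmatplotlib\n' > requirements.in  # direct dependencies only
pip install pip-tools
pip-compile requirements.in   # resolves and pins everything into requirements.txt
pip-sync requirements.txt     # make the environment match the pins exactly
pip-compile --upgrade         # later: refresh the pins to newer allowed versions
```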
To set up a Python virtual environment called `.venv` with venv, using the currently installed Python version, navigate to the repository directory and run the following in the command line:
$ python -m venv .venv
$ source .venv/Scripts/activate

(On Linux/macOS, the activation script is `.venv/bin/activate` rather than `.venv/Scripts/activate`.)

To set up the environment with conda (assuming it is already installed), navigate to the repository directory and run the following in the command line (specify the Python version and environment name as appropriate):
$ conda create -n myenv python=3.11
$ conda activate myenv
$ pip install -r requirements.in
$ pip install -e .

Then pin the requirements with:
$ conda env export > environment.yml

Finally, the environment can be recreated with:
$ conda env create -n myenv -f environment.yml

A Docker container can be used as a development environment.
In VS Code, this can be achieved using Dev Containers, which are configured in the .devcontainer directory.
The environment is automatically built as follows:
- A Docker image of Python is created with packages installed from `requirements.txt` (except local packages). The Python version can be edited in the Dockerfile.
- The image is run in a container and the current directory is mounted.
- The local packages are installed in the container, along with some VS Code extensions.
To set up the dev container:
- Install and launch Docker.
- Open the container by using the command palette in VS Code (`Ctrl + Shift + P`) to search for "Dev Containers: Open Folder in Container...".
If needed, the container can be rebuilt by searching for "Dev Containers: Rebuild Container...".
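For illustration, a minimal `devcontainer.json` implementing the behaviour described above could look like the sketch below. The actual configuration lives in the repository's `.devcontainer` directory; the name and extension IDs here are assumptions.

```json
{
    "name": "data-science-template",
    "build": { "dockerfile": "Dockerfile" },
    "postCreateCommand": "pip install -e .",
    "customizations": {
        "vscode": {
            "extensions": ["ms-python.python", "ms-python.black-formatter"]
        }
    }
}
```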
Pre-commit hooks are configured using the pre-commit tool.
Currently, the only hook is formatting with Black.
When this repository is first initialised, the hooks need to be installed with `pre-commit install`.
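As a sketch, a `.pre-commit-config.yaml` containing only the Black hook could look like this (the `rev` shown is an example; pin it to the version you actually use):

```yaml
repos:
  - repo: https://github.com/psf/black
    rev: 24.3.0  # example version; pin as appropriate
    hooks:
      - id: black
```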
This section can be deleted when using the template.
- Initialise your GitHub repository with this template. Alternatively, fork (or copy the content of) this repository.
- Update
  - the repository name
  - project information in `pyproject.toml`
  - the README
  - the license
- Set up your preferred development environment.
- Add a git tag for the initial version with `git tag -a "v0.1.0" -m "Initial setup"`, and push it with `git push origin --tags`.
I usually work with Visual Studio Code, for which various settings are already predefined. In particular, I use the following extensions for Python development.
- Black for formatting.
- Flake8 and SonarLint for linting.
- autoDocstring to generate docstring skeletons following the Google docstring format.
The src/ package could contain the following modules or sub-packages depending on the project:
- `utils` for utility functions.
- `data_processing` for data processing functions (this could be imported as `dp`).
- `features` for extracting features.
- `models` for defining models.
- `evaluation` for evaluating performance.
- `plots` for plotting functions.
The repository structure could be extended with, for example:
- `docs/` to store documentation.
- subfolders in `data/`, such as `data/raw/` for storing raw data.
- `models/` to store model files.
This template is inspired by the concept of a research compendium and by similar templates I created for R projects (e.g. reproducible-workflow).
This template is relatively simple and tailored to my needs. More sophisticated templates are available elsewhere, such as:
- Cookiecutter Data Science.
- https://joserzapata.github.io/data-science-project-template/
- Data Science for Social Good's hitchhiker's guide template
- https://github.com/khuyentran1401/data-science-template
Unlike other templates, this one is focused on experimentation rather than on sharing a single, final product.