Mastering the Art of Python Project Setup: A Step-by-Step Guide

Whether you’re a seasoned developer or just getting started with 🐍 Python, it’s important to know how to build robust and maintainable projects. This tutorial will guide you through the process of setting up a Python project using some of the most popular and effective tools in the industry. You’ll learn how to use GitHub and GitHub Actions for version control and continuous integration, as well as other tools for testing, documentation, packaging and distribution. The tutorial is inspired by resources such as Hypermodern Python and Best Practices for a new Python project. However, this is not the only way to do things, and you might have different preferences or opinions. The tutorial is intended to be beginner-friendly but also covers some advanced topics. In each section, you’ll automate some tasks and add badges to your project to show your progress and achievements.
The repository for this series can be found at github.com/johschmidt42/python-project-johannes
This part was inspired by this blog post:
Semantic release with Python, Poetry & GitHub Actions 🚀
- OS: Linux, Unix, macOS, Windows (WSL2 with e.g. Ubuntu 20.04 LTS)
- Tools: python3.10, bash, git, tree
- Version Control System (VCS) Host: GitHub
- Continuous Integration (CI) Tool: GitHub Actions
It is expected that you are familiar with the version control system (VCS) git. If not, here’s a refresher for you: Introduction to Git
Commits will be based on best practices for git commits & Conventional commits. There’s the conventional commit plugin for PyCharm or a VSCode Extension that help you write commits in this format.
Overview
Structure
- Git Branching Strategy (GitHub flow)
- What’s a release? (zip, tar.gz)
- Semantic Versioning (v0.1.0)
- Create a release manually (git tag, GitHub)
- Create a release automatically (conventional commits, semantic releases)
- CI/CD (release.yml)
- Create a Personal Access Token (PAT)
- GitHub Actions Flow (Orchestrating workflows)
- Badge (Release)
- Bonus (Implement conventional commits)
Releasing software is an important step in the software development process, as it makes new features and bug fixes available to users. One key aspect of releasing software is versioning, which helps to track and communicate the changes made in each release. Semantic versioning is a widely used standard for versioning software, which uses a version number in the format Major.Minor.Patch (e.g. 1.2.3) to indicate the extent of changes made in a release.
Conventional commits is a specification for adding human- and machine-readable meaning to commit messages. It’s a way to format commit messages consistently, which makes it easy to determine the type of change made. Conventional commits are commonly used in conjunction with semantic versioning, because the commit messages can be used to automatically determine the version number of a release. Together, semantic versioning and conventional commits provide a clear and consistent way to track and communicate the changes made in each release of a software project.
There are many different branching strategies out there for git. Many people gravitate towards GitFlow (or variants), Three Flow, or Trunk-based Flows. Some use strategies in between these, such as this one. I’m using the very simple GitHub flow branching strategy, where all bug fixes and features have their own separate branch, and when complete, each branch is merged to main and deployed. Simple, nice and easy.
Whatever your strategy may be, in the end you merge a pull request and (probably) create a release.
In short, a release is packing up the code of a version (e.g. as a zip) and pushing it to production (whatever that may be for you).
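To make "packing up the code of a version" concrete, here is a minimal sketch that uses Python’s standard `shutil.make_archive` to write a zip and a tar.gz of a source directory. This is similar in spirit to the source-code assets GitHub attaches to a release. The helper name `make_source_archives` and the `<name>-<version>` file naming are my own assumptions. Like GitHub’s assets, it only packs files and does not build anything.

```python
import shutil
import tempfile
from pathlib import Path


def make_source_archives(project_dir: Path, name: str, version: str, out_dir: Path) -> list[Path]:
    """Pack project_dir into <name>-<version>.zip and <name>-<version>.tar.gz in out_dir.

    Hypothetical helper: it only archives files, it does not build the application.
    """
    base_name = str(out_dir / f"{name}-{version}")
    archives = []
    for archive_format in ("zip", "gztar"):  # "gztar" produces a .tar.gz file
        # root_dir is the directory being archived; entries are stored relative to it.
        archive = shutil.make_archive(base_name, archive_format, root_dir=project_dir)
        archives.append(Path(archive))
    return archives


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "example_app")
        src.mkdir()
        (src / "__init__.py").write_text('__version__ = "0.1.0"\n')
        dist = Path(tmp, "dist")
        dist.mkdir()
        for path in make_source_archives(src, "example_app", "0.1.0", dist):
            print(path.name)  # example_app-0.1.0.zip, then example_app-0.1.0.tar.gz
```

GitHub’s own archives additionally wrap everything in a top-level `<repo>-<tag>` folder. The sketch skips that detail.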
Release management can be messy. Therefore there needs to be a concise process that you (and others) follow, one that defines what a release means and what changes between one release and the next. If you don’t track the changes between releases, you probably won’t understand what has changed in each release, and you can’t identify problems that might have been introduced with new code. Without a changelog, it can be difficult to understand how the software has evolved over time. It can also make it difficult to roll back changes if necessary.
Semantic Versioning is just a number schema and a standard practice in the software industry. It indicates the extent of changes between this version and the previous one. There are three parts to a semantic version number, such as 1.8.42, that follow the pattern MAJOR.MINOR.PATCH:
Each one of them means a different degree of change. A PATCH release indicates bug fixes or trivial changes (e.g. from 1.0.0 to 1.0.1). A MINOR release indicates adding/removing functionality or backwards-compatible changes of functionality (e.g. from 1.0.0 to 1.1.0). A MAJOR release indicates adding/removing functionality and potentially backwards-incompatible changes such as breaking changes (e.g. from 1.0.0 to 2.0.0).
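The bump rules above can be sketched in a few lines of Python. This is a minimal illustration, not a complete SemVer implementation. It assumes plain `MAJOR.MINOR.PATCH` strings (optionally prefixed with `v`, as in git tags) and ignores pre-release and build metadata such as `1.0.0-rc.1`. The class name `Version` is hypothetical.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A MAJOR.MINOR.PATCH version; tuple ordering compares the parts numerically."""

    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Accept an optional leading "v", as in git tags like v0.1.0.
        parts = text.removeprefix("v").split(".")
        if len(parts) != 3 or not all(part.isdigit() for part in parts):
            raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
        return cls(*(int(part) for part in parts))

    def bump(self, level: str) -> "Version":
        # Bumping one part resets all parts to its right to zero.
        if level == "major":
            return Version(self.major + 1, 0, 0)
        if level == "minor":
            return Version(self.major, self.minor + 1, 0)
        if level == "patch":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown bump level: {level!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


if __name__ == "__main__":
    print(Version.parse("1.0.0").bump("patch"))  # 1.0.1
    print(Version.parse("1.0.0").bump("minor"))  # 1.1.0
    print(Version.parse("v1.0.0").bump("major"))  # 2.0.0
    print(Version.parse("1.10.0") > Version.parse("1.9.0"))  # True
```

Comparing the parsed tuples matters. As plain strings, "1.10.0" would sort before "1.9.0".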
I recommend a talk by Mike Miles if you’d like a visual introduction to releases with semantic versioning. It’s a summary of what releases are and how semantic versioning with git tags allows us to create releases.
About git tags: There are lightweight and annotated tags in git. A lightweight tag is just a pointer to a specific commit, whereas an annotated tag is a full object in git.
Let’s create a release manually first and then automate it.
If you remember, our example_app’s __init__.py file contains the version
# src/example_app/__init__.py
__version__ = "0.1.0"
as well as the pyproject.toml file
# pyproject.toml
[tool.poetry]
name = "example_app"
version = "0.1.0"
...
So the first thing we have to do is to create an annotated git tag v0.1.0 and add it to the latest commit in main:
> git tag -a v0.1.0 -m "version v0.1.0"
Please note that if no commit hash is specified at the end of the command, git will use the current commit you’re on.
We can get a list of tags with:
> git tag
v0.1.0
and if we want to delete it again:
> git tag -d v0.1.0
Deleted tag 'v0.1.0'
and get more information about the tag with:
> git show v0.1.0
tag v0.1.0
Tagger: Johannes Schmidt
Date: Sat Jan 7 12:55:15 2023 +0100

version v0.1.0

commit efc9a445cd42ce2f7ddfbe75ffaed1a5bc8e0f11 (HEAD -> main, tag: v0.1.0, origin/main, origin/HEAD)
Author: Johannes Schmidt <74831750+johschmidt42@users.noreply.github.com>
Date: Mon Jan 2 11:20:25 2023 +0100
...
We can push the newly created tag to origin with
> git push origin v0.1.0
Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Writing objects: 100% (1/1), 171 bytes | 171.00 KiB/s, done.
Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
To github.com:johschmidt42/python-project-johannes.git
* [new tag] v0.1.0 -> v0.1.0
so that this git tag is now available on GitHub:
Let’s manually create a new release in GitHub with this git tag:
We click on Create a new release, select our existing tag (which is already bound to a commit) and then generate release notes automatically by clicking the Generate release notes button, before we finally publish the release with the Publish release button.
GitHub will automatically create a tar and a zip (assets) for the source code, but will not build the application! The result will look like this:
To summarise, the steps for a release are:
- create a new branch from your default branch (e.g. feature or fix branch)
- make changes and increase the version (e.g. pyproject.toml and __init__.py)
- commit the feature/bug fix to the default branch (probably through a Pull Request)
- add an annotated git tag (semantic version) to the commit
- publish the release on GitHub with some additional information
As programmers, we don’t like to repeat ourselves, so there are plenty of tools that make these steps super easy for us. Here, I’ll introduce Semantic Releases, a tool specifically for Python projects.
It’s a tool that automatically sets a version number in your repo, tags the code with the version number and creates a release! And all of this is done using the contents of Conventional Commit style messages.
Conventional Commits
What’s the connection between semantic versioning and conventional-commits?
Certain commit types can be used to automatically determine a semantic version bump!
- A fix commit is a PATCH.
- A feat commit is a MINOR.
- A commit with BREAKING CHANGE or ! is a MAJOR.
Other types, e.g. build, chore, ci, docs, style, refactor, perf and test, generally don’t increase the version.
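The mapping above can be sketched in Python. This is a simplified, hypothetical parser, not python-semantic-release’s implementation. It assumes lowercase commit types, checks only the header line for `!`, and checks the remaining lines for a `BREAKING CHANGE:` footer. The release as a whole gets the largest bump found among the commits since the last release.

```python
from __future__ import annotations

import re

# type, optional (scope), optional "!" and then ": description", e.g. "feat(api)!: drop endpoint"
HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<breaking>!)?: .+")
RANK = {None: 0, "patch": 1, "minor": 2, "major": 3}


def bump_level(message: str) -> str | None:
    """Return "major", "minor", "patch" or None for a single commit message."""
    lines = message.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        return None  # not a conventional commit
    footer_breaking = any(
        line.startswith(("BREAKING CHANGE:", "BREAKING-CHANGE:")) for line in lines[1:]
    )
    if match["breaking"] or footer_breaking:
        return "major"
    if match["type"] == "feat":
        return "minor"
    if match["type"] == "fix":
        return "patch"
    return None  # build, chore, ci, docs, style, refactor, perf, test, ...


def release_level(messages: list[str]) -> str | None:
    """The release gets the largest bump found among all commits since the last release."""
    return max((bump_level(m) for m in messages), key=RANK.__getitem__, default=None)


if __name__ == "__main__":
    print(release_level(["docs: update README", "fix: nasty bug", "feat: new cool feature"]))  # minor
```

Applied to version 0.1.0, a "minor" result leads to 0.2.0. The real tool has further configuration options that this sketch does not model.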
Check out the bonus section at the end to find out how to implement conventional commits in your project!
Automatic semantic releases (locally)
We can add the library with:
> poetry add --group semver python-semantic-release
Let’s go through the configuration settings that allow us to automatically generate changelogs and releases. In the pyproject.toml, we can add semantic_release as a tool:
# pyproject.toml
...
[tool.semantic_release]
branch = "main"
version_variable = "src/example_app/__init__.py:__version__"
version_toml = "pyproject.toml:tool.poetry.version"
version_source = "tag"
commit_version_number = true # required for version_source = "tag"
tag_commit = true
upload_to_pypi = false
upload_to_release = false
hvcs = "github" # gitlab is also supported
- branch: specifies the branch that the release should be based on, in this case the “main” branch.
- version_variable: specifies the file path and variable name of the version number in the source code. In this case, the version number is stored in the __version__ variable in the file src/example_app/__init__.py.
- version_toml: specifies the file path and variable name of the version number in the pyproject.toml file. In this case, the version number is stored in the tool.poetry.version variable of the pyproject.toml file.
- version_source: specifies the source of the version number. In this case, the version number is obtained from the tag (instead of from the commits).
- commit_version_number: this parameter is required when version_source = "tag". It specifies whether the version number should be committed to the repository or not. In this case, it is set to true, which means that the version number will be committed.
- tag_commit: specifies whether a new tag should be created for the release commit. In this case, it is set to true, which means that a new tag will be created.
- upload_to_pypi: specifies whether the package should be uploaded to the PyPI package repository. In this case, it is set to false, which means that the package will not be uploaded to PyPI.
- upload_to_release: specifies whether the package should be uploaded to the GitHub release page. In this case, it is set to false, which means that the package will not be uploaded to GitHub releases.
- hvcs: specifies the hosting version control system of the project. In this case, it is set to “github”, which means that the project is hosted on GitHub. “gitlab” is also supported.
We can update the files where we have defined the version of the project/module. For this we use the variable version_variable for normal files and version_toml for .toml files. The version_source defines the source of truth for the version. Because the version in these two files is tightly coupled with the annotated git tags (we create a git tag with every release automatically, since the flag tag_commit is set to true), we can use the source tag instead of the default value commit, which looks for the last version in the commit messages. To be able to update the files and commit the changes, we need to set the commit_version_number flag to true. Because we don’t want to upload anything to the Python index PyPI, the flag upload_to_pypi is set to false. And for now we don’t want to upload anything to our releases either, which is why upload_to_release is set to false. The hvcs is set to github (the default); another possible value is gitlab.
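Conceptually, version_variable and version_toml tell the tool which line to rewrite when it bumps the version. Below is a rough, hypothetical sketch of such a rewrite, not the tool’s actual code. It assumes the value is double-quoted and the assignment starts at the beginning of a line. Unlike the real setting, it does not look at the tool.poetry.version key path at all and just matches a line that starts with the given name.

```python
import re


def set_version_variable(text: str, variable: str, new_version: str) -> str:
    """Replace the value of a line `<variable> = "..."` in text (hypothetical helper).

    Works for `__version__ = "0.1.0"` in __init__.py as well as `version = "0.1.0"`
    in pyproject.toml, assuming double-quoted values at the start of a line.
    """
    pattern = re.compile(r"^(" + re.escape(variable) + r'\s*=\s*")[^"]*(")', re.MULTILINE)
    # A function replacement avoids any escaping issues with the new version string.
    updated, count = pattern.subn(lambda m: f"{m[1]}{new_version}{m[2]}", text)
    if count != 1:
        raise ValueError(f"expected exactly one {variable} assignment, found {count}")
    return updated


if __name__ == "__main__":
    init_py = '__version__ = "0.1.0"\n'
    pyproject = (
        '[tool.poetry]\nname = "example_app"\nversion = "0.1.0"\n\n'
        '[tool.semantic_release]\nversion_source = "tag"\n'
    )
    print(set_version_variable(init_py, "__version__", "0.2.0"), end="")
    print(set_version_variable(pyproject, "version", "0.2.0"), end="")
```

The `count != 1` check is a deliberate safety net. It fails loudly instead of silently bumping zero or several lines. Note that `version_source = "tag"` is not matched, because the pattern requires `=` right after the name.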
We can test this locally by running a few commands, which I’ll add directly to our Makefile:
# Makefile
...

##@ Releases

current-version: ## returns the current version
	@semantic-release print-version --current

next-version: ## returns the next version
	@semantic-release print-version --next

current-changelog: ## returns the current changelog
	@semantic-release changelog --released

next-changelog: ## returns the next changelog
	@semantic-release changelog --unreleased

publish-noop: ## publish command (no-operation mode)
	@semantic-release publish --noop
With the command current-version we get the version from the last git tag in the git tree:
> make current-version
0.1.0
If we add a few commits in conventional commit style, e.g. feat: new cool feature or fix: nasty bug, then the command next-version will compute the version bump for that:
> make next-version
0.2.0
Right now, we don’t have a CHANGELOG file in our project, so when we run:
> make current-changelog
the output will be empty. But based on the commits, we can create the upcoming changelog with:
> make next-changelog
### Feature
* Add releases ([#8](https://github.com/johschmidt42/python-project-johannes/issues/8)) ([`5343f46`](https://github.com/johschmidt42/python-project-johannes/commit/5343f46d9879cc8af273a315698dd307a4bafb4d))
* Docstrings ([#5](https://github.com/johschmidt42/python-project-johannes/issues/5)) ([`fb2fa04`](https://github.com/johschmidt42/python-project-johannes/commit/fb2fa0446d1614052c133824150354d1f05a52e9))
* Add application in app.py ([`3f07683`](https://github.com/johschmidt42/python-project-johannes/commit/3f07683e787b708c31235c9c5357fb45b4b9f02d))

### Documentation
* Add search bar & github url ([#6](https://github.com/johschmidt42/python-project-johannes/issues/6)) ([`3df7c48`](https://github.com/johschmidt42/python-project-johannes/commit/3df7c483eca91f2954e80321a7034ae3edb2074b))
* Add badge pages.yml to README.py ([`b76651c`](https://github.com/johschmidt42/python-project-johannes/commit/b76651c5ecb5ab2571bca1663ffc338febd55b25))
* Add documentation to Makefile ([#3](https://github.com/johschmidt42/python-project-johannes/issues/3)) ([`2294ee1`](https://github.com/johschmidt42/python-project-johannes/commit/2294ee105b238410bcfd7b9530e065e5e0381d7a))
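The changelog above is essentially the conventional commits since the last release, grouped by type. A minimal sketch of that grouping follows. The `SECTIONS` mapping is modelled on the headings shown here and is an assumption. The real tool also links issues and commits, which this sketch does not attempt.

```python
import re

# Hypothetical mapping from commit type to changelog heading, modelled on the output above.
SECTIONS = {"feat": "Feature", "fix": "Fix", "docs": "Documentation"}
HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?!?: (?P<description>.+)")


def render_changelog(commits: list[tuple[str, str]]) -> str:
    """Render (short_hash, message) pairs as Markdown sections grouped by commit type."""
    groups: dict[str, list[str]] = {}
    for short_hash, message in commits:
        match = HEADER.match(message.splitlines()[0]) if message else None
        if match is None or match["type"] not in SECTIONS:
            continue  # non-conventional commits and other types are left out
        description = match["description"]
        # Capitalise the first letter, like the generated entries above.
        entry = f"* {description[:1].upper()}{description[1:]} (`{short_hash}`)"
        groups.setdefault(SECTIONS[match["type"]], []).append(entry)
    sections = [
        f"### {title}\n" + "\n".join(groups[title])
        for title in SECTIONS.values()
        if title in groups
    ]
    return "\n\n".join(sections)


if __name__ == "__main__":
    log = [
        ("5343f46", "feat: add releases (#8)"),
        ("2294ee1", "docs: add documentation to Makefile (#3)"),
        ("0000000", "chore: bump dependencies"),
    ]
    print(render_changelog(log))
```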
If we push new commits (directly to main or through a PR), we can now publish a new release with:
> semantic-release publish
The publish command will do a sequence of things:
- Update or create the changelog file.
- Run semantic-release version.
- Push changes to git.
- Run build_command and upload the distribution file to your repository.
- Run semantic-release changelog and post to your vcs provider.
- Attach the files created by build_command to GitHub releases.
Every step can, of course, be configured or deactivated!
Let’s build a CI pipeline with GitHub Actions that runs the publish command of semantic-release with every commit to the main branch.
While the overall structure remains the same as in lint.yml, test.yml or pages.yml, there are a few changes worth mentioning. In the step Checkout repository, we add a new token that is used to check out the branch. That’s because the default value GITHUB_TOKEN doesn’t have the required permissions to operate on protected branches. Therefore, we must use a secret (GH_TOKEN) that contains a Personal Access Token with the necessary permissions. I’ll show later how the Personal Access Token can be generated. We also define fetch-depth: 0 to fetch all history for all branches and tags.
with:
  ref: ${{ github.head_ref }}
  token: ${{ secrets.GH_TOKEN }}
  fetch-depth: 0
We install only the dependencies that are required for the semantic-release tool with:
- name: Install requirements
  run: poetry install --only semver
In the last step, we change some git configurations and run the publish command of semantic-release:
- name: Python Semantic Release
  env:
    GH_TOKEN: ${{ secrets.GH_TOKEN }}
  run: |
    set -o pipefail
    # Set git details
    git config --global user.name "github-actions"
    git config --global user.email "github-actions@github.com"
    # run semantic-release
    poetry run semantic-release publish -v DEBUG -D commit_author="github-actions "
By changing the git config, the user that commits will be “github-actions”. We run the publish command with DEBUG logs (stdout) and set the commit_author to “github-actions” explicitly. As an alternative to this command, we could use the GitHub action from semantic-release directly. However, running the publish command takes only a few setup steps, and the action uses a Docker container that has to be pulled every time. Because of that, I prefer a simple run step instead.
Since the publish command will make a commit, you might be worried that we could end up in an endless loop of triggered workflows. But don’t worry, the resulting commit will not trigger another GitHub Actions workflow run. This is due to limitations set by GitHub.
Personal access tokens are an alternative to using passwords for authentication to GitHub Enterprise Server when using the GitHub API or the command line. Personal access tokens are intended to access GitHub resources on behalf of yourself. To access resources on behalf of an organization, or for long-lived integrations, you should use a GitHub App. For more information, see “About apps.”
In other words: we can create a Personal Access Token and have GitHub Actions store and use that secret to perform certain operations on our behalf. Keep in mind that if the PAT is compromised, it could be used to perform malicious actions on your GitHub repositories. It’s therefore recommended to use GitHub OAuth Apps & GitHub Apps in organisations. For the purposes of this tutorial, we will use a PAT to allow the GitHub Actions pipeline to operate on our behalf.
We can create a new access token by navigating to the Settings section of your GitHub user and following the instructions summarised in Creating a Personal Access Token. This will give us a window that looks like this:
By selecting the scopes, we define what permissions the token will have. For our use case, we need push access to the repositories, which is why the new PAT GH_TOKEN should have the repo permissions scope. That scope would authorise pushes to protected branches, provided you don’t have Include administrators set in the protected branch’s settings.
Going back to the repository overview, in the Settings menu, we can add either an environment secret or a repository secret under the Secrets section:
Repository secrets are specific to a single repository (and all environments used in it), while environment secrets are specific to an environment. The GitHub runner can be configured to run in a specific environment, which allows it to access that environment’s secrets. This makes sense when thinking of different stages (e.g. DEV vs PROD), but for this tutorial a repository secret is fine.
Now that we have a few pipelines (linting, testing, releasing, documentation), we should think about the flow of actions triggered by a commit to main! There are a few things we should be aware of, some of them specific to GitHub.
Ideally, we want a commit to main to create a push event that triggers the Testing and the Linting workflows. If these are successful, we run the release workflow, which is responsible for detecting whether there should be a version bump based on conventional commits. If so, the release workflow will push directly to main, bumping the versions, adding a git tag and creating a release. A published release should then, for example, update the documentation by running the documentation workflow.
Problems & considerations
1. If you read the last paragraph carefully or looked at the FlowChart above, you might have noticed that there are two commits to main: an initial one (i.e. from a PR) and a second one for the release. Because our lint.yml and test.yml react to push events on the main branch, they would run twice! We should avoid running them twice to save resources. To achieve this, we can add the [skip ci] string to our version commit message. A custom commit message can be defined in the pyproject.toml file for the tool semantic_release.
# pyproject.toml
...
[tool.semantic_release]
...
commit_message = "{version} [skip ci]" # skip triggering ci pipelines for version commits
...
2. The workflow pages.yml currently runs on a push event to main. Updating the documentation could be something that we only want to do if there is a new release (we might be referencing the version in the documentation). We can change the trigger in the pages.yml file accordingly:
# pages.yml
name: Documentation

on:
  release:
    types: [published]
Building the documentation will now require a published release.
3. The Release workflow should depend on the success of the Linting & Testing workflows. Currently, we don’t have any dependencies defined in our workflow files. We could make the Release workflow depend on the completion of defined workflow runs in a specific branch with the workflow_run event. However, if we specify multiple workflows for the workflow_run event:
on:
  workflow_run:
    workflows: [Testing, Linting]
    types:
      - completed
    branches:
      - main
only one of the workflows needs to be completed! This isn’t what we want. We expect that all workflows must be completed (and successful). Only then should the release workflow run. This is in contrast to what we get when we define dependencies between jobs in a single workflow. Read more about this inconsistency and shortcoming here.
As an alternative, we could execute the pipelines sequentially:
The big downside of this idea is that a) it doesn’t allow parallel execution and b) we won’t be able to see the dependency graph in GitHub.
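To make the workflow_run limitation from point 3 concrete, here is a toy model in plain Python. It is not GitHub’s scheduler, and the function names are made up. With `types: [completed]`, every watched workflow that finishes fires its own event, whatever its conclusion. The `needs` keyword between jobs waits for all listed jobs to succeed.

```python
def workflow_run_events(finished: list[tuple[str, str]], watched: set[str]) -> list[str]:
    """workflow_run with types: [completed] -> one Release run per finished watched workflow.

    The event fires for every conclusion, so a failed Linting run triggers Release too
    unless the Release workflow checks the conclusion itself.
    """
    return [f"Release (after {name}: {conclusion})" for name, conclusion in finished if name in watched]


def needs_satisfied(results: dict[str, str], needs: set[str]) -> bool:
    """Default `needs` behaviour between jobs: start only when every listed job succeeded."""
    return all(results.get(job) == "success" for job in needs)


if __name__ == "__main__":
    finished = [("Testing", "success"), ("Linting", "failure")]
    print(workflow_run_events(finished, {"Testing", "Linting"}))  # two Release runs
    print(needs_satisfied(dict(finished), {"Testing", "Linting"}))  # False
```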
Solution
Currently, the only way I see to deal with the above-mentioned problems is to orchestrate the workflows in an orchestrator workflow.
Let’s create this workflow file:
The orchestrator is triggered when we push to the branch main.
The release workflow is called only if both workflows, Testing & Linting, are successful. This is defined with the needs keyword. If we want more granular control over job executions (workflows), consider using the if keyword as well. But be aware of the confusing behaviour explained in this article.
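With `needs`, the orchestrator’s jobs form a dependency graph, which is what gives us back parallel execution. The sketch below uses Python’s standard `graphlib` (3.9+) to compute which jobs could start together. The job names in `ORCHESTRATOR_JOBS` are hypothetical, and GitHub’s runner does its own scheduling.

```python
from graphlib import TopologicalSorter

# Hypothetical model of the orchestrator's jobs: job -> jobs listed under `needs`.
ORCHESTRATOR_JOBS = {"linting": set(), "testing": set(), "release": {"linting", "testing"}}


def execution_stages(jobs: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into stages; jobs in one stage don't depend on each other and can run in parallel."""
    sorter = TopologicalSorter(jobs)
    sorter.prepare()  # also raises graphlib.CycleError for circular `needs`
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    print(execution_stages(ORCHESTRATOR_JOBS))  # [['linting', 'testing'], ['release']]
```

Unlike the sequential chain, linting and testing land in the same stage, and release waits for both.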
To make our workflows lint.yml, test.yml & release.yml callable by another workflow, we need to update the triggers:
# lint.yml
---
name: Linting

on:
  pull_request:
    branches:
      - main
  workflow_call:

jobs:
  ...

# test.yml
---
name: Testing

on:
  pull_request:
    branches:
      - main
  workflow_call:

jobs:
  ...

# release.yml
---
name: Release

on:
  workflow_call:

jobs:
  ...
Now the new workflow (Release) should only run if the quality-checking workflows, in this case linting and testing, succeed.
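Because the orchestrator is itself triggered by pushes to main, the version commit pushed by the release job would likely start it again. The `[skip ci]` suffix from point 1 (the commit_message setting) is meant to prevent exactly that. Below is a simplified check, assuming the `{version}` placeholder is filled like Python’s `str.format`. The helper names are hypothetical, and the marker list reflects GitHub’s documentation as I understand it.

```python
# Markers that, per GitHub's documentation, stop push- and pull_request-triggered workflows.
SKIP_MARKERS = ("[skip ci]", "[ci skip]", "[no ci]", "[skip actions]", "[actions skip]")


def release_commit_message(template: str, version: str) -> str:
    """Fill a commit_message template such as "{version} [skip ci]"."""
    return template.format(version=version)


def skips_push_workflows(message: str) -> bool:
    """True if a commit message carries one of the skip markers above."""
    return any(marker in message for marker in SKIP_MARKERS)


if __name__ == "__main__":
    message = release_commit_message("{version} [skip ci]", "0.2.0")
    print(message, skips_push_workflows(message))  # 0.2.0 [skip ci] True
```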
To create a badge, this time I’ll use the platform shields.io.
It’s a website that generates badges for projects, which display information such as version, build status, and code coverage. It offers a wide range of templates and allows you to customise their appearance and create custom badges. The badges are updated automatically, providing real-time information about the project.
For a release badge, I selected GitHub release (latest SemVer):
The badge markdown can be copied and added to the README.md:
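The markdown follows the shields.io URL scheme for this badge, `/github/v/release/<owner>/<repo>` with `sort=semver`. Below is a small, hypothetical helper that builds it. Linking the badge to the repository’s releases page is my own assumption.

```python
from urllib.parse import quote


def release_badge_markdown(owner: str, repo: str) -> str:
    """Markdown for a shields.io 'GitHub release (latest SemVer)' badge (hypothetical helper)."""
    owner, repo = quote(owner, safe=""), quote(repo, safe="")
    image = f"https://img.shields.io/github/v/release/{owner}/{repo}?sort=semver"
    link = f"https://github.com/{owner}/{repo}/releases"  # assumed link target
    return f"[![GitHub release (latest SemVer)]({image})]({link})"


if __name__ == "__main__":
    print(release_badge_markdown("johschmidt42", "python-project-johannes"))
```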
Our landing page on GitHub now looks like this ❤ (I’ve cleaned up a little and provided a description):