A great part of my job at Mozilla consists in maintaining the ecosystem of Firefox Remote Settings, which is already a few years old.
But recently I had the chance to spin up a new Python project (Poucave), which was a good opportunity to catch up on recent trends I had missed :) This article goes through some of the choices we made, knowing that almost everything is obviously debatable. And given how fast the Python ecosystem moves, the information may well be outdated by the time you read it :)
Environment
Since we publish the app as a Docker container, we don't have to support multiple Python environments (e.g. with tox). On the other hand, contributors may want to use Pyenv to overcome the limitations of their operating system.
I like to keep tooling minimalist, and I can't explain why, but I also enjoy limiting the list of configuration files in the project root folder.
Probably because of my age, I'm familiar with make. It's quite universal and popular. And using a single Makefile with the appropriate dependencies between targets, we can create the environment and run the application or the tests, by running only one make command.
I was used to Virtualenv, Pip, and requirements files. Common practice consists in having a folder with one requirements file per environment, plus a constraints file for reproducible builds. We also set up Dependabot on the repo to make sure our dependencies are kept up to date.
Now the cool kids use Pipx, Pipenv, Flit, or Poetry! Even if Poetry seemed to stand out, the debate was still virulent when the project was started, especially with regards to production installs and Docker integration. Therefore I didn't make any decision and remained conservative. I'd be happy to reconsider that choice.
Our requirements/ folder thus contains:
requirements/constraints.txt
requirements/default.txt
requirements/dev.txt
# constraints.txt
chardet==3.0.4 \
    --hash=sha256:84ab92ed1c4d4f16916e05906b6b75a6c0fb5db821cc65e70cbd64a3e2a5eaae \
    --hash=sha256:fc323ffcaeaed0e0a02bf4d117757b98aed530d9ed4531e3e15460124c106691
...

# default.txt
-c ./constraints.txt
aiohttp==3.6.2 \
    --hash=sha256:1e984191d1ec186881ffaed4581092ba04f7c61582a177b187d3a2f07ed9719e \
    --hash=sha256:50aaad128e6ac62e7bf7bd1f0c0a24bc968a0c0590a726d5a955af193544bcec \
    ...
...
The Makefile would look like this:
SOURCE := poucave
VENV := .venv
PYTHON := $(VENV)/bin/python3
PIP_INSTALL := $(VENV)/bin/pip install
INSTALL_STAMP := $(VENV)/.install.stamp

install: $(INSTALL_STAMP)

$(INSTALL_STAMP): $(PYTHON) requirements/default.txt requirements/constraints.txt
	$(PIP_INSTALL) -Ur requirements/default.txt -c requirements/constraints.txt
	touch $(INSTALL_STAMP)

$(PYTHON):
	virtualenv --python=python3 $(VENV)

serve: $(INSTALL_STAMP)
	PYTHONPATH=. $(PYTHON) $(SOURCE)
When running make serve, the virtualenv is created if missing, and the latest dependencies are installed only if outdated...
Update: As you can see, we don't even bother «activating» the virtualenv, so we don't really need tools like virtualenvwrapper to switch between environments. That being said, I really enjoy the Oh My Zsh plugin that automatically activates the related virtualenv when I enter a folder containing a .venv directory :) Thanks Florian for the feedback ;)
The CircleCI configuration file is as simple as:
version: 2
jobs:
  test:
    docker:
      - image: circleci/python:3.8
    steps:
      - checkout
      - run:
          name: Code lint
          command: make lint
      - run:
          name: Test
          command: make tests
You can also see how, using an ENTRYPOINT, we can execute the tests from within the container on Circle CI.
We also have a setup that publishes our Docker container to https://hub.docker.com automatically.
Code quality
Running black to format the code is now a no-brainer. We added isort to sort and organize imports automatically too.
The working combination in one Makefile target is:
format: $(INSTALL_STAMP)
$(VENV)/bin/isort --line-width=88 --lines-after-imports=2 -rc $(SOURCE) --virtual-env=$(VENV)
$(VENV)/bin/black $(SOURCE)
Again, to avoid having an extra configuration file for isort we used CLI arguments :)
Since we want to verify code linting on the CI, we also have this lint target, which additionally runs flake8 to detect unused imports or variables, and mypy for type checking.
lint: $(INSTALL_STAMP)
$(VENV)/bin/isort --line-width=88 --check-only --lines-after-imports=2 -rc $(SOURCE) --virtual-env=$(VENV)
$(VENV)/bin/black --check $(SOURCE) --diff
$(VENV)/bin/flake8 $(SOURCE) --ignore=W503,E501
$(VENV)/bin/mypy $(SOURCE) --ignore-missing-imports
By the way, using type checking in your Python project is now pretty straightforward and enjoyable :)
from typing import Any, Dict, List, Optional
def process(params: Optional[Dict[str, Any]] = None) -> List[str]:
    return list(params.keys()) if params else []
Plugins exist for your favorite editor to guarantee the quality of your contributions. And a commit hook can also do the job:
echo "make format" > .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
Check out pre-commit or Rehan's therapist for advanced commit hooks.
Note that there are complementary linting tools out there:
- flake8-docstrings or darglint to validate your docstrings
- wemake-python-styleguide for a very strict Python linter
- bandit to find common security issues
Tests
There's almost no debate about pytest nowadays. To me, the most appealing feature is the fixture decorator, which helps keep your tests DRY. It enables dependency injection: object factories, connection setup, config changes...
import pytest
import responses

@pytest.fixture
def api_client():
    # APIClient is an illustrative stand-in for the project's own HTTP client.
    client = APIClient()
    client.authenticate()
    yield client
    client.logout()

@pytest.fixture
def mock_responses():
    with responses.RequestsMock() as rsps:
        yield rsps

@pytest.fixture
def make_response():
    def _make_response(name):
        return {"name": name}
    return _make_response

async def test_api_get_gives_name(api_client, mock_responses, make_response):
    mock_responses.add(responses.GET, "/", json=make_response("test"))
    resp = await api_client.get()
    assert resp.name == "test"
The parametrize feature is also cool:
@pytest.mark.parametrize(
    ("n", "expected"),
    [
        (1, 2),
        (2, 3),
        pytest.param(3, 2, marks=pytest.mark.xfail),
        pytest.param(1, 0, marks=pytest.mark.xfail(reason="some bug")),
        pytest.param(10, 11, marks=pytest.mark.skipif("sys.version_info >= (3, 0)")),
    ],
)
def test_increment(n, expected):
    assert n + 1 == expected
As usual, I like to make the CI fail when code coverage isn't 100%, so pytest-cov comes to the rescue:
tests: $(INSTALL_STAMP)
	PYTHONPATH=. $(VENV)/bin/pytest tests --cov-report term-missing --cov-fail-under 100 --cov $(SOURCE)
Among the handy pytest extensions, I would mention:
- pytest-mock, which exposes unittest.mock.patch as a mocker fixture (see the sketch after this list)
- pytest-benchmark that provides a benchmark fixture to measure execution performance
- pytest-watch for TDD
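For instance, here is a minimal sketch of the mocker fixture in action; the tiny pick_number helper and the patched target are made up for illustration:

import random

def pick_number():
    return random.randint(1, 10)

def test_pick_number_is_deterministic(mocker):
    # mocker.patch mirrors unittest.mock.patch, and the patch is undone
    # automatically at the end of the test (no decorator or context manager).
    mocker.patch("random.randint", return_value=4)
    assert pick_number() == 4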
Executing and configuring
In order to execute the package directly from the command line (e.g. python poucave or python -m poucave), add a poucave/__main__.py file:
import sys
from poucave.app import main
main(sys.argv[1:])
The most appreciated libraries for advanced CLI parameters seem to be Click (declarative) and Fire (automatic).
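To give an idea of the declarative style, here is a minimal Click sketch; the --ttl option and the command body are invented for illustration:

import click

@click.command()
@click.option("--ttl", default=60, show_default=True, help="Cache TTL in seconds.")
@click.option("--verbose", is_flag=True, help="Enable verbose output.")
def main(ttl, verbose):
    """Run the checks."""
    click.echo(f"Running checks with ttl={ttl} (verbose={verbose})")

if __name__ == "__main__":
    main()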
For the Docker container, at Mozilla we follow our Dockerflow conventions. This helps our operations team to treat all containers the same way, regardless of the implementation language etc.
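Among other things, Dockerflow expects a few conventional endpoints (/__lbheartbeat__, /__heartbeat__ and /__version__). Here is a rough sketch of what exposing them could look like with aiohttp (which is introduced further below); the payloads are simplified:

import json
from aiohttp import web

routes = web.RouteTableDef()

@routes.get("/__lbheartbeat__")
async def lbheartbeat(request):
    # Just proves the process is up, for the load balancer.
    return web.json_response({})

@routes.get("/__heartbeat__")
async def heartbeat(request):
    # Would normally check the app's dependencies (remote services, etc.).
    return web.json_response({"ok": True})

@routes.get("/__version__")
async def version(request):
    # Serves the version.json file shipped in the container.
    with open("version.json") as f:
        return web.json_response(json.load(f))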
A good takeaway for any application deployment is to manage configuration through environment variables (as recommended by 12factor too).
We centralize all configuration values in a dedicated config.py module that reads variables from the environment:
import os
DEFAULT_TTL = int(os.getenv("DEFAULT_TTL", 60))
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO").upper()
LOGGING = {
"version": 1,
"handlers": {
"console": {
"level": LOG_LEVEL,
...
}
}
}
And then simply use it everywhere in the app:
from . import config
def main(argv):
logging.config.dictConfig(config.LOGGING)
run(ttl=config.DEFAULT_TTL)
During tests, config values are changed using mock:
from unittest import mock
def test_custom_ttl():
    with mock.patch.object(config, "DEFAULT_TTL", 3600):
        main()
        ...
But environment variables can also be changed using the built-in monkeypatch fixture:
def test_lower_ttl(monkeypatch):
monkeypatch.setenv("DEFAULT_TTL", "10")
main()
If you want to read configuration from a file (.env or .ini), have complex default values, or get type casting, you can use python-decouple and read configuration values through the helper it provides:
import json

from decouple import config

DEBUG = config("DEBUG", default=False, cast=bool)
HEADERS = config("HEADERS", default="{}", cast=json.loads)
A Web app
The project consists of a minimalist API. There are plenty of candidates, but I wanted something ultra simple that leverages async/await.
Sanic and FastAPI seemed to stand out, but since my project needed an async HTTP client too, I decided to go with aiohttp which provides both server and client stuff. httpx used in Sanic could have been a good choice too.
The server code looks familiar:
from aiohttp import web
routes = web.RouteTableDef()
@routes.get("/")
async def hello(request):
body = {"hello": "poucave"}
return web.json_response(body)
def init_app(argv):
app = web.Application()
app.add_routes(routes)
return app
def main(argv):
web.run_app(init_app(argv))
And to centralize the HTTP client parameters within the app, we have this wrapper:
from contextlib import asynccontextmanager
from typing import AsyncGenerator

import aiohttp

from . import config
@asynccontextmanager
async def ClientSession() -> AsyncGenerator[aiohttp.ClientSession, None]:
timeout = aiohttp.ClientTimeout(total=config.REQUESTS_TIMEOUT_SECONDS)
headers = {"User-Agent": "poucave", **config.DEFAULT_REQUESTS_HEADERS}
async with aiohttp.ClientSession(headers=headers, timeout=timeout) as session:
yield session
And we use the backoff library to manage retries:
import asyncio

import backoff

retry_decorator = backoff.on_exception(
    backoff.expo,
    (aiohttp.ClientError, asyncio.TimeoutError),
    max_tries=config.REQUESTS_MAX_RETRIES + 1,  # + 1 because REtries.
)
@retry_decorator
async def fetch_json(url: str, **kwargs) -> object:
async with ClientSession() as session:
async with session.get(url, **kwargs) as response:
return await response.json()
In order to mock HTTP requests and responses in this setup, we use the aiohttp_client fixture from pytest-aiohttp for the application part, and aioresponses for the responses part:
import pytest
from aioresponses import aioresponses

from poucave import config
from poucave.app import init_app

@pytest.fixture
async def cli(aiohttp_client):
    app = init_app([])
    return await aiohttp_client(app)

@pytest.fixture
def mock_aioresponses(cli):
    test_server = f"http://{cli.host}:{cli.port}"
    with aioresponses(passthrough=[test_server]) as m:
        yield m

async def test_api_root_url(cli):
    resp = await cli.get("/")
    data = await resp.json()
    assert data["hello"] == "poucave"

async def test_api_fetches_info_from_source(cli, mock_aioresponses):
    mock_aioresponses.get(config.SOURCE_URI, json={"success": True})
    resp = await cli.get("/check-source")
    data = await resp.json()
    assert data["success"]
Misc
Some libraries and tools worth checking out:
- Arrow for better dates & times for Python
- Pydantic for data parsing and validation (a quick sketch follows after this list)
- attrs for a smart alternative to named tuples
- Pypeln for concurrent async pipelines
- towncrier to automate CHANGELOG entries
- uvicorn for a performant ASGI server
Update: disclaimer, I haven't used all of them myself; I just came across them in several projects :)
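To give an idea of what Pydantic looks like, here is a tiny sketch; the Check model and its fields are made up:

from typing import List

from pydantic import BaseModel, ValidationError

class Check(BaseModel):
    name: str
    ttl: int = 60
    tags: List[str] = []

check = Check(name="source", ttl="120")  # the string "120" is cast to an int
print(check.ttl)  # 120

try:
    Check(name=None)  # raises: name is required and must be a string
except ValidationError as e:
    print(e)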
Conclusion
I hope you found this article interesting! And most importantly, that you'll have the opportunity to leverage all these tools in your projects :)
If you think something in this article is utterly wrong, please shout out!