Beginner to intermediate tips

17 February 2024

After a bootcamp or formal study and a thorough introduction to Python, there are some profitable avenues to pursue which, it seems, are often overlooked. Here I outline a handful of things which I regard as worth considering, as they make it much easier to get useful work done with Python. These are:

  1. venv
  2. makefile
  3. pytest
  4. pydantic
  5. SQLAlchemy

venv

A venv (virtual environment) is a handy place where you can pip install to your heart’s content while leaving your system Python installation unaffected.

It is good practice to use a venv, not least because if you run into conflicting module versions (thankfully less frequent than they used to be), you can always delete the venv and start again. For anything hardcore or production-grade you may wish to consider pyenv-virtualenv, but for hobby projects or just getting code running, a venv is very quick and easy to set up.

In the folder where you want the venv, just run python -m venv venv. This creates a folder called venv (the second venv is simply the folder name), which holds your new environment.

Once you have created it, run source ./venv/bin/activate (in bash or zsh) to activate it. While it is active, any Python code you run and any modules you install will use the venv rather than the system Python. To leave the venv, run deactivate; after that, running python will use the Python installed on your system again.
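To check which interpreter you are actually using, or to create a venv from a script, the standard library's venv module can help. This is an illustrative sketch rather than something you need day to day: in_venv and make_venv are hypothetical helper names, and with_pip=False is chosen only to keep it quick (the command-line python -m venv installs pip by default).

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_venv() -> bool:
    """Return True if the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv folder, while
    # sys.base_prefix still points at the system Python installation.
    return sys.prefix != sys.base_prefix


def make_venv(location: Path) -> Path:
    """Create a venv at location, roughly what `python -m venv <location>` does."""
    # with_pip=False skips installing pip, which keeps this example fast.
    venv.create(location, with_pip=False)
    return location


if __name__ == "__main__":
    print(f"Running inside a venv: {in_venv()}")
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = make_venv(Path(tmp) / "venv")
        # Every venv has a pyvenv.cfg file describing the base interpreter.
        print((env_dir / "pyvenv.cfg").read_text())
```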

One thing to bear in mind is that a venv is path dependent: its scripts record the absolute path it was created at, so you can’t move it once it has been created. If you later need the venv somewhere else, delete it and create a new one.
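You can see the path dependence for yourself: the bash/zsh activate script has the venv's absolute location written into it when the venv is created. A minimal sketch, assuming a POSIX layout (bin/activate, as used above); activate_mentions_location is a hypothetical helper:

```python
import os
import tempfile
import venv
from pathlib import Path


def activate_mentions_location(env_dir: Path) -> bool:
    """Return True if the venv's bin/activate script hard-codes env_dir's absolute path."""
    script = (env_dir / "bin" / "activate").read_text()
    return os.path.abspath(env_dir) in script


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "venv"
        venv.create(env_dir, with_pip=False)
        # If env_dir were moved, this script would still point at the old location.
        print(activate_mentions_location(env_dir))
```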

makefile

Makefiles are a seriously old but useful technology and offer two things:

  • a more convenient way to run commands
  • a simple DAG (directed acyclic graph) of dependencies between files

Crucially, they give you a handy place to put commands, and they can be very simple. Here is one for a dashboard written in Python which uses Streamlit. Typing make run is more convenient than typing python -m streamlit run dashboard.py.

.POSIX:

# Used to activate pyenv-virtualenv
project_name = kpis-dashboard
# Not used in makefile but handy for documentation purposes
python_version = 3.12.1

help:
	@echo "help here"

test:
	python -m pytest -s

activate:
	@echo pyenv activate $(project_name)

run:
	python -m streamlit run dashboard.py

format:
	find . -name '[a-z]*.py' | xargs python -m black -l 140
	find . -name '[a-z]*.ipynb' | xargs python -m black -l 140

requirements:
	pip freeze > requirements.txt

Together, these make makefiles particularly well suited to running commands that manipulate files in a chain. This is ideal where you need to compile something before you run it, which was the original purpose of make, from the days when the majority of software in UNIX-land was written in C. Here is an example for a very simple HTTP server (which I call netpage) used on a local network. It needs a compiled language, so it is written in Go:

# If run with no arguments it will build and then run

.POSIX:

# The command used to run the Go toolchain, and the name of the binary to build
GO = go
TARGET = netpage

# Run server with low priority; we use sudo so we can listen on port 80
run: $(TARGET)
	sudo nice -n 20 ./$(TARGET)

# Build if cmd/netpage/netpage.go is newer than ./netpage
$(TARGET): cmd/$(TARGET)/$(TARGET).go
	$(GO) build cmd/$(TARGET)/$(TARGET).go

In the example above, on make run, if netpage.go is more recent than the netpage binary, it will be recompiled before it is run.
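The rebuild rule make applies here can be modelled in a few lines of Python. This toy version is a simplification for illustration, not how make is implemented: it walks the dependency graph depth-first and runs a target's recipe only if the target file is missing, or any prerequisite is missing or newer. The rule format and the names out_of_date and make are hypothetical.

```python
from pathlib import Path
from typing import Callable, Optional

# A rule maps a target to (prerequisites, recipe); plain source files have no rule.
Rule = tuple[list[str], Callable[[], None]]


def out_of_date(target: Path, prerequisites: list[Path]) -> bool:
    """Rebuild if the target is missing, or any prerequisite is missing or newer than it."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(not p.exists() or p.stat().st_mtime > target_mtime for p in prerequisites)


def make(target: str, rules: dict[str, Rule], visited: Optional[set[str]] = None) -> list[str]:
    """Bring target up to date, prerequisites first; return the targets whose recipes ran."""
    visited = set() if visited is None else visited
    if target in visited or target not in rules:
        return []  # already handled, or a plain source file with nothing to build
    visited.add(target)
    prerequisites, recipe = rules[target]
    ran: list[str] = []
    for prerequisite in prerequisites:
        ran += make(prerequisite, rules, visited)
    if out_of_date(Path(target), [Path(p) for p in prerequisites]):
        recipe()
        ran.append(target)
    return ran
```

With rules such as {"netpage": (["cmd/netpage/netpage.go"], build), "run": (["netpage"], serve)}, where build and serve are hypothetical callables, make("run", rules) rebuilds netpage only when the .go file is newer, then always calls serve, because no file called run ever exists.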

pytest

Testing with pytest is extremely easy, and it also gives you a convenient way to call any function in isolation, which is handy during development.

If you are testing mypythonfile.py, create a new file called test_mypythonfile.py like so:

from mypythonfile import some_function

# To test some_function, which is defined in mypythonfile.py;
# for the sake of exposition we assume some_function(s: str) -> str
def test_some_function() -> None:
    input_ = "Input passed to function"
    expectation = "Value I expect to return"
    assert some_function(input_) == expectation

Then just run python -m pytest -s (the -s flag turns off output capturing, so anything you print is shown), or, even easier, add a test rule to your makefile so that make test runs it for you.
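For a complete file you can run straight away, here is a sketch in which a hypothetical greet function is defined in the test file itself (normally you would import it from your own module, as above). pytest collects every function whose name starts with test_ from files whose names start with test_:

```python
# test_greet.py: greet is a hypothetical stand-in for the code you want to test.


def greet(name: str) -> str:
    """Return a greeting with the name tidied up."""
    return f"Hello, {name.strip().title()}!"


def test_greet_capitalises() -> None:
    assert greet("bob") == "Hello, Bob!"


def test_greet_strips_whitespace() -> None:
    assert greet("  alice ") == "Hello, Alice!"


def test_explore_greet() -> None:
    # No assertion: this just calls the function so you can see what it does.
    # With python -m pytest -s the print output is shown rather than captured.
    print(greet("charlie"))
```

Running python -m pytest -s in the same folder should report three passing tests.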

pydantic

Pydantic is a way to enforce types on the attributes of classes in Python. Python famously uses duck typing, i.e. as part of the design of the language it keeps types as unobtrusive as possible. Sometimes, though, it is highly beneficial to be certain about which types are being used, and pydantic provides this: a model validates (and, where it safely can, converts) the values passed to it, and raises an error if they don't fit the declared types. I find pydantic particularly useful when dealing with databases/ORMs, as in that case you do need to care about types.

import pydantic
from datetime import date

class Person(pydantic.BaseModel):
    # Only needed for field types pydantic can't validate itself; date is supported natively
    model_config = pydantic.ConfigDict(arbitrary_types_allowed=True)

    name: str
    title: str
    date_of_birth: date

bob = Person(name='Bob', title='Mr.', date_of_birth=date.today())
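To see what that buys you, here is a sketch (assuming pydantic v2, which ConfigDict implies) of what happens when the input isn't already a date: an ISO-format string is converted, while something that can't be read as a date raises a ValidationError instead of quietly ending up in the object. The names and values are made up for illustration.

```python
from datetime import date

import pydantic


class Person(pydantic.BaseModel):
    name: str
    title: str
    date_of_birth: date


# In pydantic's default (lax) mode, an ISO 8601 string is converted to a date.
alice = Person(name="Alice", title="Dr.", date_of_birth="1990-05-17")
print(repr(alice.date_of_birth))

# A value that cannot be interpreted as a date is rejected at construction time.
try:
    Person(name="Carol", title="Ms.", date_of_birth="not a date")
except pydantic.ValidationError as err:
    print(f"{len(err.errors())} validation error(s) at {err.errors()[0]['loc']}")
```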

Using SQLAlchemy as an ORM

An ORM (object-relational mapper) is essentially a way to abstract away a database (meaning a standard RDBMS such as PostgreSQL, MySQL, MS SQL Server and the like) so that you can interact with its tables and rows as though they were ordinary Python classes and objects.

SQLAlchemy can also be used with plain SQL, which is what you want when you just need to get data out of the database for use in a Jupyter notebook or similar:

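import os

import sqlalchemy

# Connection details come from environment variables rather than being hard-coded.
# Note that USER is also set by most shells to your login name.
user = os.environ["USER"]
password = os.environ["PASS"]
host = os.environ["HOST"]
db = os.environ["DB"]

# URL.create takes the raw values, so special characters such as "@" in a
# password don't need escaping by hand.
url = sqlalchemy.engine.URL.create("postgresql", username=user, password=password, host=host, database=db)
engine = sqlalchemy.create_engine(url)


def q(s: str, engine: sqlalchemy.engine.Engine = engine):
    """Run a SQL query and return the first column of the first row (None if no rows)."""
    with engine.connect() as con:
        return con.execute(sqlalchemy.text(s)).scalar()


max_date = q("select max(date) from table_name")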
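If you want to try q without a PostgreSQL server to hand, the same pattern works against SQLite, which SQLAlchemy supports out of the box. A sketch, assuming an in-memory database and a made-up table_name table with dates stored as text; StaticPool keeps a single connection, so the in-memory data survives between engine.connect() calls:

```python
import sqlalchemy
from sqlalchemy.pool import StaticPool

# In-memory SQLite stands in for PostgreSQL here; StaticPool reuses one
# connection, so the table created below is still there when q() connects.
engine = sqlalchemy.create_engine("sqlite://", poolclass=StaticPool)


def q(s: str, engine: sqlalchemy.engine.Engine = engine):
    """Run a SQL query and return the first column of the first row (None if no rows)."""
    with engine.connect() as con:
        return con.execute(sqlalchemy.text(s)).scalar()


# engine.begin() opens a transaction and commits it when the block exits.
with engine.begin() as con:
    con.execute(sqlalchemy.text("create table table_name (date text)"))
    con.execute(
        sqlalchemy.text("insert into table_name (date) values (:d)"),
        [{"d": "2024-01-31"}, {"d": "2024-02-17"}],
    )

max_date = q("select max(date) from table_name")
print(max_date)
```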

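SQLAlchemy used as an ORM looks like this. The example below logs timer events to a SQLite database, validating each row with pydantic before it is written: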

import sqlalchemy
import datetime
import pydantic
import sqlalchemy.orm
import functools
from typing import Any, Final

default_db_name: Final[str] = "punch-card"
default_db_url: Final[str] = f"sqlite:///{default_db_name}.sqlite3"

# Metadata for sqlalchemy tables
metadata = sqlalchemy.MetaData()

# Declarative base class for SQLAlchemy ORM models (it has its own MetaData, separate from the one above)
Base = sqlalchemy.orm.declarative_base()


class TimerLogPydantic(pydantic.BaseModel):
    label: str
    state: str
    date: datetime.date
    ts: datetime.datetime


# Core Table definition; get_engine_and_ddl() creates it from this metadata if it is missing
timer_log_table = sqlalchemy.Table(
    "timer_log",
    metadata,
    sqlalchemy.Column("label", sqlalchemy.Text, nullable=False),
    sqlalchemy.Column("state", sqlalchemy.Text, nullable=False),
    sqlalchemy.Column("date", sqlalchemy.Date, nullable=False),
    sqlalchemy.Column("ts", sqlalchemy.DateTime(timezone=True), primary_key=True),
)


# ORM class mapped to the same timer_log table, used for the inserts and selects below
class TimerLogBase(Base):
    __tablename__ = "timer_log"

    label = sqlalchemy.Column(sqlalchemy.Text, nullable=False)
    state = sqlalchemy.Column(sqlalchemy.Text, nullable=False)
    date = sqlalchemy.Column(sqlalchemy.Date, nullable=False)
    ts = sqlalchemy.Column(sqlalchemy.DateTime(timezone=True), primary_key=True)

    def __repr__(self) -> str:
        return f"TimerLogBase(label={self.label!r}, state={self.state!r}, date={self.date!r}, ts={self.ts!r})"


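# Cache one engine per connection string, rather than creating (and inspecting) a new one on every call
@functools.lru_cache
def get_engine_and_ddl(conn_str: str) -> sqlalchemy.engine.base.Engine: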
    engine = sqlalchemy.create_engine(conn_str)

    # Check we have what we expect from the metadata, otherwise create
    expected_tables = metadata.tables.keys()
    inspector = sqlalchemy.inspect(engine)
    have_tables = inspector.get_table_names()

    if set(have_tables) == set(expected_tables):
        # Expected tables are the same as existing tables
        pass
    else:
        # Have {have_tables} tables, expect {expected_tables} tables, will create
        metadata.create_all(engine)

    return engine


def get_session(conn_str: str) -> sqlalchemy.orm.session.Session:
    engine = get_engine_and_ddl(conn_str=conn_str)
    Session = sqlalchemy.orm.sessionmaker(engine)
    return Session()


def write_timer_log(row: dict[str, Any], conn_str: str = default_db_url) -> None:
    # Put into pydantic class first so we catch any issues
    stg = TimerLogPydantic(**row)
    insert_data = TimerLogBase(label=stg.label, state=stg.state, date=stg.date, ts=stg.ts)
    with get_session(conn_str=conn_str) as session:
        session.add(insert_data)
        session.commit()


def read_timer_log(conn_str: str = default_db_url) -> list[TimerLogBase]:
    with get_session(conn_str=conn_str) as session:
        statement = sqlalchemy.select(TimerLogBase).order_by(TimerLogBase.ts)
        ret = session.scalars(statement).all()
    return ret