Except for:
- imports
- constant definitions
- function definitions
- class definitions
- type aliases/definitions
There should be no code at global scope. In particular imports should have no side effects and main code should go in def main():.
Import side effects are a very well-known issue: a module that does real work when imported (file or network access, argument parsing, printing) makes imports slow, order-dependent and hard to test.
Putting main code at global scope seems to be less well-known but it has similar issues: it means that the code is no longer importable (and yes, you might want to import it in the future), and more importantly it's almost guaranteed to lead to spaghetti code. I have seen this first-hand multiple times.
Pydantic gives you properly typed JSON/YAML deserialisation. This catches many many errors and makes understanding, navigating and refactoring code much easier. It also allows you to centralise validation code (e.g. a string must be uppercase), which is good for documentation and avoids the possibility of forgetting to call the validation function.
A big benefit is that it can reject unknown keys (set extra="forbid" in the model config; by default extra keys are silently ignored). This catches a common bug where people make typos in their YAML that would otherwise go undetected.
Finally it can generate a JSON schema for your model.
Pydantic is immeasurably better than accessing everything as dict[str, object]. If you aren't using it you're doing it wrong.
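Here is a rough sketch of what that looks like, assuming Pydantic v2 (the model_config, field_validator and model_validate names are v2 API). The ServerConfig model, its fields and the is_valid helper are hypothetical, invented only to show typed fields, a centralised validator and unknown-key rejection.

```python
from pydantic import BaseModel, ConfigDict, ValidationError, field_validator


class ServerConfig(BaseModel):
    # Reject keys that are not declared below, so typos fail loudly.
    model_config = ConfigDict(extra="forbid")

    host: str
    port: int = 8080
    region: str

    @field_validator("region")
    @classmethod
    def region_must_be_uppercase(cls, value: str) -> str:
        # Validation lives with the model, so nobody can forget to call it.
        if not value.isupper():
            raise ValueError("region must be uppercase")
        return value


def is_valid(data: dict[str, object]) -> bool:
    """Hypothetical helper: True if the data parses as a ServerConfig."""
    try:
        ServerConfig.model_validate(data)
    except ValidationError:
        return False
    return True


# In practice the dict would come from json.loads() or a YAML loader.
config = ServerConfig.model_validate({"host": "example.org", "region": "EU"})
```

After this, config.port is known to be an int, and a misspelt key such as "prot" fails validation instead of being ignored. ServerConfig.model_json_schema() returns the JSON schema as a dict.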
Type hints have huge benefits:
- Detect bugs statically, before the code ever runs.
- Make the code easier to understand.
- Make the code easier to refactor.
- Make the code waaay faster to navigate (they make go-to-definition and find-all-references work).
You should use them.
Also, you should check them in CI using Pyright. This is currently by far the best Python type checker - much, much better than Mypy, which tends to be quite buggy and loosey-goosey about correctness.
The traditional parser is argparse, which is fine, but its results are completely untyped as far as a type checker is concerned. A much better alternative is Typer, which not only gives you typed CLI arguments but is also easier to use, produces way nicer --help output, and even supports generating shell completion! There's no good reason to use argparse when you could use Typer.
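A minimal Typer sketch, assuming Typer is installed. The greet command and its parameters are invented for illustration. Typer derives the CLI from the function signature: name becomes a required positional argument, while count and shout become the options --count and --shout.

```python
import typer

app = typer.Typer()


@app.command()
def greet(name: str, count: int = 1, shout: bool = False) -> None:
    """Print a greeting for NAME, COUNT times."""
    message = f"Hello, {name}!"
    if shout:
        message = message.upper()
    for _ in range(count):
        typer.echo(message)


# Entry point: point a [project.scripts] entry at `app`, or call app() from
# your main function.
```

Inside greet, count is already an int and shout a bool. Passing something like --count two is rejected by Typer before the function body runs.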
If you find yourself using the global keyword you're almost certainly doing it wrong. Very occasionally it is justifiable. But normally you should be passing that data around as arguments, or it should be a class member.
Using globals makes code far less reusable, far more likely to decay into spaghetti, and also increases the scope needed to reason about code which makes it more likely that you will introduce bugs due to unknown interactions with distant code ("oh I didn't realise that global was also used over here").
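A sketch of the alternative, using a hypothetical retry counter. Instead of a module-level attempts variable mutated via the global keyword, the state lives on a class and is passed explicitly to the code that needs it.

```python
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    """State that might otherwise have been a module-level global."""

    max_attempts: int
    attempts_made: int = 0

    def record_attempt(self) -> bool:
        """Record one attempt; return True if another attempt is allowed."""
        self.attempts_made += 1
        return self.attempts_made < self.max_attempts


def attempts_until_exhausted(policy: RetryPolicy) -> int:
    # The function's dependencies are visible in its signature.
    while policy.record_attempt():
        pass
    return policy.attempts_made
```

Two callers can each hold their own RetryPolicy without interfering with one another, which is exactly what a shared global prevents.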
All Python files MUST be in a package. Do not be tempted to just throw a script.py into a directory, chmod +x it and call it a day. Instead, create a pyproject.toml and expose it via the [project.scripts] section.
Python tooling really does not work well with files that aren't nicely in a package, and you also expose yourself to the full wilds of the import system, which approximately 0 people understand.
To say it again, in case you didn't get the message: all Python files MUST be in a package.
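For reference, a minimal pyproject.toml along these lines might look like the following. The project name mytool, the module path mytool.cli and the choice of setuptools as build backend are all assumptions for illustration; any PEP 517 backend works the same way for [project.scripts].

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "mytool"
version = "0.1.0"
requires-python = ">=3.10"

[project.scripts]
# Installs a `mytool` command that calls main() in mytool/cli.py.
mytool = "mytool.cli:main"
```

This assumes a directory layout with mytool/__init__.py and mytool/cli.py next to pyproject.toml. After pip install -e . the mytool command is on your PATH, with no chmod +x required.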
Consider this code:
a: list[list[int | str]] = []
for i, foo in enumerate("hello"):
    a.append([i + 1, foo.upper()])
It seems to be correct and will type check, but it is fundamentally using the wrong data structure. A list is a collection where:
- The length is usually variable.
- The elements are fungible.
This is not the case here. It's always 2 elements, and they are not interchangeable. In this case you should use a tuple. This is correct:
a: list[tuple[int, str]] = []
for i, foo in enumerate("hello"):
    a.append((i + 1, foo.upper()))
If you need to group some data together a lazy way is to use a tuple or a dict, but you really shouldn't. Prefer either
from typing import NamedTuple

class Foo(NamedTuple):
    a: int
    b: str
or
from dataclasses import dataclass

@dataclass
class Foo:
    a: int
    b: str
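The primary differences between these are that a NamedTuple is immutable and hashable (as long as its fields are), so it can be used as a key in dicts/sets, whereas a plain @dataclass is mutable and unhashable by default. Use @dataclass(frozen=True) if you want a dataclass with the same properties.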
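The difference is easy to check. The class names below and the can_hash helper are made up for this sketch.

```python
from dataclasses import dataclass
from typing import NamedTuple


class PointTuple(NamedTuple):
    x: int
    y: int


@dataclass
class PointData:
    x: int
    y: int


@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int


def can_hash(value: object) -> bool:
    """Hypothetical helper: True if the value can be a dict/set key."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


# NamedTuples and frozen dataclasses both work as dict keys.
labels = {PointTuple(1, 2): "a", FrozenPoint(1, 2): "b"}

# A plain dataclass can be mutated in place, but cannot be used as a key.
p = PointData(1, 2)
p.x = 10
```

Assigning to a field of PointTuple or FrozenPoint raises an exception, while can_hash(PointData(1, 2)) is False.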
Dicts should be used when the keys are not known until runtime. If you're accessing a dict with a literal key, like foo["bar"], that is a big red flag.