How We Built a Scalable FastAPI Backend for Our AI Product: Zero to Production


🖖 Prateek Joshi and Akhil Kalwakurthy

As AI engineers building complex machine learning systems, we often face a critical challenge: creating backend infrastructure that's both performant enough for real-time AI processing and maintainable enough for rapid iteration. In this post, we'll walk through our journey of building a production-ready FastAPI backend that powers our AI orchestration platform, sharing technical details and hard-won lessons from our experience.


1. Code Editor Setup: Developer Containers Done Right

Why Dev Containers Matter

After struggling with Python version conflicts and the infamous "works on my machine" syndrome, we standardized on VS Code Dev Containers. This approach ensures:

  1. Isolated Python environments per project.

  2. Consistent tooling across Windows, macOS, and Linux.

  3. Reproducible builds from development to production.

Our Dockerfile starts from Ubuntu Rolling to leverage the latest system packages and Python versions:

FROM docker.io/ubuntu:rolling
USER root

# Use a strong/secure password
RUN echo "ubuntu:password" | chpasswd && \
    apt update && apt dist-upgrade -y && \
    apt install -y python3.12-full python3.12-venv pipx git sudo
USER ubuntu
RUN pipx ensurepath && \
    pipx install hatch

Key decisions:

  • Ubuntu Rolling: To leverage Python 3.12’s new features.

  • pipx: A tool that ensures global Python utilities like Hatch or virtualenv are installed cleanly and in isolation. pipx creates a dedicated virtual environment for each tool, avoiding dependency conflicts and enabling easy upgrades or removals. This is invaluable in an enterprise-grade setup where developers use varied tools.

  • Hatch: Our choice for Python environment and project management. Hatch simplifies environment setup, dependency management, and testing workflows. It’s particularly suited for enterprise-grade projects due to its robust support for multi-environment configurations, fast builds using the UV installer, and strict adherence to modern Python packaging standards.

The devcontainer.json defines our IDE environment:

{
  "name": "intellect-mesh",
  "build": { "dockerfile": "Dockerfile" },
  "customizations": {
    "vscode": {
      "extensions": [
        "ms-python.python",
        "charliermarsh.ruff",
        "ms-python.mypy-type-checker",
        "redhat.vscode-yaml",
        "eamodio.gitlens",
        "MS-vsliveshare.vsliveshare"
      ]
    }
  },
  "runArgs": ["--userns=keep-id"],
  "postStartCommand": "PATH=/home/ubuntu/.local/bin hatch env create && PATH=/home/ubuntu/.local/bin hatch fmt --check --sync",
  "containerEnv": { "HOME": "/home/ubuntu" },
  "remoteUser": "ubuntu"
}

Critical Components

  • Ruff: Chosen over flake8 for its exceptional performance, achieving 10-100x faster linting by being implemented in Rust. Ruff provides comprehensive linting rules and integrates seamlessly into CI pipelines, ensuring high code quality without slowing down development.

  • Mypy: Implements static typing in Python to catch errors at compile time rather than runtime, which is critical for maintaining robust type safety in our AI codebase. This helps reduce runtime errors in complex data pipelines and ensures compatibility with type-annotated libraries.

  • postStartCommand: Configures the container to automatically set up Hatch environments, enforce consistent code formatting, and validate existing code. This automation reduces manual overhead and ensures all developers follow consistent workflows.

  • userns=keep-id: Maintains file permissions by mapping container user IDs to host user IDs. This avoids permission issues when files are created or modified inside the container and ensures seamless collaboration between developers on different host systems.


2. Project Setup with Modern Python Tooling

Why Hatch Over Poetry or Pipenv

We evaluated several tools and ultimately chose Hatch due to:

  1. Standardized project structure.

  2. Integration with UV, a Rust-based installer that's up to 100x faster than pip.

  3. Efficient multi-environment management for testing.

Our pyproject.toml outlines the project configuration:

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "genesis-mesh"
dynamic = ["version"]
description = ""
readme = "Readme.md"
requires-python = ">=3.12.0,<3.13.0"
keywords = []
authors = [{name = "Akhil K"}, {name = "Prateek J"}]
dependencies = [
    "fastapi[standard]==0.115.5",
    "websockets==14.1"
]

[tool.hatch.version]
path = "src/genesis_mesh/__about__.py"

[tool.hatch.envs.default]
installer = "uv"

[tool.hatch.envs.types]
extra-dependencies = ["mypy==1.13.0"]

[tool.hatch.envs.types.scripts]
check = "mypy --install-types --non-interactive {args:src/genesis_mesh}"

[tool.hatch.envs.hatch-static-analysis]
config-path = "ruff_defaults.toml"

[tool.ruff]
extend = "ruff_defaults.toml"

Key Decisions

  • Python 3.12+: For new language features like structural pattern matching and improved error messages.

  • Strict Python versioning: Avoids compatibility issues with older Python versions.

  • Hatchling backend: Simplifies builds with minimal boilerplate.

  • Rust-based installer (UV): For ultra-fast dependency resolution.

Project Structure

We adopted the src layout to:

  1. Prevent accidental imports from the project root.

  2. Enable proper namespace packaging.

  3. Simplify testing and coverage.

πŸ“ src
└── πŸ“ genesis_mesh
    β”œβ”€β”€ πŸ“„ __init__.py
    β”œβ”€β”€ πŸ“„ __about__.py
    └── πŸ“„ __main__.py

The src structure aligns with Python Packaging Authority (PyPA) guidelines and ensures clean imports.

Set the version for the project. Note that it lives in __about__.py (the file [tool.hatch.version] points to in pyproject.toml), and the attribute must be named __version__ so Hatch's default version regex can find it:

# __about__.py
__version__ = "1.0.0.dev0"

3. Building the FastAPI Core

Our AI backend had to support WebSocket streaming for real-time AI inference.

Entry Point Architecture

We designed a scalable and modular entry point:

from argparse import ArgumentParser
from fastapi import APIRouter, FastAPI
from uvicorn import run

DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 8080

class GenesisMesh:
    def __init__(self):
        self.ws_api = APIRouter(prefix="/ws")

    def setup(self):
        pass

    def __call__(self, host: str, port: int, *args, **kwds):
        app = FastAPI()
        app.include_router(router=self.ws_api)
        run(app=app, host=host, port=port)

Design Decisions

  1. Class-based API: Encapsulates state and makes it easier to manage multiple models or pipelines.

  2. Explicit setup method: Clearly separates initialization from execution.

  3. WebSocket-first design: Enables bi-directional streaming, essential for real-time inference tasks (a minimal endpoint sketch follows).
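
The ws_api router above has no routes yet; setup() is where they would be registered. Here is a minimal, hypothetical sketch (the /stream path and echo loop are ours, for illustration only) of a streaming endpoint wired into setup():

from fastapi import WebSocket

# Inside GenesisMesh -- a hypothetical streaming route:
def setup(self):
    @self.ws_api.websocket("/stream")
    async def stream(websocket: WebSocket) -> None:
        await websocket.accept()
        # Echo loop as a stand-in for model inference; a real
        # handler would stream tokens back as they are generated.
        async for message in websocket.iter_text():
            await websocket.send_text(f"echo: {message}")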

Launching the Server

We've added a CLI-friendly main function for flexibility:

def main():
    parser = ArgumentParser(description="Genesis Mesh Agent Framework")
    parser.add_argument("--host", help="Host address to use", default=DEFAULT_HOST)
    parser.add_argument("--port", help="Port to use", type=int, default=DEFAULT_PORT)
    args = parser.parse_args()

    mesh = GenesisMesh()
    mesh.setup()
    mesh(host=args.host, port=args.port)

if __name__ == "__main__":
    main()

This design ensures flexibility for running the backend locally or in production with different configurations.

That's it! You can now launch the FastAPI app by running the following command:

$ hatch run python src/genesis_mesh --host 127.0.0.1 --port 8000

You should see Uvicorn's startup logs confirming the server is running! 🎉
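
To exercise the server end-to-end, a short client script can connect over WebSocket. This sketch assumes the hypothetical /ws/stream echo route from the earlier example and uses the websockets package already pinned in our dependencies:

# client.py -- assumes the hypothetical /ws/stream route above
import asyncio
import websockets

async def main() -> None:
    async with websockets.connect("ws://127.0.0.1:8000/ws/stream") as ws:
        await ws.send("hello, mesh")
        print(await ws.recv())  # -> "echo: hello, mesh"

asyncio.run(main())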


In our next installment, we'll explore advanced topics including:

  • Langchain and Langgraph integration

  • Benefits and use cases of Langgraph

  • Implementing a blog agent using graph architecture

Stay tuned for Part 2

