Build Generative AI Apps with Docker and Hugging Face's Docker Spaces
Based on Krish Naik's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Build a FastAPI service with a health-check homepage route and a /generate endpoint that calls a Transformers text-generation pipeline.
Briefing
A practical path to ship a text-generation generative AI app is laid out end to end: build a FastAPI service that wraps a Hugging Face Transformers text-generation pipeline, dockerize it with a reproducible Python environment, then deploy it to Hugging Face Spaces so it rebuilds and runs automatically on each commit. The payoff is immediate: once the Space finishes building, users get both a working API endpoint and FastAPI's Swagger UI to test generation requests in the browser.
The workflow starts with creating a local Python environment (Python 3.9) and installing the core dependencies needed for an LLM-backed web service. The project uses FastAPI to expose HTTP routes, Transformers for model inference, and PyTorch (torch) as the underlying compute library. A requirements.txt file lists the needed packages, including FastAPI, requests, sentencepiece, torch, and Transformers (plus uvicorn, which the Dockerfile later uses to serve the app). After installing dependencies in the virtual environment, the next step is writing the application code.
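A minimal requirements.txt matching that dependency list might look like the sketch below (versions are left unpinned here; pin them for reproducible builds):

```
fastapi
uvicorn[standard]
requests
sentencepiece
torch
transformers
```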
The FastAPI app defines a simple homepage route that returns a “hello world” message to confirm the service is alive. It also defines a /generate endpoint that accepts input text and feeds it into a Transformers pipeline configured for text-to-text generation. For the model choice, the walkthrough uses flan-t5-small (a smaller variant intended to fit free-tier resource limits on Hugging Face Spaces). The pipeline output is returned as JSON, with the generated text extracted from the pipeline’s response structure (the generated_text key in the pipeline’s output). FastAPI’s built-in Swagger documentation becomes the interactive control panel once deployed.
Dockerization is treated as the central engineering step. A Dockerfile is created from the official Python 3.9 image, sets a working directory, copies requirements.txt into the container, and installs dependencies using pip install with caching disabled. The Dockerfile also creates and switches to a non-root user for safer execution, then copies the application code into the container. Finally, it starts the FastAPI app via uvicorn, binding to 0.0.0.0 on a specified port (the walkthrough uses 7860).
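A Dockerfile along the lines described could look like this sketch (directory layout, user name, and flags are illustrative; the base image, non-root user, and port 7860 come from the walkthrough):

```dockerfile
FROM python:3.9

WORKDIR /code

# Install dependencies first so this layer caches across code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Create and switch to a non-root user for safer execution.
RUN useradd -m user
USER user

COPY --chown=user . /code

# Hugging Face Spaces expects the app to listen on port 7860.
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
```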
Deployment happens through Hugging Face Spaces. A new Space is created using a Docker template, then the app.py, Dockerfile, and requirements.txt are uploaded to the Space repository. Each commit triggers an automated Docker build and container run. If the build succeeds, the homepage endpoint returns “hello world,” confirming the container is executing correctly. From there, users can open a direct URL or navigate to /docs to access Swagger UI and test /generate.
In the Swagger interface, sample prompts like “tell me about machine learning” or “how to be happy” produce completed sentences, demonstrating text-to-text generation powered by Transformers and the flan-t5-small model. The overall message is less about model novelty and more about operational reliability: containerizing the FastAPI + Transformers stack makes the generative AI app portable and deployable with minimal friction.
Cornell Notes
The project builds a text-generation API using FastAPI and Hugging Face Transformers, then packages it into a Docker container for deployment on Hugging Face Spaces. A Transformers text-to-text pipeline is initialized with flan-t5-small to fit the limited free-tier resources. FastAPI exposes a homepage route for health checking and a /generate route that accepts input text and returns generated text as JSON. The Dockerfile pins the runtime (Python 3.9), installs dependencies from requirements.txt, runs under a non-root user, and starts the service with uvicorn. After uploading and committing to a Docker-based Space, the Space auto-builds and the app becomes testable via Swagger UI at /docs.
How does the app turn user text into generated output?
Why is requirements.txt central to both local development and deployment?
What does the Dockerfile accomplish for a Hugging Face Space deployment?
How do users verify the service is working after deployment?
Why choose flan-t5-small in this setup?
Review Questions
- What routes does the FastAPI app expose, and what does each one return?
- Which dependencies must be included in requirements.txt for the Transformers pipeline to run, and why?
- How does the Dockerfile ensure the container starts the FastAPI service correctly in a cloud environment?
Key Points
1. Build a FastAPI service with a health-check homepage route and a /generate endpoint that calls a Transformers text-generation pipeline.
2. Use requirements.txt to list FastAPI, Transformers, torch, and tokenization dependencies so Docker builds install everything automatically.
3. Initialize a Transformers pipeline with flan-t5-small to fit Hugging Face Spaces free-tier resource limits.
4. Dockerize the app with a Dockerfile that installs dependencies, runs as a non-root user, and starts uvicorn on the expected port.
5. Deploy to a Hugging Face Space configured for Docker so commits trigger automatic image builds and container runs.
6. Validate deployment by checking the homepage response and testing /generate through Swagger UI at /docs.