Build a Next.js Full-Stack AI App with Hugging Face, Docker, and AWS
Based on AI Arcade's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their channel.
Briefing
The core outcome is a working full-stack Next.js AI app that lets users upload an image, sends it to a server endpoint, runs object detection with a Hugging Face ONNX model, and returns a consolidated list of detected objects—then packages and deploys that pipeline to AWS so the app is publicly reachable.
On the product side, the app's flow is straightforward: a landing page links to an image upload page where users select an image and submit the form. The client tracks a loading state, uploads the file to an API route, and then renders the returned results. After the server finishes detection, the UI shows (1) the uploaded image via its hosted URL and (2) a bold "detected objects" label. The detection output is not just raw model predictions; it's aggregated into counts by object label. A confidence filter drops low-scoring predictions (scores must exceed a chosen threshold, roughly 0.85 in the walkthrough), and repeated labels are tallied into a single JSON object that becomes the displayed label (e.g., "dog: 2, bowl: 3"), as sketched below.
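The aggregation step is easy to picture in code. Here is a minimal TypeScript sketch, assuming the predictions follow the Transformers.js object-detection output shape ({ label, score, box }); the function name and default threshold are illustrative, not lifted from the video.

```ts
// Hypothetical helper: collapse raw detections into label counts.
// Assumes Transformers.js-style predictions: { label, score, box }.
interface Detection {
  label: string;
  score: number;
}

function aggregateDetections(
  predictions: Detection[],
  threshold = 0.85, // confidence cutoff described in the walkthrough
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const { label, score } of predictions) {
    if (score > threshold) {
      counts[label] = (counts[label] ?? 0) + 1;
    }
  }
  return counts; // e.g., { dog: 2, bowl: 3 }
}
```

The returned object is what the UI renders as the bold "detected objects" label.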
Under the hood, the server endpoint handles two key jobs. First, it uploads the incoming image using UploadThing, returning a hosted URL. Second, it runs object detection using Hugging Face's Transformers.js pipeline with an ONNX model; the implementation starts from the Transformers.js object-detection examples and adapts them to accept the uploaded image URL. A practical Next.js compatibility issue appears during this step (an error about a missing loader for the file type), resolved by updating next.config.js with an images remotePatterns allowlist for the UploadThing-hosted domain. Both pieces are sketched below.
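A hedged sketch of the endpoint, assuming an App Router route handler, UploadThing's UTApi server SDK, and the Transformers.js pipeline API; the route path, field names, and the Xenova/detr-resnet-50 checkpoint come from the libraries' docs and examples, not necessarily the video.

```ts
// app/api/detect/route.ts (path and field names are assumptions)
import { NextResponse } from "next/server";
import { UTApi } from "uploadthing/server";
import { pipeline } from "@xenova/transformers";

const utapi = new UTApi();

export async function POST(request: Request) {
  const formData = await request.formData();
  const file = formData.get("image") as File;

  // Job 1: upload the image via UploadThing and get back a hosted URL.
  const uploaded = await utapi.uploadFiles(file);
  const imageUrl = uploaded.data?.url;
  if (!imageUrl) {
    return NextResponse.json({ error: "Upload failed" }, { status: 500 });
  }

  // Job 2: run object detection against the uploaded image URL.
  const detector = await pipeline(
    "object-detection",
    "Xenova/detr-resnet-50", // example checkpoint from the Transformers.js docs
  );
  const predictions = await detector(imageUrl);

  return NextResponse.json({ url: imageUrl, predictions });
}
```

The loader fix is a small config change. A sketch of next.config.js, assuming UploadThing serves files from utfs.io (verify the hostname against the URLs your app actually receives):

```js
// next.config.js — allowlist the UploadThing file domain for next/image.
/** @type {import('next').NextConfig} */
const nextConfig = {
  images: {
    remotePatterns: [
      { protocol: "https", hostname: "utfs.io" }, // assumed UploadThing domain
    ],
  },
};

module.exports = nextConfig;
```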
Performance and reliability are addressed by caching the model locally. After the Next.js loader configuration is fixed, the first request triggers a model download into the local Transformers.js cache (including an ONNX quantized model). Subsequent API calls reuse the cached model rather than pulling it from Hugging Face again, making repeated detections faster.
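Transformers.js exposes its cache location through its env object, and a module-level singleton keeps the pipeline warm between requests. A minimal sketch, where the ./.cache path is an assumption to adapt:

```ts
// Hypothetical cache setup: pin the Transformers.js cache directory and
// reuse one pipeline instance across API calls.
import { env, pipeline } from "@xenova/transformers";

env.cacheDir = "./.cache"; // downloaded ONNX weights land (and stay) here

let detectorPromise: Promise<any> | undefined;

export function getDetector() {
  // First call downloads the quantized ONNX model; later calls hit the cache.
  detectorPromise ??= pipeline("object-detection", "Xenova/detr-resnet-50");
  return detectorPromise;
}
```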
Finally, the tutorial turns the app into an operational deployment. A multi-stage Dockerfile builds the Next.js app, injects the build-time environment variables UploadThing requires, and sets up a runtime environment that preserves the model cache directory (see the sketch below). The image is built locally, tested with docker run, and then pushed to Amazon ECR (Elastic Container Registry). Deployment uses Amazon ECS (Elastic Container Service): a task definition points to the ECR image and exposes port 3000, a cluster and service are created, and the default security group gets an inbound rule allowing TCP traffic on port 3000 from anywhere. Once the service is running, the ECS task provides a public IP, the app can be accessed externally, and a final upload test confirms the end-to-end pipeline works.
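A hedged sketch of such a multi-stage Dockerfile, with the environment-variable names (UPLOADTHING_*) and the cache path assumed rather than taken from the video:

```dockerfile
# --- Stage 1: build the Next.js app ---
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
# Build-time secrets UploadThing needs during next build (names assumed)
ARG UPLOADTHING_SECRET
ARG UPLOADTHING_APP_ID
ENV UPLOADTHING_SECRET=$UPLOADTHING_SECRET \
    UPLOADTHING_APP_ID=$UPLOADTHING_APP_ID
RUN npm run build

# --- Stage 2: lean runtime image ---
FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app ./
# Keep a writable directory for the Transformers.js model cache
RUN mkdir -p /app/.cache
EXPOSE 3000
CMD ["npm", "start"]
```

From there the flow matches the description above: docker build and docker run verify the image locally, docker tag plus docker push move it into ECR, and the ECS task definition references that image URI with container port 3000.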
The broader takeaway is that the same structure—upload, server-side model inference, result aggregation, and cloud deployment—can be reused as a springboard for other AI applications beyond object detection.
Cornell Notes
A Next.js app accepts an uploaded image, sends it to a server API route, uploads the file via UploadThing, and runs Hugging Face Transformers.js object detection using an ONNX model. The server returns the uploaded image URL plus a consolidated, confidence-filtered list of detected objects, produced by counting repeated labels (e.g., multiple dogs or bowls). Next.js image rendering requires a next.config.js allowlist (remotePatterns) for the UploadThing-hosted domain to avoid loader errors. For inference speed, the Transformers.js model is cached locally, so later requests reuse the downloaded ONNX quantized model instead of re-fetching it. The app is then containerized with a multi-stage Dockerfile, pushed to Amazon ECR, and deployed on Amazon ECS with port 3000 exposed publicly via a security group rule.
- How does the app turn raw object detection outputs into a user-friendly "detected objects" list?
- Why are next.config.js changes necessary when displaying the uploaded image?
- What role does local model caching play in the inference endpoint?
- How does the deployment pipeline connect Docker, ECR, and ECS?
- What specific infrastructure settings make the app accessible from the public internet?
Review Questions
- What data does the API route return to the client, and how does the client use it to render both the image and the detected object counts?
- Which two configuration steps are required to make inference and image rendering work reliably in Next.js (one for model inference, one for displaying remote images)?
- How do Docker multi-stage builds and ECS task definitions work together to ensure the same inference environment runs in AWS?
Key Points
1. The UI uploads an image and shows results only after the API returns an uploaded image URL plus a consolidated, confidence-filtered object-count label.
2. Object detection uses Hugging Face Transformers.js with an ONNX model, driven by the uploaded image URL and filtered by prediction score (around 0.85).
3. Detected labels are aggregated into counts (e.g., multiple dogs/bowls) rather than displayed as raw predictions.
4. Next.js requires next.config.js remotePatterns allowlisting for the UploadThing-hosted image domain to avoid loader errors.
5. Transformers.js model caching stores the ONNX quantized model locally so subsequent API calls avoid re-downloading from Hugging Face.
6. A multi-stage Dockerfile packages the Next.js app for production and preserves runtime directories needed for model caching.
7. Deployment uses Amazon ECR to store the image and Amazon ECS to run it, with port 3000 exposed publicly via an inbound security group rule.