Build a Next.js Full-Stack AI App with Hugging Face, Docker, and AWS
Based on AI Arcade's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their channel.
Briefing
The core outcome is a working full-stack Next.js AI app that lets users upload an image, sends it to a server endpoint, runs object detection with a Hugging Face ONNX model, and returns a consolidated list of detected objects—then packages and deploys that pipeline to AWS so the app is publicly reachable.
On the product side, the app's flow is straightforward: a landing page links to an image upload page where users select an image and submit the form. The client tracks a loading state, uploads the file to an API route, and then renders the returned results. After the server finishes detection, the UI shows (1) the uploaded image via its hosted URL and (2) a bold "detected objects" label. The detection output is not just raw model predictions; it's aggregated into counts by object label. A confidence filter drops low-scoring predictions (scores must exceed a chosen threshold, roughly 0.85 in the walkthrough), and repeated labels are tallied into a single JSON object that becomes the displayed label (e.g., "dog: 2, bowl: 3"), as sketched below.
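The aggregation step is easy to picture in code. Here is a minimal TypeScript sketch, assuming the predictions follow the Transformers.js object-detection output shape ({ label, score, box }); the function name and default threshold are illustrative, not lifted from the video.

```ts
// Hypothetical helper: collapse raw detections into label counts.
// Assumes Transformers.js-style predictions: { label, score, box }.
interface Detection {
  label: string;
  score: number;
}

function aggregateDetections(
  predictions: Detection[],
  threshold = 0.85, // confidence cutoff described in the walkthrough
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const { label, score } of predictions) {
    if (score > threshold) {
      counts[label] = (counts[label] ?? 0) + 1;
    }
  }
  return counts; // e.g., { dog: 2, bowl: 3 }
}
```

The returned object is what the UI renders as the bold "detected objects" label.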
Under the hood, the server endpoint handles two key jobs. First, it uploads the incoming image using UploadThing, returning a hosted URL. Second, it runs object detection using Hugging Face's Transformers.js pipeline with an ONNX model; the implementation starts from the Transformers.js object-detection examples and adapts them to accept the uploaded image URL. A practical Next.js compatibility issue appears during this step (an error about a missing loader for the file type), resolved by updating next.config.js with an images remotePatterns allowlist for the UploadThing-hosted domain. Both pieces are sketched below.
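A hedged sketch of the endpoint, assuming an App Router route handler, UploadThing's UTApi server SDK, and the Transformers.js pipeline API; the route path, field names, and the Xenova/detr-resnet-50 checkpoint come from the libraries' docs and examples, not necessarily the video.

```ts
// app/api/detect/route.ts (path and field names are assumptions)
import { NextResponse } from "next/server";
import { UTApi } from "uploadthing/server";
import { pipeline } from "@xenova/transformers";

const utapi = new UTApi();

export async function POST(request: Request) {
  const formData = await request.formData();
  const file = formData.get("image") as File;

  // Job 1: upload the image via UploadThing and get back a hosted URL.
  const uploaded = await utapi.uploadFiles(file);
  const imageUrl = uploaded.data?.url;
  if (!imageUrl) {
    return NextResponse.json({ error: "Upload failed" }, { status: 500 });
  }

  // Job 2: run object detection against the uploaded image URL.
  const detector = await pipeline(
    "object-detection",
    "Xenova/detr-resnet-50", // example checkpoint from the Transformers.js docs
  );
  const predictions = await detector(imageUrl);

  return NextResponse.json({ url: imageUrl, predictions });
}
```

The loader fix is a small config change. A sketch of next.config.js, assuming UploadThing serves files from utfs.io (verify the hostname against the URLs your app actually receives):

```js
// next.config.js — allowlist the UploadThing file domain for next/image.
/** @type {import('next').NextConfig} */
const nextConfig = {
  images: {
    remotePatterns: [
      { protocol: "https", hostname: "utfs.io" }, // assumed UploadThing domain
    ],
  },
};

module.exports = nextConfig;
```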
Performance and reliability are addressed by caching the model locally. After the Next.js loader configuration is fixed, the first request triggers a model download into the local Transformers.js cache (including an ONNX quantized model). Subsequent API calls reuse the cached model rather than pulling it from Hugging Face again, making repeated detections faster.
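Transformers.js exposes its cache location through its env object, and a module-level singleton keeps the pipeline warm between requests. A minimal sketch, where the ./.cache path is an assumption to adapt:

```ts
// Hypothetical cache setup: pin the Transformers.js cache directory and
// reuse one pipeline instance across API calls.
import { env, pipeline } from "@xenova/transformers";

env.cacheDir = "./.cache"; // downloaded ONNX weights land (and stay) here

let detectorPromise: Promise<any> | undefined;

export function getDetector() {
  // First call downloads the quantized ONNX model; later calls hit the cache.
  detectorPromise ??= pipeline("object-detection", "Xenova/detr-resnet-50");
  return detectorPromise;
}
```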
Finally, the tutorial turns the app into an operational deployment. A multi-stage Dockerfile builds the Next.js app, injects the build-time environment variables UploadThing requires, and sets up a runtime environment that preserves the model cache directory (see the sketch below). The image is built locally, tested with docker run, and then pushed to Amazon ECR (Elastic Container Registry). Deployment uses Amazon ECS (Elastic Container Service): a task definition points to the ECR image and exposes port 3000, a cluster and service are created, and the default security group gets an inbound rule allowing TCP traffic on port 3000 from anywhere. Once the service is running, the ECS task provides a public IP, the app can be accessed externally, and a final upload test confirms the end-to-end pipeline works.
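A hedged sketch of such a multi-stage Dockerfile, with the environment-variable names (UPLOADTHING_*) and the cache path assumed rather than taken from the video:

```dockerfile
# --- Stage 1: build the Next.js app ---
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
# Build-time secrets UploadThing needs during next build (names assumed)
ARG UPLOADTHING_SECRET
ARG UPLOADTHING_APP_ID
ENV UPLOADTHING_SECRET=$UPLOADTHING_SECRET \
    UPLOADTHING_APP_ID=$UPLOADTHING_APP_ID
RUN npm run build

# --- Stage 2: lean runtime image ---
FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app ./
# Keep a writable directory for the Transformers.js model cache
RUN mkdir -p /app/.cache
EXPOSE 3000
CMD ["npm", "start"]
```

From there the flow matches the description above: docker build and docker run verify the image locally, docker tag plus docker push move it into ECR, and the ECS task definition references that image URI with container port 3000.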
The broader takeaway is that the same structure—upload, server-side model inference, result aggregation, and cloud deployment—can be reused as a springboard for other AI applications beyond object detection.
Cornell Notes
A Next.js app accepts an uploaded image, sends it to a server API route, uploads the file via UploadThing, and runs Hugging Face Transformers.js object detection using an ONNX model. The server returns the uploaded image URL plus a consolidated, confidence-filtered list of detected objects, produced by counting repeated labels (e.g., multiple dogs or bowls). Next.js image rendering requires a next.config.js allowlist (remotePatterns) for the UploadThing-hosted domain to avoid loader errors. For inference speed, the Transformers.js model is cached locally, so later requests reuse the downloaded ONNX quantized model instead of re-fetching it. The app is then containerized with a multi-stage Dockerfile, pushed to Amazon ECR, and deployed on Amazon ECS with port 3000 exposed publicly via a security group rule.
- How does the app turn raw object detection outputs into a user-friendly "detected objects" list?
- Why are next.config.js changes necessary when displaying the uploaded image?
- What role does local model caching play in the inference endpoint?
- How does the deployment pipeline connect Docker, ECR, and ECS?
- What specific infrastructure settings make the app accessible from the public internet?
Review Questions
- What data does the API route return to the client, and how does the client use it to render both the image and the detected object counts?
- Which two configuration steps are required to make inference and image rendering work reliably in Next.js (one for model inference, one for displaying remote images)?
- How do Docker multi-stage builds and ECS task definitions work together to ensure the same inference environment runs in AWS?
Key Points
1. The UI uploads an image and shows results only after the API returns an uploaded image URL plus a consolidated, confidence-filtered object-count label.
2. Object detection uses Hugging Face Transformers.js with an ONNX model, driven by the uploaded image URL and filtered by prediction score (around 0.85).
3. Detected labels are aggregated into counts (e.g., multiple dogs/bowls) rather than displayed as raw predictions.
4. Next.js requires next.config.js remotePatterns allowlisting for the UploadThing-hosted image domain to avoid loader errors.
5. Transformers.js model caching stores the ONNX quantized model locally so subsequent API calls avoid re-downloading from Hugging Face.
6. A multi-stage Dockerfile packages the Next.js app for production and preserves runtime directories needed for model caching.
7. Deployment uses Amazon ECR to store the image and Amazon ECS to run it, with port 3000 exposed publicly via an inbound security group rule.