
Cyber Python 2077 - Using computer vision to read and walk from Cyberpunk 2077 map

sentdex · 5 min read

Based on sentdex's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Capture frames from a fixed screen region, crop the minimap, mask the yellow route in HSV, and steer with mouse movements proportional to the route's horizontal offset from the minimap's center while holding W to walk forward.

Briefing

A practical computer-vision approach can drive on-foot navigation in Cyberpunk 2077 by reading the minimap, isolating the highlighted route, and steering the character to keep the route centered. The core idea is simple: grab the game screen, crop out the minimap region of interest, use OpenCV to mask the route color (yellow) in HSV space, then compute how far the detected route pixels drift from the minimap’s center. That “center error” becomes a control signal for mouse movement, while the W key handles forward motion.

The workflow starts with two technical requirements: fast screen capture and direct input. Instead of generic automation tools, the method relies on a screen-grab script to extract frames from a chosen region (tuned by trial and error for resolution and window layout) and a separate input controller that can send both keyboard presses and mouse movement in a way the game accepts. With frames flowing at a workable rate, the minimap is cropped from each frame using fixed pixel coordinates that the author adjusts for their setup.
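
The video uses the author's own grab-screen and input scripts; as a minimal sketch of the capture side, the same idea can be expressed with the mss library (my substitution, with hypothetical region and crop coordinates):

```python
import cv2
import numpy as np
from mss import mss

# Region and crop coordinates are hypothetical; tune them by trial and error
# for your resolution and window layout, as the author does.
GAME_REGION = {"left": 0, "top": 40, "width": 1280, "height": 720}

sct = mss()

def grab_frame():
    """Grab one frame of the game window as a BGR NumPy array."""
    raw = np.array(sct.grab(GAME_REGION))         # screenshot arrives as BGRA
    return cv2.cvtColor(raw, cv2.COLOR_BGRA2BGR)  # drop the alpha channel

def crop_minimap(frame):
    """Cut out the minimap ROI with fixed pixel coordinates."""
    return frame[40:260, 1040:1260]               # frame[y1:y2, x1:x2]
```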

From there, the route-following logic hinges on color segmentation. The minimap's markers appear in blue, white, or yellow tones depending on the UI element; the route itself is isolated by converting the minimap crop from BGR to RGB and then to HSV for masking. Because OpenCV's HSV ranges don't match common "color picker" expectations, the script uses iterative threshold tuning: it sweeps hue, saturation, and value ranges until the yellow path remains while most other minimap elements drop out. The result is a binary mask where matching pixels become 255 and everything else becomes 0.
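
A minimal sketch of that masking step, with placeholder thresholds that would need the same iterative tuning the author describes:

```python
import cv2
import numpy as np

def route_mask(minimap_bgr):
    """Return a binary mask (255 = route pixel, 0 = everything else)."""
    rgb = cv2.cvtColor(minimap_bgr, cv2.COLOR_BGR2RGB)  # mirror the video's BGR -> RGB step
    hsv = cv2.cvtColor(rgb, cv2.COLOR_RGB2HSV)
    # Hypothetical yellow bounds; OpenCV hue runs 0-179, S and V run 0-255.
    lower = np.array([20, 100, 100])
    upper = np.array([35, 255, 255])
    return cv2.inRange(hsv, lower, upper)
```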

Steering comes from geometry on that mask. All white (255) pixels are located with NumPy, producing coordinate lists (notably in y,x order). The script computes the mean x-position of the detected route pixels and compares it to the minimap’s horizontal center. The difference—negative when the route lies to one side and positive when it lies to the other—drives mouse movement direction. Forward motion is handled by holding W during the control loop; mouse adjustments rotate the character so the route drifts back toward center.
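
In code, that geometry might look like the following sketch (helper and variable names are mine):

```python
import numpy as np

def center_error(mask):
    """Signed horizontal offset of the route from the minimap's center.

    Returns None when the mask contains no route pixels.
    """
    ys, xs = np.where(mask == 255)  # NumPy yields coordinates in (y, x) order
    if xs.size == 0:
        return None                 # no route detected in this frame
    target_x = mask.shape[1] / 2    # horizontal center of the crop
    return target_x - xs.mean()     # sign says which way the route drifted
```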

To avoid overreacting to distant parts of the route (which can cause the character to aim straight into obstacles), the method narrows the view further by using a “mini minimap” crop. This zoomed-in region contains only the near-term route dots ahead of the character, making turns more responsive. The author reports the system can follow turns and navigate around some objects, though it can fail when the mask loses the path (for example, due to transparency effects on the minimap) or when timing is off and the character overshoots.
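
The zoomed-in region is just a tighter slice of the same minimap crop; the margins below are hypothetical:

```python
def mini_minimap(minimap):
    """Keep only the region just ahead of the player arrow (hypothetical margins)."""
    h, w = minimap.shape[:2]
    # The player faces up on the minimap, so "ahead" is above the center.
    return minimap[h // 4 : h // 2, w // 4 : 3 * w // 4]
```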

The approach is intentionally rudimentary, but it demonstrates a complete loop—vision → route extraction → error computation → input control—that can achieve reliable on-foot movement for much of the map. The remaining gaps are practical: collision avoidance would likely require object detection (people, cars, obstacles), and some areas without sidewalks can force the path into the street, creating navigation problems. The next step proposed is layering object detection (e.g., TensorFlow-based) on top of the minimap steering so the agent can sidestep when pedestrians or other hazards appear.

Cornell Notes

The navigation system uses computer vision to follow Cyberpunk 2077’s minimap route while walking. Each loop grabs a frame, crops the minimap region, converts it to HSV, and masks the route color (yellow) to produce a binary image. NumPy finds the x-coordinates of the route pixels, computes their mean, and compares it to the minimap’s center to get an “error” value. Mouse movement is scaled from that error to steer the character, while the W key is held for forward motion. A tighter “mini minimap” crop improves turning by focusing on near-ahead route dots, though the method can break if the mask fails or if timing causes overshoot.
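
Tying the helpers sketched above into one loop, with pydirectinput standing in for the author's input module and an arbitrary sensitivity constant:

```python
import time
import pydirectinput  # stand-in for the author's DirectInput-style module

SENSITIVITY = 1.5  # hypothetical scale from pixel error to mouse movement

pydirectinput.keyDown('w')  # hold forward for the whole run
try:
    while True:
        frame = grab_frame()
        mask = route_mask(mini_minimap(crop_minimap(frame)))
        error = center_error(mask)
        if error is not None:
            # Negative scaling rotates the camera back toward the route.
            pydirectinput.moveRel(int(-SENSITIVITY * error), 0)
        time.sleep(0.05)  # crude loop pacing
finally:
    pydirectinput.keyUp('w')
```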

How does the system turn minimap pixels into steering commands?

After HSV masking, the route becomes white pixels (value 255) on a black background. The script uses NumPy to locate all pixels where mask == 255, yielding coordinates in (y, x) order. It then computes the mean of the x-values (mean_x) and defines a target x as half the minimap width (minimap.shape[1] / 2). The error is target - mean_x. If the route is left of center, mean_x decreases and the error becomes positive; the script multiplies the error by -1 to get the signed mouse movement that rotates the heading back toward the route.
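
For example, with a 200-pixel-wide crop the target x is 100; if the detected route pixels average mean_x = 70 (route left of center), the error is 100 - 70 = +30, and -1 × 30 = -30 becomes a leftward mouse movement that turns the character toward the route (numbers illustrative).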

Why does the method use HSV masking instead of direct RGB thresholding?

HSV makes it easier to isolate colors like the minimap's yellow route despite lighting and UI variations. The author notes that OpenCV's 8-bit HSV encoding (hue 0–179, saturation and value 0–255) doesn't match typical color-picker ranges (hue 0–360°, saturation and value as percentages), so thresholds must be tuned by trial and error. They iteratively adjust the hue and saturation/value ranges until the yellow path remains while other minimap elements (and noise) are filtered out.
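
One common way to do that sweep interactively is with OpenCV trackbars; this tuner is my sketch of the idea, not the author's exact script:

```python
import cv2
import numpy as np

def tune_hsv(hsv_img):
    """Interactively sweep HSV bounds until only the route survives (Esc to finish)."""
    win = "mask tuner"
    cv2.namedWindow(win)
    # OpenCV 8-bit hue tops out at 179; saturation and value at 255.
    bars = [("H lo", 179), ("H hi", 179), ("S lo", 255),
            ("S hi", 255), ("V lo", 255), ("V hi", 255)]
    for name, vmax in bars:
        cv2.createTrackbar(name, win, 0 if name.endswith("lo") else vmax, vmax,
                           lambda v: None)
    lo, hi = np.array([0, 0, 0]), np.array([179, 255, 255])
    while cv2.waitKey(30) != 27:
        lo = np.array([cv2.getTrackbarPos(n, win) for n in ("H lo", "S lo", "V lo")])
        hi = np.array([cv2.getTrackbarPos(n, win) for n in ("H hi", "S hi", "V hi")])
        cv2.imshow(win, cv2.inRange(hsv_img, lo, hi))  # preview the current mask
    cv2.destroyWindow(win)
    return lo, hi
```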

What problem does the “mini minimap” crop solve?

Using the full minimap can cause the character to overcorrect toward distant route segments, sometimes aiming too directly and running into obstacles. By cropping to a smaller region ahead of the character (the “mini minimap”), the mask contains only near-term route dots. That makes the mean-x error reflect the immediate turn direction, producing smoother, more timely steering.

What are the main failure modes reported during testing?

Two common issues show up. First, the character overshoots a turn when the control timing isn't aligned with how quickly the route changes; the script can then lose the path and throw errors (e.g., a NaN mean when no route pixels are detected). Second, the minimap's appearance can vary: transparency and UI blending can shift HSV values enough that the yellow mask no longer matches reliably, causing route detection to drop out.
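
A simple guard against the NaN case, sketched under the assumption that holding the last known error is an acceptable fallback:

```python
import numpy as np

last_error = 0.0

def safe_error(mask, target_x):
    """Center error that falls back to the previous value when the mask is empty."""
    global last_error
    xs = np.where(mask == 255)[1]  # x-coordinates of route pixels
    if xs.size == 0:               # xs.mean() would warn and return NaN here
        return last_error          # keep steering with the last good error
    last_error = target_x - xs.mean()
    return last_error
```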

Why can’t generic mouse/keyboard automation be used directly?

The author emphasizes that input must match what the game recognizes. Tools like pyautogui are described as unsuitable for the game’s required input type. Instead, the control relies on a dedicated input module (from the GTA automation scripts) that can send both key presses (W for forward) and mouse movement in a way Cyberpunk 2077 accepts.

What upgrades are suggested to make navigation more robust?

The next logical improvement is object detection to avoid pedestrians and obstacles. The author proposes adding TensorFlow-style object detection so the agent can sidestep when people appear, then reacquire the route from the updated minimap. Another limitation is sidewalk availability: when the game draws the path in the middle of the street, the agent may struggle because there’s less safe space to follow.

Review Questions

  1. How would you modify the error calculation if the minimap crop is rotated or significantly distorted beyond a simple x-center comparison?
  2. What specific HSV thresholding steps would you take if the yellow route mask suddenly returns no pixels (all zeros) during a run?
  3. How could you incorporate obstacle detection outputs into the steering logic without abandoning the minimap-based route-following approach?

Key Points

  1. Capture frames from a fixed screen region and crop the minimap using trial-and-error coordinates tuned to resolution and window layout.
  2. Isolate the route by converting the minimap crop to HSV and iteratively tuning hue/saturation/value thresholds until yellow path pixels remain.
  3. Convert the binary route mask into steering by finding all route pixel x-positions, computing their mean, and comparing it to the minimap center.
  4. Hold W for forward motion while using mouse movement scaled from the center error to rotate the character back toward the route.
  5. Use a smaller "mini minimap" crop ahead of the character to improve turn timing and reduce overcorrection toward distant path segments.
  6. Expect failures when the route mask drops out (e.g., minimap transparency/HSV drift) or when control timing causes overshoot and the agent loses the intended path.
  7. Add object detection (people/cars/obstacles) to sidestep hazards and increase navigation success beyond minimap-only guidance.

Highlights

  • The steering signal is literally the horizontal offset of the masked route pixels from the minimap's center; that offset becomes mouse movement direction.
  • A tighter "mini minimap" crop ahead of the character improves turning by focusing on near-term route dots rather than distant segments.
  • HSV threshold tuning is essential because OpenCV's HSV ranges don't match common color-picker assumptions, so the route mask must be calibrated by iteration.
  • Forward motion is handled separately from steering: W key for movement, mouse for heading correction.
  • The biggest practical limitations are mask dropouts and collisions, both motivating object detection as the next layer.

Topics

Mentioned

  • HSV
  • BGR
  • ROI
  • CPU
  • W