How to count people in zones with YOLO26 and OpenCV
A practical walkthrough of a compact Python script that detects, tracks, and counts people inside polygon zones using Ultralytics YOLO26 and OpenCV.
If you've ever stood in a crowded elevator and wondered "how many people can actually fit in here?", you've already grasped the problem this little script solves. Counting people is easy when you only care about the whole frame. It gets interesting the moment you want to know how many people are standing in a specific region — a doorway, a queue, a parking bay, or, in this case, the stacked floor areas of an elevator.
In this post, I'll walk through a compact, production-flavored Python script that does exactly that: it detects people with Ultralytics YOLO, tracks them across frames, and counts how many fall inside each of several polygon zones. It's around 250 lines, it runs on a video file or a webcam, and it writes out an annotated video. Let's pull it apart.
At a high level, every frame goes through four moves:
- Detect and track every person in the frame.
- Decide which zone each person is standing in.
- Tally the per-zone counts.
- Draw the zones, boxes, and labels, then write the frame out.
The clever bits aren't in the detection — YOLO handles that for you. They're in the geometry: defining zones as polygons, cleaning those polygons so they behave, and deciding which zone a person belongs to when zones overlap. That's where most homegrown counters quietly go wrong, so that's where we'll spend our time.
Defining the zones
Zones are just lists of [x, y] points — polygons drawn over the camera's field of view. The elevator example stacks five of them, roughly one per "row" of standing space:
POLYGONS = [
    # Zone # 01
    [[192, 818], [243, 832], [1209, 1078], [1596, 1076],
     [1601, 1047], [1524, 1029], [241, 694], [192, 815]],
    # Zone # 02
    [[274, 593], [336, 611], [1555, 978], [1627, 983],
     [1634, 874], [380, 478], [321, 461]],
    # ... three more
]
You'd normally generate these by pausing on a frame and clicking out the corners. They don't have to be rectangles — that's the whole point of using polygons. A real doorway or a slanted queue is rarely a tidy box.
The unglamorous hero: cleaning the polygons
Here's the part nobody warns you about. If you hand-click polygon points, you'll end up with two problems: near-duplicate points sitting almost on top of each other, and points in an order that makes the polygon's edges cross over themselves. Both of these quietly break OpenCV — a self-intersecting polygon fills wrong with fillPoly and gives nonsense results from pointPolygonTest.
The script fixes this once, up front:
@staticmethod
def _sanitize_polygon(points, min_dist=10):
    pts = np.array(points, dtype=np.float32)
    keep = []
    for p in pts:
        if not keep or np.linalg.norm(p - keep[-1]) > min_dist:
            keep.append(p)
    if len(keep) > 2 and np.linalg.norm(keep[0] - keep[-1]) <= min_dist:
        keep.pop()  # drop duplicated closing point
    pts = np.array(keep, dtype=np.float32)
    center = pts.mean(axis=0)
    angles = np.arctan2(pts[:, 1] - center[1], pts[:, 0] - center[0])
    return pts[np.argsort(angles)].astype(np.int32)
Two things happen here. First, it drops any point that sits within min_dist pixels of the previous one — that kills the duplicates. Second, and more importantly, it sorts the remaining points by their angle around the polygon's center. Sorting by angle guarantees the points go around the shape in order (clockwise or counter-clockwise), so the edges never cross. It's a small trick that turns messy hand-clicked coordinates into a well-behaved convex-ish polygon. If your zones are wildly concave, this re-ordering can over-simplify them, but for the rectangular-ish regions most counters use, it's exactly what you want.
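To see why the angular sort matters, here is a minimal pure-Python sketch that needs neither OpenCV nor NumPy. The helper names sort_by_angle and shoelace_area are illustrative and not part of the script; the sort is meant to mirror what _sanitize_polygon does with np.arctan2 and np.argsort. Four corners of a square listed in a crossing order form a "bowtie" whose signed area cancels to zero. After sorting, the same four points trace the square's perimeter.

```python
import math


def sort_by_angle(points):
    """Order points by their angle around the mean point (illustrative helper)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))


def shoelace_area(points):
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


# Square corners clicked in an order that makes two edges cross.
bowtie = [(0, 0), (10, 10), (10, 0), (0, 10)]
fixed = sort_by_angle(bowtie)

print(shoelace_area(bowtie))  # the two lobes of the bowtie cancel out
print(fixed)                  # corners now follow the perimeter
print(shoelace_area(fixed))   # full area of the 10 x 10 square
```

The same re-ordering is also what reshapes strongly concave zones, so it is worth plotting the sanitized polygons once before trusting the counts.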
Which zone is a person in?
Once the polygons are clean, deciding membership is a one-liner per zone, thanks to cv2.pointPolygonTest. But there's a subtlety: which point on a person do you test?
The script uses the bottom-center of the bounding box, ((x1 + x2) / 2, y2), which is roughly where the feet are. That's the right call. If you tested the center of the box, a tall person leaning forward could "belong" to a zone their feet are nowhere near. Feet-on-the-floor is how humans judge it, too.
def assign_zone(self, px, py):
    best, best_d = -1, -1.0
    for i, zone in enumerate(self.zones):
        d = cv2.pointPolygonTest(zone, (float(px), float(py)), True)
        if d >= 0 and d > best_d:
            best, best_d = i, d
    return best
Notice the second half. pointPolygonTest with True as its last argument (measureDist) returns the signed distance to the nearest polygon edge: positive inside, negative outside, zero on the edge. Instead of grabbing the first zone that returns positive, the code keeps the zone where the point is deepest inside. This is the tie-breaker for overlapping zones: a person standing where two zones overlap gets assigned consistently to the one they're more firmly inside, frame after frame. That stability is what stops counts from flickering between neighbors.
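The deepest-inside rule can be tried without OpenCV. The sketch below is an illustration under stated assumptions: signed_distance is a hand-written pure-Python stand-in for cv2.pointPolygonTest(zone, point, True), not the library routine, and its behavior for points exactly on an edge may differ. The two rectangular zones are made up so that they overlap between x = 80 and x = 100.

```python
import math


def signed_distance(polygon, px, py):
    """Stand-in for cv2.pointPolygonTest(..., True): + inside, - outside."""
    inside = False
    nearest = math.inf
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # Ray casting: toggle when a ray going right from the point crosses the edge.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
        # Distance from the point to this edge segment.
        dx, dy = x2 - x1, y2 - y1
        length_sq = dx * dx + dy * dy
        t = 0.0
        if length_sq > 0:
            t = max(0.0, min(1.0, ((px - x1) * dx + (py - y1) * dy) / length_sq))
        nearest = min(nearest, math.hypot(px - (x1 + t * dx), py - (y1 + t * dy)))
    return nearest if inside else -nearest


def assign_zone(zones, px, py):
    """Same selection rule as the script: keep the zone with the largest depth."""
    best, best_d = -1, -1.0
    for i, zone in enumerate(zones):
        d = signed_distance(zone, px, py)
        if d >= 0 and d > best_d:
            best, best_d = i, d
    return best


zone_a = [(0, 0), (100, 0), (100, 100), (0, 100)]
zone_b = [(80, 0), (200, 0), (200, 100), (80, 100)]
zones = [zone_a, zone_b]

left_of_overlap = assign_zone(zones, 85, 50)   # 15 px inside A, 5 px inside B
right_of_overlap = assign_zone(zones, 95, 50)  # 5 px inside A, 15 px inside B
outside = assign_zone(zones, 300, 50)          # inside neither zone
```

Both test points sit inside both zones, yet each goes to the zone it is deeper in, which is why a first-match rule would be less stable near shared borders.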
And the counting step itself adds a small fallback — if the feet aren't inside any zone, it retries with the box center before giving up:
def count_zones(self, boxes):
    counts = [0] * len(self.zones)
    for (x1, y1, x2, y2, _) in boxes:
        zi = self.assign_zone((x1 + x2) / 2, y2)
        if zi < 0:
            zi = self.assign_zone((x1 + x2) / 2, (y1 + y2) / 2)
        if zi >= 0:
            counts[zi] += 1
    return counts
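Here is a small sketch of that fallback in action. To keep it dependency-free it swaps the polygon test for a hypothetical axis-aligned rectangle check (in_rect) and uses a first-match assign_zone, so only the control flow of count_zones matches the script. The zones and boxes are invented for illustration.

```python
def in_rect(rect, px, py):
    """Hypothetical zone test: is the point inside an axis-aligned rectangle?"""
    x_min, y_min, x_max, y_max = rect
    return x_min <= px <= x_max and y_min <= py <= y_max


def assign_zone(zones, px, py):
    """Return the index of the first rectangle containing the point, or -1."""
    for i, rect in enumerate(zones):
        if in_rect(rect, px, py):
            return i
    return -1


def count_zones(zones, boxes):
    """Same control flow as the script: feet first, box center as fallback."""
    counts = [0] * len(zones)
    for (x1, y1, x2, y2, _) in boxes:
        zi = assign_zone(zones, (x1 + x2) / 2, y2)
        if zi < 0:
            zi = assign_zone(zones, (x1 + x2) / 2, (y1 + y2) / 2)
        if zi >= 0:
            counts[zi] += 1
    return counts


zones = [(0, 0, 100, 100), (0, 120, 100, 220)]
boxes = [
    (40, 10, 60, 90, 1),     # feet at (50, 90): zone 0
    (40, 130, 60, 210, 2),   # feet at (50, 210): zone 1
    (40, 150, 60, 260, 3),   # feet below every zone, center (50, 205): zone 1
    (300, 0, 320, 50, 4),    # feet and center both outside: not counted
]
counts = count_zones(zones, boxes)
```

The third box is the interesting one: its bottom edge falls below every zone, so only the center fallback keeps that person in the tally.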
Detection and tracking
The detection layer is mercifully short because Ultralytics does the heavy lifting:
def detect_and_track(self, frame):
    results = self.model.track(frame, persist=True, conf=self.conf,
                               classes=self.classes, imgsz=self.imgsz,
                               device=self.device, verbose=False)
    boxes = []
    r = results[0]
    if r.boxes is not None:
        for b in r.boxes:
            x1, y1, x2, y2 = b.xyxy[0].tolist()
            track_id = int(b.id[0]) if b.id is not None else -1
            boxes.append((x1, y1, x2, y2, track_id))
    return boxes
The key argument is persist=True: it tells the tracker to carry identities across frames, so each person keeps the same #id. The script passes classes=[0] from main(), which restricts detection to the "person" class and ignores everything else. There's also a frame_skip option that runs detection only every Nth frame and reuses the last boxes in between, a cheap way to claw back FPS on slower hardware at the cost of a little tracking lag.
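The frame_skip bookkeeping is easy to check in isolation. This sketch keeps only the counter logic from the script's process() method and replaces YOLO with a hypothetical fake_detect that records when it was called. SkipDemo and fake_detect are illustrative names, not part of the script.

```python
class SkipDemo:
    """Stand-in for ZoneCounter that keeps only the frame-skip bookkeeping."""

    def __init__(self, frame_skip=1):
        self.frame_skip = max(1, frame_skip)
        self.frame_idx = 0
        self.last_boxes = []
        self.detected_on = []  # frame indices where the fake detector ran

    def fake_detect(self, frame):
        # Hypothetical detector: one box whose last field is the source frame.
        self.detected_on.append(self.frame_idx)
        return [(0, 0, 10, 10, frame)]

    def process(self, frame):
        if self.frame_idx % self.frame_skip == 0:
            self.last_boxes = self.fake_detect(frame)
        self.frame_idx += 1
        return self.last_boxes


demo = SkipDemo(frame_skip=3)
boxes_per_frame = [demo.process(frame) for frame in range(7)]
box_sources = [boxes[0][4] for boxes in boxes_per_frame]

print(demo.detected_on)  # frames on which detection actually ran
print(box_sources)       # which detection each displayed frame reused
```

With frame_skip=3 the boxes drawn on frames 1 and 2 are the ones found on frame 0, which is the "little tracking lag" in concrete terms.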
Drawing it back onto the video
Good visualization is half the value of a counter — you need to see that it's working. The drawing routine blends translucent fills for each zone first, then lays crisp outlines on top so they don't get muddied by the blend:
overlay = frame.copy()
for zone in self.zones:
    cv2.fillPoly(overlay, [zone], self.POLYS_COLOR)
img = cv2.addWeighted(overlay, self.FILL_ALPHA, frame, 1 - self.FILL_ALPHA, 0)
for zone in self.zones:
    cv2.polylines(img, [zone], True, self.POLYS_COLOR, 4, cv2.LINE_AA)
The order matters. Filling on a copy and then alpha-blending gives you see-through zones; drawing the outlines after the blend keeps the borders sharp. Each box gets a small #id tag, and each zone gets a Zone N: count label anchored to its first point. The label drawing even clamps its position to stay on-screen, so a zone near the frame edge doesn't get its number chopped off.
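The clamping is plain arithmetic, so it can be sketched without drawing anything. The function below reproduces the bounds used in draw_zone_label with max/min in place of np.clip. The names clamp and label_origin are illustrative, and the 400 x 60 pixel text size is an assumed value; the real script measures it with cv2.getTextSize.

```python
def clamp(value, low, high):
    """Pure-Python equivalent of np.clip for a single number."""
    return max(low, min(value, high))


def label_origin(anchor, text_size, frame_size, margin=5):
    """Center the label on the anchor horizontally, then keep it on-screen."""
    ax, ay = anchor
    tw, th = text_size
    w, h = frame_size
    x = clamp(ax - tw // 2, margin, w - tw - margin)
    y = clamp(ay, th + 10, h - 10)
    return x, y


frame_size = (1920, 1080)
text_size = (400, 60)  # assumed label size in pixels

top_edge = label_origin((672, 0), text_size, frame_size)
bottom_right = label_origin((1900, 1075), text_size, frame_size)
left_edge = label_origin((100, 500), text_size, frame_size)
```

An anchor on the top row, like the first point of Zone 5, gets pushed down far enough for the text to fit, and anchors near the right or left border slide inward.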
Running it
The pipeline ties together in process() and run(). process() handles one frame end-to-end; run() opens the video, loops, writes the annotated output, and shows a live window you can quit with q:
def process(self, frame):
    if self.frame_idx % self.frame_skip == 0:
        self.last_boxes = self.detect_and_track(frame)
    self.frame_idx += 1
    counts = self.count_zones(self.last_boxes)
    return counts, self.draw(frame, self.last_boxes, counts)
From the command line, it's just:
python zone_counter.py elevator.mp4
Pass a digit instead of a filename, and it treats it as a webcam index, so python zone_counter.py 0 runs off your default camera.
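The source rule is a one-liner in main(). Pulled out into a hypothetical helper called parse_source, it looks like this:

```python
def parse_source(arg):
    """Mirror the script's rule: all-digit strings become webcam indices."""
    return int(arg) if arg.isdigit() else arg


webcam = parse_source("0")
video = parse_source("elevator.mp4")
negative = parse_source("-1")  # not all digits, so it stays a string
```

Because str.isdigit() rejects the minus sign, an argument like -1 is passed through as a string rather than as an index.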
Complete code in one block
Here's the whole ZoneCounter script assembled end to end. Drop in your own zone polygons, point it at a video, and run.
"""
People Counting in Elevators using Ultralytics YOLO26.
USAGE: python zone_counter.py elevator.mp4
"""
import cv2
import sys
import numpy as np
from ultralytics import YOLO

# ZONE COORDINATES
POLYGONS = [
    # Zone # 01
    [[192, 818], [243, 832], [1209, 1078], [1596, 1076],
     [1601, 1047], [1524, 1029], [241, 694], [192, 815]],
    # Zone # 02
    [[274, 593], [336, 611], [1555, 978], [1627, 983],
     [1634, 874], [380, 478], [321, 461]],
    # Zone # 03
    [[351, 375], [419, 398], [1537, 791], [1655, 812],
     [437, 360], [365, 345], [354, 373]],
    # Zone # 04
    [[256, 150], [1857, 779], [1915, 787], [1915, 623],
     [511, 6], [337, 1]],
    # Zone # 05
    [[672, 0], [1872, 556], [1915, 575], [1916, 479],
     [1020, 0], [674, 1]],
]


class ZoneCounter:
    """Counts people inside polygon zones using YOLO26 detection + tracking."""

    # visualization settings
    POLYS_COLOR = (0, 255, 255)
    BBOX_COLOR = (255, 0, 255)
    TEXT_COLOR = (104, 31, 17)
    FILL_ALPHA = 0.35
    ZONE_LABEL_SCALE, ZONE_LABEL_THICK = 2.4, 5
    BOX_LABEL_SCALE, BOX_LABEL_THICK = 1.6, 4
    WIN_NAME = "Zone Counter"

    def __init__(self,
                 polygons,
                 model_path="yolo26n.pt",
                 classes=None,
                 conf=0.3,
                 imgsz=640,
                 device=None,
                 frame_skip=1):
        """
        Set up zones, model and runtime options.
        Args:
            polygons: List of zone polygons, each a list of [x, y] points.
            model_path: Path to the YOLO weights file.
            classes: Class ids to detect (e.g. [0] for person). None = all.
            conf: Minimum detection confidence.
            imgsz: Inference image size.
            device: Inference device ("cpu", "cuda", ...). None = auto.
            frame_skip: Run detection every N-th frame, reuse boxes between.
        """
        self.zones = [self._sanitize_polygon(p) for p in polygons]
        self.anchors = [tuple(map(int, p[0])) for p in polygons]
        self.model = YOLO(model_path)
        self.classes = classes
        self.conf = conf
        self.imgsz = imgsz
        self.device = device
        self.frame_skip = max(1, frame_skip)
        self.frame_idx = 0
        self.last_boxes = []

    # -----------
    # clean zones
    # -----------
    @staticmethod
    def _sanitize_polygon(points, min_dist=10):
        """
        Clean up a polygon so it renders and tests correctly. Removes near-duplicate
        vertices and re-orders the points by angle around the centroid, so edges never
        cross each other. Crossing edges break both cv2.fillPoly and cv2.pointPolygonTest.
        """
        pts = np.array(points, dtype=np.float32)
        keep = []
        for p in pts:
            if not keep or np.linalg.norm(p - keep[-1]) > min_dist:
                keep.append(p)
        if len(keep) > 2 and np.linalg.norm(keep[0] - keep[-1]) <= min_dist:
            keep.pop()  # drop duplicated closing point
        pts = np.array(keep, dtype=np.float32)
        center = pts.mean(axis=0)
        angles = np.arctan2(pts[:, 1] - center[1], pts[:, 0] - center[0])
        return pts[np.argsort(angles)].astype(np.int32)

    def assign_zone(self, px, py):
        """
        Find which zone a point belongs to. Picks the zone the point lies deepest
        inside, so overlapping zones are handled the same way every frame.
        """
        best, best_d = -1, -1.0
        for i, zone in enumerate(self.zones):
            d = cv2.pointPolygonTest(zone, (float(px), float(py)), True)
            if d >= 0 and d > best_d:
                best, best_d = i, d
        return best

    # ---------
    # detection
    # ---------
    def detect_and_track(self, frame):
        """Run YOLO26 tracking on one frame."""
        results = self.model.track(frame, persist=True, conf=self.conf, classes=self.classes,
                                   imgsz=self.imgsz, device=self.device, verbose=False)
        boxes = []
        r = results[0]
        if r.boxes is not None:
            for b in r.boxes:
                x1, y1, x2, y2 = b.xyxy[0].tolist()
                track_id = int(b.id[0]) if b.id is not None else -1
                boxes.append((x1, y1, x2, y2, track_id))
        return boxes

    # --------
    # counting
    # --------
    def count_zones(self, boxes):
        """Count boxes per zone using the bottom-center point of each box."""
        counts = [0] * len(self.zones)
        for (x1, y1, x2, y2, _) in boxes:
            zi = self.assign_zone((x1 + x2) / 2, y2)
            if zi < 0:
                # feet outside every zone: retry with the box center
                zi = self.assign_zone((x1 + x2) / 2, (y1 + y2) / 2)
            if zi >= 0:
                counts[zi] += 1
        return counts

    # -------
    # drawing
    # -------
    def draw_zone_label(self, img, text, org):
        """Draw a zone count label with a filled background."""
        (tw, th), _ = cv2.getTextSize(text, 0, self.ZONE_LABEL_SCALE, self.ZONE_LABEL_THICK)
        h, w = img.shape[:2]
        x = int(np.clip(org[0] - tw // 2, 5, w - tw - 5))
        y = int(np.clip(org[1], th + 10, h - 10))
        pad = int(12 * self.ZONE_LABEL_SCALE)
        cv2.rectangle(img, (x - pad, y - th - pad), (x + tw + pad, y + pad), self.POLYS_COLOR, -1)
        cv2.putText(img, text, (x, y), 0, self.ZONE_LABEL_SCALE, self.TEXT_COLOR, self.ZONE_LABEL_THICK)

    def draw_box_label(self, img, text, x1, y1):
        """Draw a small label in the top-left corner of a bbox."""
        (tw, th), base = cv2.getTextSize(text, 0, self.BOX_LABEL_SCALE, self.BOX_LABEL_THICK)
        x1, y1 = int(x1), int(y1)
        ty = y1 - 4
        if ty - th - base < 0:
            ty = y1 + th + 4
        pad = int(4 * self.BOX_LABEL_SCALE)
        cv2.rectangle(img, (x1 - pad, ty - th - 2 - pad), (x1 + tw + 6 + pad, ty + 2 + pad),
                      self.BBOX_COLOR, -1)
        cv2.putText(img, text, (x1 + 3, ty + 1), 0, self.BOX_LABEL_SCALE,
                    self.TEXT_COLOR, self.BOX_LABEL_THICK)

    def draw(self, frame, boxes, counts):
        """Draw zones, boxes and labels on a frame."""
        # blend the zone fills first
        overlay = frame.copy()
        for zone in self.zones:
            cv2.fillPoly(overlay, [zone], self.POLYS_COLOR)
        img = cv2.addWeighted(overlay, self.FILL_ALPHA, frame, 1 - self.FILL_ALPHA, 0)
        # crisp outlines AFTER blending so they stay sharp
        for zone in self.zones:
            cv2.polylines(img, [zone], True, self.POLYS_COLOR, 4, cv2.LINE_AA)
        # detection boxes + track id labels
        for (x1, y1, x2, y2, track_id) in boxes:
            cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), self.BBOX_COLOR, 2)
            if track_id >= 0:
                self.draw_box_label(img, f"#{track_id}", x1, y1)
        # zone count labels on the first point of each polygon
        for i, anchor in enumerate(self.anchors):
            self.draw_zone_label(img, f"Zone {i + 1}: {counts[i]}", anchor)
        return img

    # --------
    # pipeline
    # --------
    def process(self, frame):
        """Run detection, counting and drawing on one frame."""
        if self.frame_idx % self.frame_skip == 0:
            self.last_boxes = self.detect_and_track(frame)
        self.frame_idx += 1
        counts = self.count_zones(self.last_boxes)
        return counts, self.draw(frame, self.last_boxes, counts)

    def run(self, source, output_path="output.mp4"):
        """Process a video source, show the result and save it."""
        cap = cv2.VideoCapture(source)
        if not cap.isOpened():
            print(f"Could not open source: {source}")
            return
        # video writer setup
        writer = None
        if output_path:
            w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
            h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
            fps = cap.get(cv2.CAP_PROP_FPS)
            fourcc = cv2.VideoWriter_fourcc(*"mp4v")
            writer = cv2.VideoWriter(output_path, fourcc, fps, (w, h))
            if not writer.isOpened():
                print(f"Could not open writer for: {output_path}")
                writer = None
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            counts, annotated = self.process(frame)
            if writer is not None:
                writer.write(annotated)
            cv2.imshow(self.WIN_NAME, annotated)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        cap.release()
        if writer is not None:
            writer.release()
            print(f"Saved output video: {output_path}")
        cv2.destroyAllWindows()


def main():
    """Parse the arguments and run the counter."""
    # sys.argv[0] is the script name, so a source argument means at least 2 entries
    if len(sys.argv) < 2:
        print("Usage: python zone_counter.py <video.mp4>")
        sys.exit(1)
    source = sys.argv[1]
    source = int(source) if source.isdigit() else source
    counter = ZoneCounter(polygons=POLYGONS, model_path="elevator-yolo26m.pt", classes=[0])
    counter.run(source)


if __name__ == "__main__":
    main()
Where would you take it next?
This script is a clean foundation, and the obvious upgrades are easy to picture. You could log counts to a CSV with timestamps for later analysis, fire an alert when a zone exceeds a capacity threshold, or track entries and exits over time rather than just the live headcount. The polygon-cleaning and deepest-inside-wins logic would carry over unchanged — those are the genuinely reusable ideas here.
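As a starting point for the first two ideas, here is a minimal sketch using only the standard library. The function log_counts, the column names, and the per-zone capacities are all assumptions made for illustration; in the real script you would call something like it with the counts returned by process().

```python
import csv
import io
from datetime import datetime, timezone


def log_counts(writer, counts, capacities, timestamp=None):
    """Append one CSV row of per-zone counts; return indices of zones over capacity."""
    timestamp = timestamp or datetime.now(timezone.utc)
    writer.writerow([timestamp.isoformat()] + list(counts))
    return [i for i, (n, limit) in enumerate(zip(counts, capacities)) if n > limit]


# An in-memory buffer stands in for open("counts.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["timestamp", "zone_1", "zone_2", "zone_3"])

fixed_time = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
over = log_counts(writer, [2, 5, 0], capacities=[4, 4, 4], timestamp=fixed_time)

for zone_index in over:
    print(f"Zone {zone_index + 1} is over capacity")
```

Logging every frame produces a lot of rows, so in practice you would probably write once per second or only when a count changes.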
The real lesson isn't about elevators at all. It's that once you can reliably answer "is this point inside that shape, and if several shapes qualify, which one wins?", an enormous range of spatial-analytics problems becomes approachable. Queue monitoring, retail heatmaps, parking occupancy, crowd-safety dashboards — they're all variations on the same forty lines of geometry you just read.
FAQs
- Q: Do I need a GPU to run this?
- A: No. It runs on CPU, just slower. The device argument lets you point it at CUDA if you have it, and frame_skip helps recover speed on modest hardware.
- Q: Why polygons instead of simple rectangles?
- A: Because real-world regions are rarely axis-aligned boxes. Polygons let you trace a slanted doorway, a curved queue, or the trapezoidal floor rows of an elevator seen from an angled camera.
- Q: Why test the feet instead of the center of the person?
- A: Because a person occupies the floor at their feet. Testing the bounding-box center can place a leaning or partially occluded person in the wrong zone.
- Q: Can it count things other than people?
- A: Yes. The classes argument maps to YOLO's class list, so swap in the class IDs for cars, bikes, or anything the model knows, and the same zone logic applies.