YOLO26 · Object Tracking · Trajectory · June 25, 2026 · 10 min read

Object tracking and trajectory forecasting with YOLO26 and ByteTrack

Detect, track, and predict the future path of people and vehicles using Ultralytics YOLO26, ByteTrack, and a lightweight velocity-based forecasting model.

Muhammad Rizwan Munawar
Computer Vision Engineer · Founder, Rizwan AI

Knowing where an object is solves half the problem. Knowing where it's about to be solves the other half — that's the difference between a dashboard that reports the past and one that can warn you about the next second. In this guide we'll build a compact pipeline that detects and tracks people and vehicles with Ultralytics YOLO26, then forecasts each object's short-term path using nothing more than its recent motion.

The whole thing runs on a single video file, writes out an annotated result, and stays under ~200 lines. Let's walk through how it works.

How the pipeline fits together

Every frame flows through four stages:

  1. Detect and track every object of interest and keep its identity stable across frames.

  2. Smooth each track's center so the path doesn't jitter.

  3. Estimate velocity from the recent trail and project it into the future.

  4. Draw the past trail and the forecast, then write the frame out.

Detection comes from YOLO26 and frame-to-frame identity from ByteTrack, both driven through the Ultralytics model.track() call. The interesting part is the motion model, and the good news is that you don't need a heavy learned predictor: a well-smoothed constant-velocity estimate carries you a long way over short horizons.
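Written out, the motion model is only three lines. The notation is ours rather than the original post's: c_t is the raw box center at frame t, s_t its smoothed value, α = EMA_ALPHA, N the index of the newest point, and k = min(VEL_WINDOW, N − 1) the number of recent steps used for the velocity (EMA_ALPHA, VEL_WINDOW, and FORECAST_STEPS are set in the Step 1 config):

```latex
\begin{aligned}
s_t &= \alpha\, c_t + (1-\alpha)\, s_{t-1}, \qquad s_1 = c_1 \\
\mathbf{v} &= \mathrm{fps}\cdot \operatorname{median}_{\,i = N-k+1,\dots,N}\left(s_i - s_{i-1}\right) \quad \text{(per axis)} \\
\hat{p}_{N+j} &= s_N + \mathbf{v}\,\frac{j}{\mathrm{fps}}, \qquad j = 1,\dots,\texttt{FORECAST\_STEPS}
\end{aligned}
```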

Step 1: Configure the pipeline

Everything tunable lives in a small block of constants at the top. These control which objects to track, how much history to keep, and how far ahead to forecast.

CONF = 0.4                       # detection confidence threshold
TRACKER = "bytetrack.yaml"       # botsort.yaml or bytetrack.yaml
CLASSES = [0, 2, 3, 5, 7]        # person, car, motorcycle, bus, truck
HISTORY = 60                     # store last N centers per track
MIN_POINTS = 6                   # need at least this many points to forecast
FORECAST_STEPS = 35              # how many future points (frames) to project
VEL_WINDOW = 8                   # how many recent steps to estimate velocity from
EMA_ALPHA = 0.6                  # center smoothing (0..1); higher = follows new positions more closely
FORECAST_COLOR = (108, 27, 255)  # BGR color for the forecast line and dots

CLASSES holds COCO class IDs (the label set of the pretrained yolo26s.pt detector), so the same script tracks pedestrians, cars, or trucks just by editing this list. HISTORY caps how long a trail you keep per object, and FORECAST_STEPS sets how far ahead you project, counted in frames: at 30 fps, 35 steps is about 1.2 seconds.
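Because FORECAST_STEPS counts frames, the same value covers 1.4 s on a 25 fps clip but under 0.6 s at 60 fps. If you would rather think in seconds, a small helper can derive the step count from the fps read in Step 2. `steps_for_horizon` below is a hypothetical addition, not part of the original script:

```python
def steps_for_horizon(horizon_s: float, fps: float) -> int:
    """Number of one-frame forecast steps needed to cover horizon_s seconds.

    Hypothetical helper: the original script sets FORECAST_STEPS directly.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return max(1, round(horizon_s * fps))  # always forecast at least one step


# How long the original 35 steps last, and how many steps a 1.2 s horizon needs
for fps in (25.0, 30.0, 60.0):
    print(f"{fps:>4.0f} fps: 35 steps = {35 / fps:.2f} s, "
          f"1.2 s = {steps_for_horizon(1.2, fps)} steps")
```

You could then set `FORECAST_STEPS = steps_for_horizon(1.2, fps)` right after reading fps, so the forecast spans the same real time on every video.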

Step 2: Load YOLO26 and open the video

We load the model, move it to the GPU when one is available, and open the source video. We also read the frame rate up front — the forecast is expressed in real time, so it needs fps to convert pixel motion into a per-second velocity.

import cv2
import torch
import numpy as np
from collections import defaultdict, deque
from ultralytics import YOLO
from ultralytics.utils.plotting import colors

model = YOLO("yolo26s.pt")
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

cap = cv2.VideoCapture("videos/cars-on-highway.mp4")
if not cap.isOpened():
    raise FileNotFoundError("Could not open videos/cars-on-highway.mp4")

fps = cap.get(cv2.CAP_PROP_FPS)
fps = float(fps) if fps and fps > 1e-6 else 30.0  # fall back to 30 fps if metadata is missing
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter("results-track-forecast.mp4",
                         cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))

Step 3: Track objects and smooth their centers

Inside the loop, model.track() with persist=True keeps identities stable across frames. For each detection we take the bounding-box center and pass it through an exponential moving average (EMA) before storing it. That smoothing matters: raw box centers wobble frame to frame, and a wobbly trail produces a noisy velocity estimate, which produces a wild forecast.

history = defaultdict(lambda: deque(maxlen=HISTORY))  # track_id -> recent centers
last_smooth = {}                                      # track_id -> smoothed (x, y)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    results = model.track(frame, persist=True, conf=CONF, classes=CLASSES, tracker=TRACKER)
    r0 = results[0]
    active_ids = set()  # IDs seen this frame; used by the cleanup in Step 6

    if r0.boxes is not None and r0.boxes.id is not None:
        boxes = r0.boxes.xyxy.cpu().numpy()
        ids = r0.boxes.id.cpu().numpy().astype(int)
        clss = r0.boxes.cls.cpu().numpy().astype(int)

        for bbox, tid, cls in zip(boxes, ids, clss):
            x1, y1, x2, y2 = map(int, bbox)
            cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
            active_ids.add(tid)

            # Exponential moving average on the center
            if tid in last_smooth:
                lx, ly = last_smooth[tid]
                sx = EMA_ALPHA * cx + (1.0 - EMA_ALPHA) * lx
                sy = EMA_ALPHA * cy + (1.0 - EMA_ALPHA) * ly
            else:
                sx, sy = cx, cy  # first sighting: start from the raw center
            last_smooth[tid] = (sx, sy)
            history[tid].append((sx, sy))

EMA_ALPHA is the dial: closer to 1 follows the latest position aggressively (responsive but jittery), closer to 0 lags behind but stays smooth. 0.6 is a good middle ground for traffic.
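The lag side of that trade-off can be quantified. For an object moving at a steady v pixels per frame, the update in Step 3 settles at a constant distance v(1 − α)/α behind the raw center (about 0.67·v for α = 0.6), while independent per-frame noise with standard deviation σ shrinks to σ·√(α/(2 − α)). The sketch below checks both numerically on synthetic 1-D motion; the speed, noise level, and frame count are made-up assumptions, not measurements from the video:

```python
import numpy as np


def ema(values, alpha):
    """EMA seeded with the first value, matching the center update in Step 3."""
    out = np.empty(len(values), dtype=float)
    out[0] = values[0]
    for i in range(1, len(values)):
        out[i] = alpha * values[i] + (1.0 - alpha) * out[i - 1]
    return out


def lag_and_jitter(alpha, speed=10.0, noise_std=2.0, n=2000, seed=0):
    """Steady-state lag (px) and residual jitter (px) of the EMA on synthetic motion."""
    t = np.arange(n, dtype=float)
    clean = speed * t                                    # constant velocity, px per frame
    lag = float(clean[-1] - ema(clean, alpha)[-1])       # how far the smoothed center trails
    noise = np.random.default_rng(seed).normal(0.0, noise_std, size=n)
    jitter = float(np.std(ema(noise, alpha)[100:]))      # EMA is linear, so smooth the noise alone
    return lag, jitter


for alpha in (1.0, 0.6, 0.2):
    lag, jitter = lag_and_jitter(alpha)
    predicted_lag = 10.0 * (1.0 - alpha) / alpha
    predicted_jitter = 2.0 * np.sqrt(alpha / (2.0 - alpha))
    print(f"alpha={alpha}: lag {lag:.2f} px (theory {predicted_lag:.2f}), "
          f"jitter {jitter:.2f} px (theory {predicted_jitter:.2f})")
```

Because the forecast in Step 4 starts from the smoothed position, that lag carries straight into the predicted path, which is one reason not to push EMA_ALPHA too low for fast-moving traffic.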

Step 4: Estimate velocity and forecast the path

With a clean trail in hand, the forecast comes down to two short functions. estimate_velocity takes the last VEL_WINDOW frame-to-frame steps and returns the median velocity on each axis, in pixels per second. It uses the median, not the mean, so one bad frame can't swing the prediction. forecast_points then walks that velocity forward one frame (1/fps seconds) at a time.

def estimate_velocity(pts, fps, window=8):
    n = len(pts)
    if n < 2:
        return 0.0, 0.0
    k = min(window, n - 1)
    vxs, vys, dt = [], [], 1.0 / fps
    for i in range(n - k, n):
        x1, y1 = pts[i - 1]
        x2, y2 = pts[i]
        vxs.append((x2 - x1) / dt)
        vys.append((y2 - y1) / dt)
    return float(np.median(vxs)), float(np.median(vys))

def forecast_points(last_xy, vx, vy, fps, steps):
    dt = 1.0 / fps
    lx, ly = float(last_xy[0]), float(last_xy[1])
    return [(lx + vx * dt * s, ly + vy * dt * s) for s in range(1, steps + 1)]
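To see why the median matters, note that the mean of consecutive steps telescopes to (newest − oldest) / k: it depends only on the two endpoints of the window, so a glitch on the newest frame lands in it at full strength. The sketch below is a NumPy restatement of estimate_velocity that returns both statistics; the trail values are invented for illustration:

```python
import numpy as np


def velocity_mean_and_median(pts, fps, window=8):
    """Per-axis mean and median velocity (px/s) over the last `window` frame-to-frame steps."""
    arr = np.asarray(pts, dtype=float)
    k = min(window, len(arr) - 1)
    steps = np.diff(arr[-(k + 1):], axis=0) * fps    # (k, 2): one velocity per step
    return steps.mean(axis=0), np.median(steps, axis=0)


fps = 30.0
trail = [(5.0 * i, 100.0) for i in range(10)]    # steady 5 px/frame (150 px/s) to the right
trail[-1] = (trail[-1][0] + 40.0, 100.0)         # one bad detection on the newest frame

mean_v, median_v = velocity_mean_and_median(trail, fps)
print("mean:  ", mean_v)      # x velocity doubled by the single glitch
print("median:", median_v)    # x velocity unchanged
```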

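Two guards keep the forecast honest. We only predict once a track has at least MIN_POINTS of history, and we skip objects that are essentially stationary (a speed gate), so parked cars don't sprout meaningless arrows. The gate below is only 1.0 pixel per second, so if detection jitter still leaves arrows on static objects, raising it is the first knob to try.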

if len(history[tid]) >= MIN_POINTS:
    vx, vy = estimate_velocity(list(history[tid]), fps, window=VEL_WINDOW)
    if np.hypot(vx, vy) > 1.0:  # stationary gate
        fpts = forecast_points(history[tid][-1], vx, vy, fps, FORECAST_STEPS)
        fpts = clamp_points(fpts, width, height)
        draw_forecast(frame, fpts, FORECAST_COLOR)
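If you prefer a single entry point, the guards and the projection can be folded into one function. This is a hypothetical NumPy refactor, not the original code: `forecast_track` returns None whenever a guard trips, and `min_speed` is in pixels per second like the 1.0 gate above. The defaults mirror MIN_POINTS, VEL_WINDOW, and FORECAST_STEPS:

```python
import numpy as np


def forecast_track(history, fps, min_points=6, window=8, steps=35, min_speed=1.0):
    """Return a (steps, 2) array of future centers, or None if a guard trips."""
    pts = np.asarray(history, dtype=float)
    if len(pts) < min_points:
        return None                                             # not enough history yet
    k = min(window, len(pts) - 1)
    v = np.median(np.diff(pts[-(k + 1):], axis=0), axis=0) * fps  # (vx, vy) in px/s
    if np.hypot(v[0], v[1]) <= min_speed:
        return None                                             # effectively stationary
    s = np.arange(1, steps + 1, dtype=float)[:, None]           # step indices 1..steps as a column
    return pts[-1] + v * s / fps                                # constant-velocity projection


# Ends at (100, 200), moving 5 px/frame right and 1 px/frame up (image y grows downward)
trail = [(75.0 + 5 * i, 205.0 - i) for i in range(6)]
future = forecast_track(trail, fps=30.0)
print(future.shape, future[-1])    # 35 points, ending at (275, 165)
```

Returning None instead of an empty list keeps the caller's check explicit: `if (fpts := forecast_track(...)) is not None:` before clamping and drawing.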

Step 5: Draw the trail and the forecast

Visualization is where it all becomes legible. We draw the past trail as a solid polyline in the object's color, and the forecast as a thinner polyline in FORECAST_COLOR with a handful of dots marking the projected steps. Despite its name, clamp_points doesn't clamp: it drops any point that falls outside the frame and converts the rest to integer pixels, so the forecast line stops at the last in-frame step instead of shooting off into nowhere.

def draw_polyline(frame, pts, color, thickness=2):
    if len(pts) < 2:
        return
    p = np.array(pts, dtype=np.int32).reshape((-1, 1, 2))
    cv2.polylines(frame, [p], isClosed=False, color=color,
                  thickness=thickness, lineType=cv2.LINE_AA)

def draw_forecast(frame, pts, color):
    if len(pts) < 2:
        return
    draw_polyline(frame, pts, color, thickness=1)
    for (x, y) in pts[::max(1, len(pts) // 5)]:
        cv2.circle(frame, (int(x), int(y)), 4, color, -1, cv2.LINE_AA)

def clamp_points(pts, w, h):
    out = []
    for x, y in pts:
        if 0 <= x < w and 0 <= y < h:
            out.append((int(x), int(y)))
    return out
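Because the forecast is a straight ray that starts inside the frame, clamp_points always keeps a contiguous run of leading points: once the line leaves the frame it never comes back. That also means you can compute when an object is expected to exit without generating any points. The `time_to_exit` helper below is hypothetical and relies on the same constant-velocity assumption, with positions in pixels and velocities in pixels per second as returned by estimate_velocity:

```python
import math


def time_to_exit(x, y, vx, vy, width, height):
    """Seconds until a point at (x, y) moving at (vx, vy) px/s leaves the frame.

    Hypothetical helper. Assumes (x, y) starts inside [0, width) x [0, height);
    returns math.inf if the point is not moving on either axis.
    """
    times = []
    for pos, vel, size in ((x, vx, width), (y, vy, height)):
        if vel > 0:
            times.append((size - pos) / vel)   # heading for the right or bottom edge
        elif vel < 0:
            times.append(pos / -vel)           # heading for the left or top edge
    return min(times) if times else math.inf


# Example: center (100, 200) in a 640x480 frame, moving right and slightly up
t_exit = time_to_exit(100.0, 200.0, 150.0, -30.0, 640, 480)
print(f"leaves the frame in about {t_exit:.1f} s")
```

If t_exit is shorter than FORECAST_STEPS / fps, clamp_points will trim the tail of that object's forecast; otherwise the full forecast stays on screen.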

Step 6: Clean up lost tracks

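When an object leaves the scene its ID stops appearing, but its history is still sitting in memory. At the end of each frame we drop any track that wasn't seen in that frame, so the dictionaries don't grow without bound on long videos. The trade-off is that a single missed detection also wipes the track's trail and smoothing state, so even if ByteTrack later recovers the same ID, the forecast has to wait for MIN_POINTS fresh centers before it reappears.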

for tid in list(history.keys()):
    if tid not in active_ids:
        history.pop(tid, None)
        last_smooth.pop(tid, None)
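Dropping a track the moment it misses one frame also throws away its trail whenever a detection briefly flickers out. One option is a grace period: remember the last frame each ID was seen and only drop it after it has been missing for a while. The sketch below is a hypothetical variant, not the original code; `MAX_MISSED`, `last_seen`, and `frame_idx` are names introduced here, and you may want MAX_MISSED to roughly match the `track_buffer` setting in the tracker YAML so trails live about as long as the tracker remembers the ID:

```python
from collections import defaultdict, deque

MAX_MISSED = 15  # hypothetical: frames a track may go unseen before its state is dropped


def cleanup_stale_tracks(history, last_smooth, last_seen, active_ids, frame_idx,
                         max_missed=MAX_MISSED):
    """Grace-period replacement for the per-frame cleanup loop."""
    for tid in active_ids:
        last_seen[tid] = frame_idx                   # refresh IDs seen this frame
    for tid in list(history.keys()):
        missed = frame_idx - last_seen.get(tid, frame_idx)
        if missed > max_missed:                      # gone too long: free its state
            history.pop(tid, None)
            last_smooth.pop(tid, None)
            last_seen.pop(tid, None)


# Tiny simulation: track 7 is seen on frame 0, then disappears
history = defaultdict(lambda: deque(maxlen=60))
last_smooth, last_seen = {}, {}
history[7].append((10.0, 20.0))
last_smooth[7] = (10.0, 20.0)
for frame_idx in range(20):
    cleanup_stale_tracks(history, last_smooth, last_seen,
                         active_ids={7} if frame_idx == 0 else set(),
                         frame_idx=frame_idx)
    if 7 not in history:
        print(f"track 7 dropped at frame {frame_idx}")
        break
```

In the main loop you would keep a frame counter, increment it every iteration, and call this in place of the Step 6 loop. Stale tracks are never drawn in the meantime, because drawing only happens for boxes detected in the current frame.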

Complete code in one block

# ======================================
# Ultralytics YOLO + Tracking + Forecast
# ======================================

import cv2
import torch
import numpy as np
from collections import defaultdict, deque
from ultralytics import YOLO
from ultralytics.utils.plotting import colors

# Config
CONF = 0.4
TRACKER = "bytetrack.yaml"       # botsort.yaml or bytetrack.yaml
CLASSES = [0, 2, 3, 5, 7]        # person, car, motorcycle, bus, truck
HISTORY = 60                     # store last N centers per track
MIN_POINTS = 6                   # need at least this many points to forecast
FORECAST_STEPS = 35              # how many future points
VEL_WINDOW = 8                   # how many recent points to estimate velocity
EMA_ALPHA = 0.6                  # smoothing for centers (0..1); higher = follow new more
FORECAST_COLOR = (108, 27, 255)
FONT_SCALE = 1.2
FONT_THICKNESS = 4
PADDING = 8


# Drawing helpers
def draw_polyline(frame, pts, color, thickness=2):
    if len(pts) < 2:
        return
    p = np.array(pts, dtype=np.int32).reshape((-1, 1, 2))
    cv2.polylines(frame, [p], isClosed=False, color=color, thickness=thickness, lineType=cv2.LINE_AA)

def draw_forecast(frame, pts, color):
    if len(pts) < 2:
        return
    draw_polyline(frame, pts, color, thickness=1)
    for (x, y) in pts[::max(1, len(pts) // 5)]:
        cv2.circle(frame, (int(x), int(y)), 4, color, -1, cv2.LINE_AA)

def clamp_points(pts, w, h):
    out = []
    for x, y in pts:
        if 0 <= x < w and 0 <= y < h:
            out.append((int(x), int(y)))
    return out


# Forecasting
def estimate_velocity(pts, fps, window=8):
    n = len(pts)
    if n < 2:
        return 0.0, 0.0
    k = min(window, n - 1)
    vxs, vys, dt = [], [], 1.0 / fps
    for i in range(n - k, n):
        x1, y1 = pts[i - 1]
        x2, y2 = pts[i]
        vxs.append((x2 - x1) / dt)
        vys.append((y2 - y1) / dt)
    return float(np.median(vxs)), float(np.median(vys))

def forecast_points(last_xy, vx, vy, fps, steps):
    dt = 1.0 / fps
    lx, ly = float(last_xy[0]), float(last_xy[1])
    return [(lx + vx * dt * s, ly + vy * dt * s) for s in range(1, steps + 1)]


# Inference
def inference(model_path, source, output_path, display_track_id_only=True):
    model = YOLO(model_path)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    cap = cv2.VideoCapture(source)
    if not cap.isOpened():
        raise FileNotFoundError(f"Could not open video source: {source}")

    fps = cap.get(cv2.CAP_PROP_FPS)
    fps = float(fps) if fps and fps > 1e-6 else 30.0
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))

    history = defaultdict(lambda: deque(maxlen=HISTORY))  # track_id -> deque[(x, y)]
    last_smooth = {}                                      # track_id -> (x, y)

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        results = model.track(frame, persist=True, conf=CONF, classes=CLASSES, tracker=TRACKER)

        active_ids = set()
        r0 = results[0]

        if r0.boxes is not None and r0.boxes.id is not None:
            boxes = r0.boxes.xyxy.cpu().numpy()
            ids = r0.boxes.id.cpu().numpy().astype(int)
            clss = r0.boxes.cls.cpu().numpy().astype(int)

            for bbox, tid, cls in zip(boxes, ids, clss):
                x1, y1, x2, y2 = map(int, bbox)
                cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
                active_ids.add(tid)

                # EMA smoothing center
                if tid in last_smooth:
                    lx, ly = last_smooth[tid]
                    sx, sy = EMA_ALPHA * cx + (1.0 - EMA_ALPHA) * lx, EMA_ALPHA * cy + (1.0 - EMA_ALPHA) * ly
                else:
                    sx, sy = cx, cy
                last_smooth[tid] = (sx, sy)
                history[tid].append((sx, sy))

                # Bounding box + label
                bbox_color = colors(0 if cls == 2 else cls, True)  # per-class palette color (cars use index 0)
                label = f"#{tid} {model.names[cls]}" if not display_track_id_only else f"{tid}"
                cv2.rectangle(frame, (x1, y1), (x2, y2), bbox_color, 2, cv2.LINE_AA)

                # Filled label background anchored at the box's top-left corner
                (tw, th) = cv2.getTextSize(label, 0, FONT_SCALE, FONT_THICKNESS)[0]
                rect_w, rect_h = tw + 2 * PADDING, th + 2 * PADDING
                cv2.rectangle(frame, (x1, y1), (x1 + rect_w, y1 + rect_h), bbox_color, -1)
                text_x, text_y = x1 + (rect_w - tw) // 2, y1 + (rect_h + th) // 2
                cv2.putText(frame, label, (text_x, text_y), 0, FONT_SCALE, (255, 255, 255), FONT_THICKNESS)

                # Past trail
                past_pts = clamp_points(list(history[tid]), width, height)
                draw_polyline(frame, past_pts, bbox_color, thickness=2)

                # Forecast
                if len(history[tid]) >= MIN_POINTS:
                    vx, vy = estimate_velocity(list(history[tid]), fps, window=VEL_WINDOW)
                    if np.hypot(vx, vy) > 1.0:  # stationary gate
                        fpts = forecast_points(history[tid][-1], vx, vy, fps, FORECAST_STEPS)
                        fpts = clamp_points(fpts, width, height)
                        draw_forecast(frame, fpts, FORECAST_COLOR)

        # Cleanup disappeared tracks
        for tid in list(history.keys()):
            if tid not in active_ids:
                history.pop(tid, None)
                last_smooth.pop(tid, None)

        cv2.imshow("Tracking + Forecast | q to quit", frame)
        writer.write(frame)

        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    writer.release()
    cv2.destroyAllWindows()


if __name__ == "__main__":
    inference(
        model_path="yolo26s.pt",
        source="videos/cars-on-highway.mp4",
        output_path="results-track-forecast.mp4",
        display_track_id_only=True,
    )


Real-world applications

  • Traffic safety: Flag vehicles whose forecast paths converge, or that drift toward a lane edge, a few frames before it happens.
  • Retail and crowd flow: Anticipate where shoppers are heading to measure intent and ease congestion at choke points.
  • Autonomous systems: Feed short-horizon forecasts into planning so robots and drones react to motion instead of just position.
  • Sports analytics: Project player and ball trajectories to study runs, passes, and spacing.
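For the traffic-safety case, "forecast paths converge" can be made concrete with a closest-point-of-approach check: given two tracks' current smoothed centers and their velocities from estimate_velocity, find the moment within the forecast horizon when they are nearest. This is a standard constant-velocity calculation, sketched here as a hypothetical helper. It works in image pixels, so any alert threshold needs tuning per camera and says nothing about real-world distance without calibration:

```python
import numpy as np


def closest_approach(p1, v1, p2, v2, horizon_s):
    """Time (s) and distance (px) of closest approach for two constant-velocity tracks.

    p1, p2: current centers (px); v1, v2: velocities (px/s). The search is limited
    to [0, horizon_s], for example FORECAST_STEPS / fps.
    """
    dp = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)  # relative position
    dv = np.asarray(v2, dtype=float) - np.asarray(v1, dtype=float)  # relative velocity
    speed_sq = float(dv @ dv)
    if speed_sq == 0.0:
        t = 0.0                                    # same velocity: the gap never changes
    else:
        t = float(np.clip(-(dp @ dv) / speed_sq, 0.0, horizon_s))
    return t, float(np.linalg.norm(dp + dv * t))


# Two vehicles approaching each other, offset 40 px vertically
t, d = closest_approach((100, 300), (200, 0), (500, 340), (-200, 0), horizon_s=35 / 30)
print(f"closest in {t:.2f} s at {d:.0f} px")
```

Clipping t to [0, horizon_s] means tracks that are already moving apart report their current gap, and pairs that would only meet beyond the forecast horizon are judged at the horizon.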

Where to take it next

The constant-velocity model is deliberately simple, and that's its strength: it's fast and needs no training step. When you need more, the upgrade path is clear: swap the linear projection for a constant-acceleration model (optionally wrapped in a Kalman filter) to follow curving paths, weight the velocity estimate toward the most recent points so turns register sooner, or log each track's forecast error against your own footage to tune VEL_WINDOW and EMA_ALPHA. The tracking and drawing scaffolding stays exactly the same.
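For the last suggestion, a simple offline check is to replay a recorded track, make a forecast at every frame using only the points seen so far, and measure how far the prediction lands from where the object actually was `horizon` frames later (often called the final displacement error). The sketch below is hypothetical and simplified: it works in pixels per frame, skips the speed gate, and uses a synthetic trajectory rather than real footage. In practice you would collect tracks from the `history` deques and compare settings such as VEL_WINDOW:

```python
import numpy as np


def final_displacement_error(track, window=8, horizon=15, min_points=6):
    """Mean distance (px) between constant-velocity forecasts and the true future position.

    track: sequence of (x, y) centers, one per frame. The velocity is the per-axis
    median of the last `window` frame-to-frame steps, as in estimate_velocity.
    """
    pts = np.asarray(track, dtype=float)
    errors = []
    for t in range(min_points - 1, len(pts) - horizon):
        past = pts[: t + 1]                                       # only what was seen by frame t
        k = min(window, len(past) - 1)
        v = np.median(np.diff(past[-(k + 1):], axis=0), axis=0)   # px per frame
        predicted = past[-1] + v * horizon
        errors.append(np.linalg.norm(predicted - pts[t + horizon]))
    return float(np.mean(errors)) if errors else float("nan")


# Synthetic check: a straight 4 px/frame path with Gaussian jitter (sigma = 1 px)
rng = np.random.default_rng(0)
frames = np.arange(500, dtype=float)
noisy = np.stack([4.0 * frames, 0.5 * frames], axis=1) + rng.normal(0.0, 1.0, (500, 2))
for window in (1, 4, 8, 16):
    print(f"window={window:>2}: mean error {final_displacement_error(noisy, window=window):.1f} px")
```

On noisy straight-line motion a longer window should win; on curving paths it starts to lag, which is exactly the trade-off this kind of replay helps you measure on your own footage.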


Start forecasting motion with YOLO26 today! 🚀

FAQs

Q: What is the difference between object tracking and trajectory forecasting?
A: Tracking tells you where an object is now and keeps its identity consistent across frames. Forecasting uses that recent motion to project where the object will likely be over the next second or so.
Q: How does the forecasting work without a trained motion model?
A: It estimates each track's velocity from the median of its recent center movements, then projects that velocity forward a fixed number of frames. It is a constant-velocity model, which is fast and surprisingly effective over short horizons.
Q: Should I use ByteTrack or BoT-SORT?
A: Both ship with Ultralytics. ByteTrack is lighter and very fast; BoT-SORT adds camera-motion compensation and optional appearance (ReID) matching for tougher scenes. Switch trackers by setting TRACKER = "botsort.yaml".
Q: Can it forecast paths for people, not just vehicles?
A: Yes. The CLASSES list holds COCO class IDs and already includes 0 (person); add any other class the model detects. The same velocity-based forecast applies to every track.
Muhammad Rizwan Munawar

Computer Vision Engineer and top contributor to the YOLO project, building production AI and deep learning systems.