
Perception Task: Object Tracking

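This page provides a getting-started example for the "Perception Task: Object Tracking" track. The example uses the ByteTrack tracker, running on top of YOLOv10 detections, to perform 2D multi-object tracking of motor vehicles and pedestrians.

Below are a model introduction and a code walkthrough.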

Model Introduction

1. Model Overview:

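ByteTrack is a multi-object tracking (MOT) algorithm introduced in the paper "ByteTrack: Multi-Object Tracking by Associating Every Detection Box". It tracks objects efficiently by associating the bounding boxes produced by an object detector from frame to frame. Conventional MOT methods keep only detections above a confidence threshold, so objects whose scores drop (for example, because of occlusion or motion blur) are lost; ByteTrack addresses this weakness and improves tracking accuracy. Its core idea is to associate every detection box, both high-confidence and low-confidence, so that genuine objects with low scores are not missed, which raises overall tracking performance.

In the MOT task here, the input is a traffic video and the output is, for every frame, the bounding box of each detected object together with an ID number that identifies the same object across frames.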
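To make the output concrete, here is a small sketch that formats one frame of tracking output as text lines in the MOTChallenge convention (frame, ID, left, top, width, height, score, then three unused fields set to -1). The demo on this page does not write such a file; the function name `to_mot_lines` and the choice of this format are assumptions for illustration. It expects center-based boxes, as returned by `results[0].boxes.xywh` in the code walkthrough.

```python
def to_mot_lines(frame_idx, boxes_xywh, track_ids, scores):
    """Format one frame of tracking output as MOTChallenge-style text lines.

    boxes_xywh holds (center_x, center_y, width, height) per object, like
    results[0].boxes.xywh in Ultralytics; MOT files store the top-left corner instead.
    """
    lines = []
    for (cx, cy, w, h), tid, score in zip(boxes_xywh, track_ids, scores):
        left, top = cx - w / 2, cy - h / 2  # center -> top-left corner
        # frame, id, left, top, width, height, confidence, then 3 unused fields (-1)
        lines.append(f"{frame_idx},{tid},{left:.2f},{top:.2f},{w:.2f},{h:.2f},{score:.2f},-1,-1,-1")
    return lines


# Usage: one car with track ID 3 in frame 1 (MOT frame numbers start at 1)
print(to_mot_lines(1, [(50, 40, 20, 10)], [3], [0.9]))
# -> ['1,3,40.00,35.00,20.00,10.00,0.90,-1,-1,-1']
```

Because the track ID is repeated on every frame's line, the same object can be followed through the file simply by filtering on the second column.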

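2. Model Architecture:

Detection and score split: each video frame first goes through an object detector, which outputs detection boxes with confidence scores; the boxes are then divided into high-confidence and low-confidence groups by a score threshold.
First association: a Kalman filter predicts where each existing track should be in the current frame, and the Hungarian algorithm matches the high-confidence boxes to these existing tracks, using the IoU between predicted and detected boxes as the matching score.
Handling low-confidence boxes: ByteTrack does not simply discard low-confidence boxes. Tracks that remain unmatched after the first association are matched a second time against the low-confidence boxes. This recovers objects whose detection scores dropped and would otherwise have been filtered out, making tracking more complete.
State update: matched tracks update their Kalman filter state, while unmatched tracks are moved to a lost-track list. A lost track that stays unmatched over the following frames is eventually deleted. High-confidence boxes that matched no track start new tracks.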
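The two-stage association described above can be sketched in a few dozen lines. This is a simplified illustration, not the official ByteTrack implementation: the Kalman prediction is assumed to have already produced `pred_tracks`, similarity is plain IoU, SciPy's `linear_sum_assignment` plays the role of the Hungarian algorithm, and the thresholds (0.5 and 0.1 for the score split, minimum IoU 0.2 and 0.5 for the two stages) are illustrative values. The helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_matrix(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4), both in (x1, y1, x2, y2) format."""
    if len(a) == 0 or len(b) == 0:
        return np.zeros((len(a), len(b)))
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)


def hungarian_match(tracks, dets, min_iou):
    """Optimal one-to-one matching on cost = 1 - IoU; pairs below min_iou are rejected."""
    iou = iou_matrix(tracks, dets)
    pairs = []
    if iou.size:
        rows, cols = linear_sum_assignment(1.0 - iou)
        pairs = [(int(r), int(c)) for r, c in zip(rows, cols) if iou[r, c] >= min_iou]
    used_t = {t for t, _ in pairs}
    used_d = {d for _, d in pairs}
    unmatched_t = [t for t in range(len(tracks)) if t not in used_t]
    unmatched_d = [d for d in range(len(dets)) if d not in used_d]
    return pairs, unmatched_t, unmatched_d


def byte_associate(pred_tracks, det_boxes, det_scores, high=0.5, low=0.1):
    """One frame of BYTE-style two-stage association (simplified sketch).

    pred_tracks: (T, 4) track boxes already predicted into this frame, e.g. by a Kalman filter.
    Returns (matches as (track_idx, det_idx), lost track indices, det indices that start new tracks).
    """
    hi_idx = np.where(det_scores >= high)[0]
    lo_idx = np.where((det_scores >= low) & (det_scores < high))[0]

    # Stage 1: high-confidence boxes vs. all existing tracks
    pairs1, rest_t, rest_hi = hungarian_match(pred_tracks, det_boxes[hi_idx], min_iou=0.2)
    matches = [(t, int(hi_idx[d])) for t, d in pairs1]

    # Stage 2: tracks left over from stage 1 vs. low-confidence boxes
    leftover = pred_tracks[np.asarray(rest_t, dtype=int)]
    pairs2, still_t, _ = hungarian_match(leftover, det_boxes[lo_idx], min_iou=0.5)
    matches += [(rest_t[t], int(lo_idx[d])) for t, d in pairs2]

    lost = [rest_t[t] for t in still_t]      # kept for a while, then deleted
    new = [int(hi_idx[d]) for d in rest_hi]  # unmatched low-score boxes are discarded
    return matches, lost, new


# Usage: track 1 is only recovered through the low-score box (score 0.3) in stage 2
tracks = np.array([[0, 0, 10, 10], [20, 20, 30, 30], [50, 50, 60, 60]], dtype=float)
dets = np.array([[1, 1, 11, 11], [21, 21, 31, 31], [80, 80, 90, 90]], dtype=float)
scores = np.array([0.9, 0.3, 0.8])
print(byte_associate(tracks, dets, scores))
# -> ([(0, 0), (1, 1)], [2], [2])
```

With a single-threshold tracker the 0.3-score box would have been dropped and track 1 would become lost; the second stage is what keeps its ID alive.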

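3. Implementation:

Data preprocessing: split the video into an ordered, sequentially numbered image sequence at a chosen frame rate, or read the video directly with OpenCV.
Model selection: choose a suitable tracking model and provide its configuration file.
Visualization: tracking inference is fast enough (on a GPU) for results to be displayed in real time; alternatively, the video can be processed offline to produce an output video file.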

Code Walkthrough

0. Preparation:

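Requirements: runs on both Windows and Linux, with Anaconda/Miniconda installed;
an NVIDIA discrete GPU is required, with CUDA, cuDNN and PyTorch installed. On Windows, install Visual Studio before installing CUDA.
The input is a clip of traffic footage downloaded from YouTube. It is already included in the demo folder that ships with the code (demo/demo.mp4), so no extra download is needed.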

1. Environment Setup:

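The original setup commands are not preserved here. Assuming the packages have been installed (for example with `pip install ultralytics`, which pulls in OpenCV and NumPy, plus a CUDA build of PyTorch), a quick sanity check like the following can confirm that everything the demo imports is available and that PyTorch sees the GPU. The function name `check_environment` is hypothetical.

```python
import importlib.util


def check_environment():
    """Report which packages used by the demo are importable and whether CUDA is visible."""
    report = {}
    for pkg in ("cv2", "numpy", "torch", "ultralytics"):
        report[pkg] = importlib.util.find_spec(pkg) is not None
    report["cuda"] = False
    if report["torch"]:
        import torch  # imported lazily so the check also runs without PyTorch

        report["cuda"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:12s} {'OK' if ok else 'MISSING'}")
```

If `cuda` reports MISSING although a GPU is present, the installed PyTorch build is most likely CPU-only.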

2. Data Preprocessing:


video_path: path to the traffic video file

3. Model Selection:


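model_name: name of the model to use (the full demo uses "yolov10b.pt"). If the weights are not found locally they are downloaded automatically, so you may need to run the script twice.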
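In Ultralytics, `model.track()` uses the BoT-SORT tracker (`botsort.yaml`) unless told otherwise, so ByteTrack has to be selected explicitly with `tracker="bytetrack.yaml"`. The hypothetical helper below collects the tracking arguments in one place; restricting detections to pedestrian and motor-vehicle classes through the `classes` argument is an optional assumption on our part, using the COCO class indices of the pretrained YOLO weights.

```python
# COCO class indices of the pretrained YOLO weights for pedestrians and motor vehicles
TRAFFIC_CLASSES = {0: "person", 2: "car", 3: "motorcycle", 5: "bus", 7: "truck"}


def build_track_kwargs(tracker="bytetrack.yaml", classes=TRAFFIC_CLASSES):
    """Keyword arguments for Ultralytics model.track() (hypothetical helper).

    tracker selects the bundled ByteTrack config instead of the default BoT-SORT;
    classes restricts detections to the listed COCO class indices.
    """
    return {"persist": True, "tracker": tracker, "classes": sorted(classes)}


print(build_track_kwargs())
# -> {'persist': True, 'tracker': 'bytetrack.yaml', 'classes': [0, 2, 3, 5, 7]}
# In the demo loop: results = model.track(frame, **build_track_kwargs())
```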

4. Result Visualization:

# Assumes cv2, numpy (np), model, cap, track_history and videoWriter are set up as in the full demo below
window_name = "Object Tracking"  # name of the display window
thickness = 10  # line width of the drawn trajectories

# Loop through the video frames
while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()

    if success:
        # Run ByteTrack tracking on the frame, persisting tracks between frames
        results = model.track(frame, persist=True, tracker="bytetrack.yaml")

        # Get the boxes (center x, center y, width, height) and track IDs
        boxes = results[0].boxes.xywh.cpu()

        if results[0].boxes.id is None:
            track_ids = []
        else:
            track_ids = results[0].boxes.id.int().cpu().tolist()

        # Visualize the results on the frame
        annotated_frame = results[0].plot()

        # Plot the tracks
        for box, track_id in zip(boxes, track_ids):
            x, y, w, h = box
            track = track_history[track_id]
            track.append((float(x), float(y)))  # x, y center point
            if len(track) > 30:  # keep only the last 30 center points
                track.pop(0)

            # Draw the tracking lines
            points = np.hstack(track).astype(np.int32).reshape((-1, 1, 2))
            cv2.polylines(annotated_frame, [points], isClosed=False, color=(230, 230, 230), thickness=thickness)

        # Display the annotated frame
        cv2.imshow(window_name, annotated_frame)

        # Write the annotated frame to the output video
        videoWriter.write(annotated_frame)

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        # Break the loop if the end of the video is reached
        break

window_name: name of the visualization window
thickness: line thickness of the drawn trajectories
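The trajectory drawing in the loop above keeps a sliding window of the last 30 center points per track ID. As a design alternative (our suggestion, not the original code), `collections.deque(maxlen=...)` drops the oldest point automatically, so the manual `pop(0)` is not needed. The helper names `update_history` and `polyline_points` are hypothetical.

```python
from collections import defaultdict, deque

import numpy as np

TRACK_LEN = 30  # center points kept per track, same value as in the loop above

# deque(maxlen=...) discards the oldest point automatically, replacing the manual pop(0)
track_history = defaultdict(lambda: deque(maxlen=TRACK_LEN))


def update_history(boxes_xywh, track_ids):
    """Append each box center (x, y) to the history of its track ID."""
    for (x, y, _w, _h), track_id in zip(boxes_xywh, track_ids):
        track_history[track_id].append((float(x), float(y)))


def polyline_points(track_id):
    """History of one track as the int32 (N, 1, 2) array that cv2.polylines expects."""
    return np.array(track_history[track_id], dtype=np.int32).reshape((-1, 1, 2))


# Usage: feed 35 frames of a single object with ID 7
for i in range(35):
    update_history([(i, 2 * i, 4, 4)], [7])
print(len(track_history[7]), track_history[7][0], polyline_points(7).shape)
# -> 30 (5.0, 10.0) (30, 1, 2)
```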

Full Demo

from collections import defaultdict

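import os
# Work around the "OMP: Error #15" duplicate OpenMP runtime crash that can occur on Windows/Anaconda
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"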

import cv2
import numpy as np

from ultralytics import YOLO

# Load the YOLOv10 model
model = YOLO("yolov10b.pt")

# Open the video file
video_path = "demo/demo.mp4"
cap = cv2.VideoCapture(video_path)

# Store the track history
track_history = defaultdict(lambda: [])

fps = cap.get(cv2.CAP_PROP_FPS)
size = (
    int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
    int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
)

videoWriter = cv2.VideoWriter('output.mp4', cv2.VideoWriter.fourcc('m', 'p', '4', 'v'), fps, size)

# Loop through the video frames
while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()

    if success:
        # Run ByteTrack tracking on the frame, persisting tracks between frames
        results = model.track(frame, persist=True, tracker="bytetrack.yaml")

        # Get the boxes and track IDs
        boxes = results[0].boxes.xywh.cpu()

        if results[0].boxes.id is None:
            track_ids = []
        else:
            track_ids = results[0].boxes.id.int().cpu().tolist()

        # Visualize the results on the frame
        annotated_frame = results[0].plot()

        # Display the annotated frame
        cv2.imshow("Object Tracking", annotated_frame)

        # Write the annotated frame to output.mp4
        videoWriter.write(annotated_frame)

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        # Break the loop if the end of the video is reached
        break

# Release the video capture object and close the display window
cap.release()
cv2.destroyAllWindows()
videoWriter.release()
