Wednesday, April 14, 2021


The goal of people counting is to estimate the number of people, or the crowd density, in a monitored environment. Both long-term and short-term people-count statistics provide useful information for strategy planning and event detection. However, detecting crowds or estimating their density is challenging because of partial occlusions, low-quality images, cluttered backgrounds, and similar difficulties. To address this, we focus on a setting in which multiple cameras with different viewing angles are available. We treat the visual cues captured by each camera as a knowledge source and carry out cross-camera knowledge transfer to alleviate these difficulties.


Single-Camera People Counting

  • Cross-camera people counting
    • Applicable to various environments
    • Online training data acquisition and camera perspective estimation
  • Occlusion handling: Coupled Gaussian processes
    • First-pass Gaussian processes: Visible part
    • Second-pass Gaussian processes: Occluded part



Why Prediction Conflict Works

  • An example: prediction conflict between horizontal and vertical gradients for occlusion handling

    • Legend
      • ---- Ground truth
      • ---- Prediction by the feature of horizontal gradients
      • ---- Prediction by the feature of vertical gradients



Blob Representation

  • We focus on foreground objects (pedestrians) in images
  • Background subtraction
  • Grouping spatially connected pixels
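
The two steps above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes grayscale frames stored as nested lists, a static background image, simple absolute-difference thresholding, and 4-connectivity. The names `foreground_mask`, `extract_blobs`, and the `threshold` value are hypothetical choices made for illustration.

```python
from collections import deque


def foreground_mask(frame, background, threshold=30):
    """Background subtraction: mark pixels that differ from the background
    by more than `threshold` (an assumed, illustrative value)."""
    return [
        [abs(p - b) > threshold for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def extract_blobs(mask):
    """Group 4-connected foreground pixels into blobs.

    Returns a list of blobs, each a list of (row, col) pixel coordinates.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


# Toy 4x5 frame with two bright regions on a uniform background.
background = [[10] * 5 for _ in range(4)]
frame = [
    [10, 200, 200, 10, 10],
    [10, 200, 10, 10, 10],
    [10, 10, 10, 10, 180],
    [10, 10, 10, 180, 180],
]
blobs = extract_blobs(foreground_mask(frame, background))
print([len(b) for b in blobs])  # [3, 3]
```

Each blob can then serve as the unit from which visual features are computed. A real system would likely use a more robust background model than a single static image.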



People Counting with Multiple Cameras

  • Why multiple cameras?
    • Complementary information
    • Dealing with resolution issues, occlusions, ...
  • Our approach
    • Ground plane matching + Visual knowledge transfer



Why Blob Matching?

  • Find the same groups of pedestrians across cameras
    • Synchronized frames
      • The number of people seen by each camera is not always equal, particularly when the fields of view (FOVs) differ substantially
    • We work on corresponding blob sets: blob clusters




  • Approximating the people counts in an image in two parts
    • Regular part: intra-camera visual features
    • Residual part: inter-camera visual knowledge
  • Formulate it as a transfer learning problem
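
One way to write this two-part approximation is sketched below. The notation is assumed here and is not taken from the source: $\hat{y}_c$ is the estimated count for camera $c$, $f_c$ is a regressor on that camera's own features $\mathbf{x}_c$, and $r_c$ is a correction learned from the features of the other cameras. The additive form is also an assumption, suggested by the term "residual".

```latex
\hat{y}_c \;=\;
\underbrace{f_c(\mathbf{x}_c)}_{\text{regular part (intra-camera)}}
\;+\;
\underbrace{r_c\bigl(\{\mathbf{x}_{c'}\}_{c' \neq c}\bigr)}_{\text{residual part (inter-camera)}}
```

Under this reading, the transfer learning problem amounts to learning $r_c$ so that knowledge from the other views corrects the errors that remain after the intra-camera estimate, for example those caused by occlusion or low resolution.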




Single-view Demo

Multi-view Demo




Visual Knowledge Transfer among Multiple Cameras for People Counting with Occlusion Handling

Ming-Fang Weng, Yen-Yu Lin, Nick C. Tang, and Hong-Yuan Mark Liao

ACM International Conference on Multimedia (MM), October 2012, (full paper)



Cross Camera People Counting with Perspective Estimation and Occlusion Handling

Tsung-Yi Lin, Yen-Yu Lin, Ming-Fang Weng, Yu-Chiang Wang, Yu-Feng Hsu, and Hong-Yuan Mark Liao

IEEE International Workshop on Information Forensics and Security (WIFS), November 2011