MRCNN: A stateful Fast R-CNN: Using temporal consistency in R-CNN for video object localization and classification

Philippe Burlina

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Scopus citations


Deep convolutional neural networks (DCNNs) perform on par or better than humans for image classification. Hence efforts have now shifted to more challenging tasks such as object detection and classification in images, video or RGBD. Recently developed region CNNs (R-CNN) such as Fast R-CNN [7] address this detection task for images. Instead, this paper is concerned with video and also focuses on resource-limited systems. Newly proposed methods accelerate R-CNN by sharing convolutional layers for proposal generation, location regression and labeling [12][13][19][25]. These approaches when applied to video are stateless: they process each image individually. This suggests an alternate route: to make R-CNN stateful and exploit temporal consistency. We extend Fast R-CNNs by making it employ recursive Bayesian filtering and perform proposal propagation and reuse. We couple multi-target proposal/detection tracking (MTT) with R-CNN and do detection-to-track association. We call this approach MRCNN as short for MTT + R-CNN. In MRCNN, region proposals that are vetted via classification and regression in R-CNNs - are treated as observations in MTT and propagated using assumed kinematics. Actual proposal generation (e.g. via Selective Search) need only be performed sporadically and/or periodically and is replaced at all other times by MTT proposal predictions. Preliminary results show that MRCNNs can economize on both proposal and classification computations, and can yield up to a 10 to 30 factor decrease in number of proposals generated, about one order of magnitude proposal computation time savings and nearly one order magnitude improvement in overall computational time savings, for comparable localization and classification performance. This method can additionally be beneficial for false alarm abatement.

Original languageEnglish (US)
Title of host publication2016 23rd International Conference on Pattern Recognition, ICPR 2016
PublisherInstitute of Electrical and Electronics Engineers Inc.
Number of pages6
ISBN (Electronic)9781509048472
StatePublished - Apr 13 2017
Event23rd International Conference on Pattern Recognition, ICPR 2016 - Cancun, Mexico
Duration: Dec 4 2016Dec 8 2016


Other23rd International Conference on Pattern Recognition, ICPR 2016


  • ConvNets
  • Deep Learning
  • Fast R-CNN
  • Region CNNs in Video
  • Region Proposals

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition


Dive into the research topics of 'MRCNN: A stateful Fast R-CNN: Using temporal consistency in R-CNN for video object localization and classification'. Together they form a unique fingerprint.

Cite this