Computer-Aided Retinal Surgery using Data from the Video Compressed Stream

Zakarya Droueche, Gwénolé Quellec, Mathieu Lamard, Guy Cazuguel, Béatrice Cochener, Christian Roux


This paper introduces ongoing research on computer-aided ophthalmic surgery. We propose a Content-Based Video Retrieval (CBVR) system to support surgeons' decisions: given the video stream captured by a digital camera monitoring a surgery, the system
retrieves similar annotated video streams from video archives. To compare videos, we characterize them by features extracted from compression data. First, motion vectors are extracted from the MPEG-4 AVC/H.264 compressed video stream.
Second, images are segmented into regions with homogeneous motion vectors, using region growing. Third, region displacements between consecutive frames are tracked with a Kalman filter, in order to extract features characterizing region
trajectories. Additional features are extracted from the residual information encoded in the MPEG-4 AVC/H.264 compressed video stream, that is, the difference between the original input images and the predicted images. Once features are
extracted, videos are compared using an extension of fast dynamic time warping to multidimensional time series. In this paper, the system is applied to two medical datasets: a small dataset of 69 video-recorded retinal surgery steps and a dataset of 1,400
video-recorded cataract surgery steps. To assess its generality, the system is also applied to a large dataset of 1,707 movie clips with classified human actions. High retrieval scores are obtained on all three datasets.
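
To make the comparison step concrete, the following is a minimal sketch of dynamic time warping between two multidimensional time series, where each time step is a feature vector. It is not the authors' implementation: it uses the plain quadratic-time DTW recurrence rather than the fast approximation mentioned in the abstract, it assumes a Euclidean distance between feature vectors, and the names `euclidean` and `dtw_distance` are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors (assumed local cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(s: Sequence[Vector], t: Sequence[Vector]) -> float:
    """Classic O(len(s) * len(t)) DTW distance between two vector sequences.

    This is the exact recurrence, not the fast multiresolution variant.
    """
    n, m = len(s), len(t)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")

    # prev[j] holds the best cumulative cost of aligning s[:i-1] with t[:j].
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = euclidean(s[i - 1], t[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


if __name__ == "__main__":
    # Toy 2-D feature sequences (illustrative values, not real video features).
    query = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    slower = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    print(dtw_distance(query, slower))
```

In a retrieval setting, the archived videos would be ranked by increasing distance to the query sequence. Because warping absorbs differences in speed, a sequence that repeats a time step stays at distance zero from the original.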


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.