Blog Event-Based Vision
Contents
Event-based vision
Advantages
- Event cameras have several advantages over traditional cameras: latency on the order of microseconds, a very high dynamic range (140 dB compared to 60 dB for traditional cameras), and very low power consumption (10 mW vs. 1.5 W for traditional cameras). Moreover, since all pixels capture light independently, such sensors do not suffer from motion blur.
[TPAMI 2020] Event-based Vision: A Survey [pdf]
Guillermo Gallego, Tobi Delbruck, Garrick Orchard, Chiara Bartolozzi, Brian Taba, Andrea Censi, Stefan Leutenegger, Andrew Davison, Joerg Conradt, Kostas Daniilidis, Davide Scaramuzza
[CVPR 2020] EventCap: Monocular 3D Capture of High-Speed Human Motions using an Event Camera [pdf]
Lan Xu, Weipeng Xu, Vladislav Golyanik, Marc Habermann, Lu Fang, Christian Theobalt
[ECCV 2020] Spike-FlowNet: Event-based Optical Flow Estimation with Energy-Efficient Hybrid Neural Networks [pdf]
Chankyu Lee, Adarsh Kumar Kosta, Alex Zihao Zhu, Kenneth Chaney, Kostas Daniilidis, Kaushik Roy
- An SNN encoder processes the event stream and an ANN decoder estimates the optical flow; the simplest IF neuron model is used in this paper.
- Self-supervised warping loss (photometric loss between neighboring frames warped by the estimated flow).
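A minimal sketch of such a self-supervised warping (photometric) loss, assuming PyTorch; `flow`, `img1`, and `img2` are illustrative names, not the authors' code:

```python
import torch
import torch.nn.functional as F

def warping_loss(img1, img2, flow):
    """Photometric loss: warp img2 back towards img1 using the predicted flow.

    img1, img2: (B, 1, H, W) grayscale frames; flow: (B, 2, H, W) in pixels.
    A sketch of the self-supervised objective, not the exact published code.
    """
    B, _, H, W = img1.shape
    # Base sampling grid in pixel coordinates
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(img1.device)          # (2, H, W)
    coords = grid.unsqueeze(0) + flow                                     # (B, 2, H, W)
    # Normalize to [-1, 1] for grid_sample (x first, then y)
    coords_x = 2.0 * coords[:, 0] / (W - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (H - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)               # (B, H, W, 2)
    img2_warped = F.grid_sample(img2, sample_grid, align_corners=True)
    return (img1 - img2_warped).abs().mean()                              # L1 photometric loss
```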
[CVPR 2020] Single Image Optical Flow Estimation With an Event Camera [pdf]
Liyuan Pan, Miaomiao Liu, Richard Hartley
[CVPR 2018] PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume [pdf] [code]
Deqing Sun, Xiaodong Yang, Ming-Yu Liu, Jan Kautz
[CVPR 2020] Learning to Super Resolve Intensity Images From Events [pdf]
S. Mohammad Mostafavi I., Jonghyun Choi, Kuk-Jin Yoon
[CVPR 2020] Learning Event-Based Motion Deblurring [pdf]
Zhe Jiang, Yu Zhang, Dongqing Zou, Jimmy Ren, Jiancheng Lv, Yebin Liu
[CVPR 2020] Video to Events: Recycling Video Datasets for Event Cameras [pdf]
Daniel Gehrig, Mathias Gehrig, Javier Hidalgo-Carrio, Davide Scaramuzza
[CVPR 2020] 4D Visualization of Dynamic Events From Unconstrained Multi-View Videos [pdf]
Aayush Bansal, Minh Vo, Yaser Sheikh, Deva Ramanan, Srinivasa Narasimhan
[CVPR 2020] Globally Optimal Contrast Maximisation for Event-Based Motion Estimation [pdf]
Daqi Liu, Alvaro Parra, Tat-Jun Chin
[CVPR 2020] Learning Visual Motion Segmentation Using Event Surfaces [pdf]
Anton Mitrokhin, Zhiyuan Hua, Cornelia Fermuller, Yiannis Aloimonos
[CVPR 2020] EventSR: From Asynchronous Events to Image Reconstruction, Restoration, and Super-Resolution via End-to-End Adversarial Learning [pdf]
Lin Wang, Tae-Kyun Kim, and Kuk-Jin Yoon
[ICCV 2019] Learning an event sequence embedding for dense event-based deep stereo [pdf]
Stepan Tulyakov, Francois Fleuret, Martin Kiefel, Peter Gehler, Michael Hirsch
[ICCV 2019] End-to-End Learning of Representations for Asynchronous Event-Based Data [pdf]
Daniel Gehrig, Antonio Loquercio, Konstantinos G. Derpanis, Davide Scaramuzza
[CVPR 2019] Event-based High Dynamic Range Image and Very High Frame Rate Video Generation using Conditional Generative Adversarial Networks [pdf]
S. Mohammad Mostafavi I., Lin Wang, Yo-Sung Ho, Kuk-Jin Yoon
[CVPR 2019] Unsupervised Event-Based Learning of Optical Flow, Depth, and Egomotion [pdf]
Alex Zihao Zhu, Liangzhe Yuan, Kenneth Chaney, Kostas Daniilidis
[IROS 2018] Event-based Moving Object Detection and Tracking [pdf]
Anton Mitrokhin, Cornelia Fermuller, Chethan Parameshwara, Yiannis Aloimonos
[CVPR 2019] Bringing a Blurry Frame Alive at High Frame-Rate With an Event Camera [pdf]
Liyuan Pan, Cedric Scheerlinck, Xin Yu, Richard Hartley, Miaomiao Liu, Yuchao Dai
[CVPR 2019] Focus Is All You Need: Loss Functions For Event-based Vision [pdf]
Guillermo Gallego, Mathias Gehrig, Davide Scaramuzza
[CVPR 2019] Event Cameras, Contrast Maximization and Reward Functions: An Analysis [pdf]
Timo Stoffregen, Lindsay Kleeman
[TPAMI 2018] Event-based, 6-DOF Camera Tracking from Photometric Depth Maps [pdf]
Guillermo Gallego, Jon E.A. Lund, Elias Mueggler, Henri Rebecq, Tobi Delbruck, Davide Scaramuzza
[IJCV 2017, BMVC 2016] EMVS: Event-based Multi-View Stereo - 3D Reconstruction with an Event Camera in Real-Time [pdf]
Henri Rebecq, Guillermo Gallego, Elias Mueggler, Davide Scaramuzza
[ECCV 2018] Asynchronous, Photometric Feature Tracking using Events and Frames [pdf]
Daniel Gehrig, Henri Rebecq, Guillermo Gallego, Davide Scaramuzza
[CVPR 2018] A Unifying Contrast Maximization Framework for Event Cameras, with Applications to Motion, Depth and Optical Flow Estimation [pdf]
Guillermo Gallego, Henri Rebecq, Davide Scaramuzza
- We present a unifying framework to solve several computer vision problems with event cameras: motion, depth and optical flow estimation. The main idea of our framework is to find the point trajectories on the image plane that are best aligned with the event data by maximizing an objective function: the contrast of an image of warped events. Our method implicitly handles data association between the events, and therefore does not rely on additional appearance information about the scene. In addition to accurately recovering the motion parameters of the problem, our framework produces motion-corrected edge-like images with high dynamic range that can be used for further scene analysis.
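A toy sketch of the contrast-maximisation idea for a single global optic-flow candidate (illustrative only; real implementations use sub-pixel accumulation, richer warp models, and proper optimisers):

```python
import numpy as np

def contrast_of_warped_events(events, flow, t_ref, H, W):
    """events: (N, 4) array of (x, y, t, p); flow: candidate (vx, vy) in px/s.

    Warp every event to the reference time t_ref along the candidate flow,
    accumulate an image of warped events, and score it by its variance
    (the "contrast"). The best motion parameters maximise this score.
    """
    x, y, t, p = events[:, 0], events[:, 1], events[:, 2], events[:, 3]
    xw = np.round(x - flow[0] * (t - t_ref)).astype(int)
    yw = np.round(y - flow[1] * (t - t_ref)).astype(int)
    valid = (xw >= 0) & (xw < W) & (yw >= 0) & (yw < H)
    iwe = np.zeros((H, W))
    np.add.at(iwe, (yw[valid], xw[valid]), p[valid])   # image of warped events
    return iwe.var()                                    # contrast objective

# A coarse grid search over candidate flows would pick the maximiser:
# best = max(candidates, key=lambda f: contrast_of_warped_events(ev, f, t0, H, W))
```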
[ECCV 2018] Semi-Dense 3D Reconstruction with a Stereo Event Camera [pdf]
Yi Zhou, Guillermo Gallego, Henri Rebecq, Laurent Kneip, Hongdong Li, Davide Scaramuzza
[CVPR 2019, TPAMI 2020] Events-to-Video: Bringing Modern Computer Vision to Event Cameras [pdf] [pdf]
Henri Rebecq, René Ranftl, Vladlen Koltun, Davide Scaramuzza
- Since the output of event cameras is fundamentally different from conventional cameras, it is commonly accepted that they require the development of specialized algorithms to accommodate the particular nature of events.
- We propose a novel recurrent network to reconstruct videos from a stream of events, and train it on a large amount of simulated event data.
[ECCV 2018 workshop] Unsupervised event-based optical flow using motion compensation [pdf]
Alex Zihao Zhu, Liangzhe Yuan, Kenneth Chaney, Kostas Daniilidis
[Scientific Reports 2019] A Spiking Neural Network Model of Depth from Defocus for Event-based Neuromorphic Vision [pdf]
Germain Haessig, Xavier Berthelon, Sio-Hoi Ieng & Ryad Benosman
[CVPR 2017] A Low Power, Fully Event-Based Gesture Recognition System [pdf] [dataset]
Arnon Amir, Brian Taba, David Berg, Timothy Melano, Jeffrey McKinstry, Carmelo Di Nolfo, Tapan Nayak, Alexander Andreopoulos, Guillaume Garreau, Marcela Mendoza, Jeff Kusnitz, Michael Debole, Steve Esser, Tobi Delbruck, Myron Flickner, and Dharmendra Modha
[CVPR 2019] EV-Gait: Event-Based Robust Gait Recognition Using Dynamic Vision Sensors [pdf]
Yanxiang Wang, Bowen Du, Yiran Shen, Kai Wu, Guangrong Zhao, Jianguo Sun, Hongkai Wen
[CVPR workshop 2019] DHP19: Dynamic Vision Sensor 3D Human Pose Dataset [pdf]
Enrico Calabrese, Gemma Taverni, Christopher Awai Easthope, Sophie Skriabine, Federico Corradi, Luca Longinotti, Kynan Eng, Tobi Delbruck
[RSS 2018] EV-FlowNet: Self-supervised optical flow estimation for event-based cameras [pdf]
Alex Zihao Zhu, Liangzhe Yuan, Kenneth Chaney, Kostas Daniilidis
- In particular, we introduce an image-based representation of a given event stream, which is fed into a self-supervised neural network as the sole input. The corresponding grayscale images captured from the same camera at the same time as the events are then used as a supervisory signal to provide a loss function at training time, given the estimated flow from the network.
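A sketch of one such image-based event representation (per-polarity counts plus a normalised last-timestamp channel); the exact channels used in EV-FlowNet differ in detail, so treat this as illustrative:

```python
import numpy as np

def event_image(events, H, W):
    """Build a 4-channel image from an event window: positive/negative counts
    and the (normalised) timestamp of the most recent event per polarity.

    events: (N, 4) array of (x, y, t, p) with p in {-1, +1}."""
    img = np.zeros((4, H, W), dtype=np.float32)
    t0, t1 = events[:, 2].min(), events[:, 2].max()
    for x, y, t, p in events:
        xi, yi = int(x), int(y)
        c = 0 if p > 0 else 1
        img[c, yi, xi] += 1.0                                   # count channel
        img[2 + c, yi, xi] = (t - t0) / max(t1 - t0, 1e-9)      # latest timestamp
    return img
```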
[CVPR 2016] Simultaneous Optical Flow and Intensity Estimation from an Event Camera [pdf]
Patrick Bardow, Andrew J. Davison, Stefan Leutenegger
[TNNLS 2014] Event-Based Visual Flow [pdf] [code]
Ryad Benosman, Charles Clercq, Xavier Lagorce, Sio-Hoi Ieng, and Chiara Bartolozzi
[back to top]
Spiking Neural Networks
[NeurIPS 2022] Online Training Through Time for Spiking Neural Networks [pdf]
Mingqing Xiao, Qingyan Meng, Zongpeng Zhang, Di He, Zhouchen Lin
[NeurIPS 2022] Toward Robust Spiking Neural Network Against Adversarial Perturbation [pdf]
Ling Liang, Kaidi Xu, Xing Hu, Lei Deng, Yuan Xie
[NeurIPS 2022] Training Spiking Neural Networks with Local Tandem Learning [pdf]
Qu Yang, Jibin Wu, Malu Zhang, Yansong Chua, Xinchao Wang, Haizhou Li
[NeurIPS 2022] GLIF: A Unified Gated Leaky Integrate-and-Fire Neuron for Spiking Neural Networks [pdf]
Xingting Yao, Fanrong Li, Zitao Mo, Jian Cheng
[ICLR 2022] Temporal Efficient Training of Spiking Neural Network via Gradient Re-weighting [pdf]
Shikuang Deng, Yuhang Li, Shanghang Zhang, Shi Gu
[CVPR 2022] Event Transformer. A sparse-aware solution for efficient event data processing [pdf]
Alberto Sabater, Luis Montesano, Ana C. Murillo
- Event cameras are sensors of great interest for many applications that run in low-resource and challenging environments. They log sparse illumination changes with high temporal resolution and high dynamic range, while they present minimal power consumption. However, top-performing methods often ignore specific event-data properties, leading to the development of generic but computationally expensive algorithms. Efforts toward efficient solutions usually do not achieve top-accuracy results for complex tasks. This work proposes a novel framework, Event Transformer (EvT), that effectively takes advantage of event-data properties to be highly efficient and accurate. We introduce a new patch-based event representation and a compact transformer-like architecture to process it. EvT is evaluated on different event-based benchmarks for action and gesture recognition. Evaluation results show better or comparable accuracy to the state-of-the-art while requiring significantly less computation resources, which makes EvT able to work with minimal latency both on GPU and CPU.
[CVPR 2022] Spiking Transformers for Event-based Single Object Tracking [pdf]
Jiqing Zhang, Bo Dong, Haiwei Zhang, Jianchuan Ding, Felix Heide, Baocai Yin, Xin Yang
- Event-based cameras bring a unique capability to tracking, being able to function in challenging real-world conditions as a direct result of their high temporal resolution and high dynamic range. These imagers capture events asynchronously that encode rich temporal and spatial information. However, effectively extracting this information from events remains an open challenge. In this work, we propose a spiking transformer network, STNet, for single object tracking. STNet dynamically extracts and fuses information from both temporal and spatial domains. In particular, the proposed architecture features a transformer module to provide global spatial information and a spiking neural network (SNN) module for extracting temporal cues. The spiking threshold of the SNN module is dynamically adjusted based on the statistical cues of the spatial information, which we find essential in providing robust SNN features. We fuse both feature branches dynamically with a novel cross-domain attention fusion algorithm. Extensive experiments on two event-based datasets, FE240hz and EED, validate that the proposed STNet outperforms existing state-of-the-art methods in both tracking accuracy and speed with a significant margin.
- The proposed SFE relies on a 3-layer LIF-based SNN [45] and a reduced Swin-transformer [33] to capture informative cues from the temporal and spatial domains.
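For context, a minimal discrete leaky integrate-and-fire (LIF) update of the kind such SNN modules build on (a sketch with a fixed threshold; STNet itself adapts the threshold from spatial statistics):

```python
import torch

def lif_step(x, v, tau=2.0, v_th=1.0, v_reset=0.0):
    """One discrete LIF step: leak, integrate input x, spike when the membrane
    potential crosses v_th, then hard-reset. Returns (spikes, new potential)."""
    v = v + (x - (v - v_reset)) / tau        # leaky integration
    spikes = (v >= v_th).float()             # Heaviside firing (non-differentiable)
    v = torch.where(spikes.bool(), torch.full_like(v, v_reset), v)
    return spikes, v
```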
[NeurIPS 2021] Deep Residual Learning in Spiking Neural Networks [pdf]
Wei Fang, Zhaofei Yu, Yanqi Chen, Tiejun Huang, Timothée Masquelier, Yonghong Tian
- In this paper, we propose the spike element-wise (SEW) ResNet to realize residual learning in deep SNNs. We prove that the SEW ResNet can easily implement identity mapping and overcome the vanishing/exploding gradient problems of Spiking ResNet.
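A sketch of the spike-element-wise residual idea with ADD as the element-wise function (the `spiking_neuron` factory is an assumption, e.g. a LIF layer with a surrogate gradient; this is not the authors' code):

```python
import torch
import torch.nn as nn

class SEWBlock(nn.Module):
    """Sketch of a spike-element-wise residual block: the residual branch and the
    identity branch both carry spikes and are combined by an element-wise function
    g (here ADD), which is what lets the block represent identity mapping exactly."""
    def __init__(self, channels, spiking_neuron):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.sn1 = spiking_neuron()           # assumed spiking activation layer
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.sn2 = spiking_neuron()

    def forward(self, s):                     # s: input spike tensor
        out = self.sn1(self.bn1(self.conv1(s)))
        out = self.sn2(self.bn2(self.conv2(out)))
        return out + s                        # g = ADD
```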
[ICCV 2021] Incorporating Learnable Membrane Time Constant to Enhance Learning of Spiking Neural Networks [pdf]
Wei Fang, Zhaofei Yu, Yanqi Chen, Timothee Masquelier, Tiejun Huang, Yonghong Tian
[NeurIPS 2021] Differentiable Spike: Rethinking Gradient-Descent for Training Spiking Neural Networks [pdf]
Yuhang Li, Yufei Guo, Shanghang Zhang, Shikuang Deng, Yongqing Hai, Shi Gu
[NeurIPS 2021] Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [pdf]
Mingqing Xiao, Qingyan Meng, Zongpeng Zhang, Yisen Wang, Zhouchen Lin
[NeurIPS 2020] Unifying Activation- and Timing-based Learning Rules for Spiking Neural Networks [pdf]
Jinseok Kim, Kyungsu Kim, Jae-Joon Kim
[NeurIPS 2020] Temporal Spike Sequence Learning via Backpropagation for Deep Spiking Neural Networks [pdf]
Wenrui Zhang, Peng Li
- We present a novel Temporal Spike Sequence Learning Backpropagation (TSSL-BP) method for training deep SNNs, which breaks down error backpropagation across two types of inter-neuron and intra-neuron dependencies and leads to improved temporal learning precision.
- Wulfram Gerstner and Werner M Kistler. Spiking neuron models: Single neurons, populations, plasticity. Cambridge University Press, 2002.
[NeurIPS 2021] Self-Supervised Learning of Event-Based Optical Flow with Spiking Neural Networks [pdf] [code]
Jesse Hagenaars, Federico Paredes-Vallés, Guido de Croon
[NeurIPS 2021] Sparse Spiking Gradient Descent [pdf] [code]
Nicolas Perez-Nieves, Dan F.M. Goodman
- Emre O. Neftci, Hesham Mostafa, and Friedemann Zenke. Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks. IEEE Signal Processing Magazine, 36(6):51–63, 2019.
- Friedemann Zenke and Tim P Vogels. The remarkable robustness of surrogate gradient learning for instilling complex function in spiking neural networks. Neural Computation, 33(4):899–925, 2021.
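The common trick in these surrogate-gradient papers, sketched as a PyTorch autograd function: a hard Heaviside spike in the forward pass and a smooth surrogate derivative in the backward pass (here a fast-sigmoid-style surrogate; the exact shape varies by paper):

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike forward, surrogate derivative backward."""
    ALPHA = 10.0   # steepness of the surrogate (illustrative value)

    @staticmethod
    def forward(ctx, v_minus_th):
        ctx.save_for_backward(v_minus_th)
        return (v_minus_th >= 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        (v_minus_th,) = ctx.saved_tensors
        # Derivative of a fast sigmoid as a smooth stand-in for the Dirac delta
        sg = 1.0 / (1.0 + SurrogateSpike.ALPHA * v_minus_th.abs()) ** 2
        return grad_output * sg

spike = SurrogateSpike.apply   # usage: s = spike(v - v_th)
```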
[CVPR 2021] Temporal-wise Attention Spiking Neural Networks for Event Streams Classification [pdf]
Man Yao, Huanhuan Gao, Guangshe Zhao, Dingheng Wang, Yihan Lin, Zhaoxu Yang, Guoqi Li
- Yin Bi, Aaron Chadha, Alhabib Abbas, Eirina Bourtsoulatze, and Yiannis Andreopoulos. Graph-based spatio-temporal feature learning for neuromorphic vision sensing. IEEE Transactions on Image Processing, 29:9084–9098, 2020.
- Kaushik Roy, Akhilesh Jaiswal, and Priyadarshini Panda. Towards spike-based machine intelligence with neuromorphic computing. Nature, 575(7784):607–617, 2019.
- Zhenzhi Wu, Hehui Zhang, Yihan Lin, Guoqi Li, Meng Wang, and Ye Tang. LIAF-Net: Leaky integrate and analog fire network for lightweight and efficient spatiotemporal information processing. IEEE Transactions on Neural Networks and Learning Systems, pages 1–14, 2021.
- Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7132–7141, 2018.
- We use the STBP [38] and the BPTT algorithm [40] to train LIF-SNNs and LIAF-SNNs, respectively.
[NeurIPS 2018] Gradient Descent for Spiking Neural Networks [pdf]
Dongsung Huh, Terrence J. Sejnowski
- In general, the exact gradient of a dynamical system can be calculated using either Pontryagin's minimum principle [33], also known as backpropagation through time, or real-time recurrent learning, which yield identical results.
[ECCV 2020] Spike-FlowNet: Event-based Optical Flow Estimation with Energy-Efficient Hybrid Neural Networks [pdf]
Chankyu Lee, Adarsh Kumar Kosta, Alex Zihao Zhu, Kenneth Chaney, Kostas Daniilidis, Kaushik Roy
- An SNN encoder processes the event stream and an ANN decoder estimates the optical flow; the simplest IF neuron model is used in this paper.
- Self-supervised warping loss (photometric loss between neighboring frames warped by the estimated flow).
[ArXiv 2018] Deep Learning in Spiking Neural Networks [pdf]
Amirhossein Tavanaei, Masoud Ghodrati, Saeed Reza Kheradpisheh, Timothee Masquelier, Anthony S. Maida
- Fully Connected SNNs
- Spiking CNN [173] [57][165][183]
- ANN-to-SNN conversion
[IROS 2021] SpikeMS: Deep Spiking Neural Network for Motion Segmentation [pdf]
Chethan M. Parameshwara, Simin Li, Cornelia Fermüller, Nitin J. Sanket, Matthew S. Evanusa, Yiannis Aloimonos
[TPAMI 2019] Unsupervised Learning of a Hierarchical Spiking Neural Network for Optical Flow Estimation: From Events to Global Motion Perception [pdf]
Federico Paredes-Vallés, Kirk Y. W. Scheper, Guido C. H. E. de Croon
[Frontiers in Neuroscience 2020] Enabling Spike-Based Backpropagation for Training Deep Neural Network Architectures [pdf]
Chankyu Lee, Syed Shakib Sarwar, Priyadarshini Panda, Gopalakrishnan Srinivasan, Kaushik Roy
[Frontiers in Neuroscience 2020] Toward Scalable, Efficient, and Accurate Deep Spiking Neural Networks With Backward Residual Connections, Stochastic Softmax, and Hybridization [pdf]
Priyadarshini Panda, Aparna Aketi, Kaushik Roy
[Frontiers in Neuroscience 2018] Spatio-temporal backpropagation for training high-performance spiking neural networks [pdf]
Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, Luping Shi
[arXiv 2019] Surrogate Gradient Learning in Spiking Neural Networks [pdf]
Emre O. Neftci, Hesham Mostafa, Friedemann Zenke
[ICLR 2018] Deep rewiring: Training very sparse deep networks [pdf]
Guillaume Bellec, David Kappel, Wolfgang Maass, Robert Legenstein
[NeurIPS 2018] Long short-term memory and learning-to-learn in networks of spiking neurons [pdf]
Guillaume Bellec, Darjan Salaj, Anand Subramoney, Robert Legenstein, Wolfgang Maass
[NeurIPS 2018] SLAYER: Spike layer error reassignment in time [pdf]
Sumit Bam Shrestha, Garrick Orchard
Time structure of the activity in neural network models [pdf]
Wulfram Gerstner
- Spiking Response Model (SRM) used in SLAYER
[ICRA 2015] Spike time based unsupervised learning of receptive fields for event-driven vision [pdf]
Himanshu Akolkar, Stefano Panzeri and Chiara Bartolozzi
Convolutional networks for fast, energy-efficient neuromorphic computing [pdf]
Steven K. Esser, Paul A. Merolla, John V. Arthur, Andrew S. Cassidy, Rathinakumar Appuswamy, Alexander Andreopoulos, David J. Berg, Jeffrey L. McKinstry, Timothy Melano, Davis R. Barch, Carmelo di Nolfo, Pallab Datta, Arnon Amir, Brian Taba, Myron D. Flickner, and Dharmendra S. Modha
A wafer-scale neuromorphic hardware system for large-scale neural modeling [pdf]
Johannes Schemmel, Daniel Brüderle, Andreas Grübl, Matthias Hock, Karlheinz Meier, and Sebastian Millner
[TNNLS 2018] A supervised learning algorithm for learning precise timing of multiple spikes in multilayer spiking neural networks [pdf]
Aboozar Taherkhani, Ammar Belatreche, Yuhua Li, Liam P. Maguire
[Frontiers in Neuroscience 2017] Conversion of continuous-valued deep networks to efficient event-driven networks for image classification [pdf]
Bodo Rueckauer, Iulia-Alexandra Lungu, Yuhuang Hu, Michael Pfeiffer and Shih-Chii Liu
[Frontiers in Neuroscience 2016] Training Deep Spiking Neural Networks using Backpropagation [pdf]
Jun Haeng Lee, Tobi Delbruck, Michael Pfeiffer
[Annual Review of Neuroscience 2008] Spike Timing–Dependent Plasticity: A Hebbian Learning Rule [pdf]
Natalia Caporale and Yang Dan
[back to top]
6D Pose Estimation
[CVPR 2020] Learning deep network for detecting 3D object keypoints and 6D poses [pdf]
Wanqing Zhao, Shaobo Zhang, Ziyu Guan, Wei Zhao
[CVPR 2019] PVNet: Pixel-wise voting network for 6DoF pose estimation [pdf] [code]
Sida Peng, Yuan Liu, Qixing Huang, Hujun Bao, Xiaowei Zhou
[CVPR 2020] PVN3D: A deep point-wise 3D keypoints voting network for 6DoF pose estimation [pdf]
Yisheng He, Wei Sun, Haibin Huang, Jianran Liu, Haoqiang Fan, Jian Sun
[ICCV 2019] Pix2Pose: Pixel-wise coordinate regression of objects for 6D pose estimation [pdf]
Kiru Park, Timothy Patten, Markus Vincze
[CVPR 2018] Real-time seamless single shot 6d object pose prediction [pdf]
Bugra Tekin, Sudipta N. Sinha, Pascal Fua
[ECCV 2018] DeepIM: Deep iterative matching for 6d pose estimation [pdf]
Yi Li, Gu Wang, Xiangyang Ji, Yu Xiang, Dieter Fox
[ECCV 2018 Best Paper Award] Implicit 3D Orientation Learning for 6D Object Detection from RGB Images [pdf]
Martin Sundermeyer, Zoltan-Csaba Marton, Maximilian Durner, Manuel Brucker, Rudolph Triebel
[ICRA 2017] 6-dof object pose from semantic keypoints [pdf]
Georgios Pavlakos, Xiaowei Zhou, Aaron Chan, Konstantinos G. Derpanis, Kostas Daniilidis
[ICCV 2017] BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of
Challenging Objects without Using Depth [pdf]
Mahdi Rad, Vincent Lepetit
[back to top]
Optimization-Based Human Shape Reconstruction
Background reading 1
- quaternion \((w, x, y, z)\)
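For reference, a unit quaternion \(q = (w, x, y, z)\) with \(w^2 + x^2 + y^2 + z^2 = 1\) corresponds to the rotation matrix (the standard conversion formula, stated here for convenience):

\[
R(q) =
\begin{pmatrix}
1 - 2(y^2 + z^2) & 2(xy - wz) & 2(xz + wy) \\
2(xy + wz) & 1 - 2(x^2 + z^2) & 2(yz - wx) \\
2(xz - wy) & 2(yz + wx) & 1 - 2(x^2 + y^2)
\end{pmatrix}.
\]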
Background reading 2
[CVPR 2018] DoubleFusion: Real-time Capture of Human Performances with Inner Body Shapes from a Single Depth Sensor [pdf]
Tao Yu, Zerong Zheng, Kaiwen Guo, Jianhui Zhao, Qionghai Dai, Hao Li, Gerard Pons-Moll, Yebin Liu
[CVPR 2015] DynamicFusion: Reconstruction and tracking of non-rigid scenes in real-time [pdf]
Richard Newcombe, Dieter Fox, Steve Seitz
[ToG 2017] Real-time geometry, albedo and motion reconstruction using a single RGB-D camera [pdf]
K. Guo, F. Xu, T. Yu, X. Liu, Q. Dai, and Y. Liu
[ICCV 2017] BodyFusion: Real-time capture of human motion and surface geometry using a single depth camera [pdf]
T. Yu, K. Guo, F. Xu, Y. Dong, Z. Su, J. Zhao, J. Li, Q. Dai, and Y. Liu
[ECCV 2016] VolumeDeform: Real-time volumetric non-rigid reconstruction [pdf]
M. Innmann, M. Zollhofer, M. Nießner, C. Theobalt, and M. Stamminger.
[ECCV 2018] ArticulatedFusion: Real-time Reconstruction of Motion, Geometry and Segmentation Using a Single Depth Camera [pdf]
Chao Li, Zheheng Zhao, and Xiaohu Guo
[CVPR 2014, TPAMI 2016] Real-Time Simultaneous Pose and Shape Estimation for Articulated Objects Using a Single Depth Camera [pdf] [pdf]
Mao Ye, Yang Shen, Chao Du, Zhigeng Pan, Ruigang Yang
[back to top]
Video Prediction and Pose Transfer
[NeurIPS 2016] Generating Videos with Scene Dynamics [pdf]
Carl Vondrick, Hamed Pirsiavash, Antonio Torralba
[ICLR 2016] Deep multi-scale video prediction beyond mean square error [pdf]
Michael Mathieu, Camille Couprie, Yann LeCun
[ICCV 2017] The Pose Knows: Video Forecasting by Generating Pose Futures [pdf]
Jacob Walker, Kenneth Marino, Abhinav Gupta, Martial Hebert
[ECCV 2018] Deep Video Generation, Prediction and Completion of Human Action Sequences [pdf]
Haoye Cai, Chunyan Bai, Yu-Wing Tai, Chi-Keung Tang
[ECCV 2018] Pose Guided Human Video Generation [PDF]
Ceyuan Yang, Zhe Wang, Xinge Zhu, Chen Huang, Jianping Shi, Dahua Lin
[NeurIPS 2018] Video Prediction via Selective Sampling [pdf]
Jingwei Xu, Bingbing Ni, Xiaokang Yang
[ICML 2018] Stochastic Video Generation with a Learned Prior [pdf]
Emily Denton, Rob Fergus
[ICLR 2018] Stochastic Variational Video Prediction [pdf]
Mohammad Babaeizadeh, Chelsea Finn, Dumitru Erhan, Roy H. Campbell, Sergey Levine
[CVPR 2019] Learning to Film from Professional Human Motion Videos [pdf]
Chong Huang, Chuan-En Lin, Zhenyu Yang, Yan Kong, Peng Chen, Xin Yang, Kwang-Ting Cheng
[ICCV 2019] Improved Conditional VRNNs for Video Prediction [pdf]
Lluis Castrejon, Nicolas Ballas, Aaron Courville
[ICCV 2019] Compositional Video Prediction [pdf]
Yufei Ye, Maneesh Singh, Abhinav Gupta, Shubham Tulsiani
[NeurIPS 2019] High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks [pdf]
Ruben Villegas, Arkanath Pathak, Harini Kannan, Dumitru Erhan, Quoc V. Le, Honglak Lee
[ICML 2018] Hierarchical Long-term Video Prediction without Supervision [pdf]
Nevan Wichers, Ruben Villegas, Dumitru Erhan, Honglak Lee
- Uses the visual analogy network (VAN) from "Deep Visual Analogy-Making" (NeurIPS 2015).
[ICML 2017] Learning to Generate Long-term Future via Hierarchical Prediction [pdf]
Ruben Villegas, Jimei Yang, Yuliang Zou, Sungryull Sohn, Xunyu Lin, Honglak Lee
[ECCV 2018] Flow-Grounded Spatial-Temporal Video Prediction from Still Images [pdf]
Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, Ming-Hsuan Yang
[ICML 2020] Stochastic Latent Residual Video Prediction [pdf]
Jean-Yves Franceschi, Edouard Delasalles, Mickaël Chen, Sylvain Lamprier, Patrick Gallinari
[CVPR 2020] Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction [pdf]
Beibei Jin, Yu Hu, Qiankun Tang, Jingyu Niu, Zhiping Shi, Yinhe Han, Xiaowei Li
[CVPR 2020] G3AN: Disentangling Appearance and Motion for Video Generation [pdf]
Yaohui Wang, Piotr Bilinski, Francois Bremond, Antitza Dantcheva
[CVPR 2020] Future Video Synthesis With Object Motion Prediction [pdf]
Yue Wu, Rongrong Gao, Jaesik Park, Qifeng Chen
[CVPR 2020] Non-Adversarial Video Synthesis With Learned Priors [pdf]
Abhishek Aich, Akash Gupta, Rameswar Panda, Rakib Hyder, M. Salman Asif, Amit K. Roy-Chowdhury
[CVPR 2020] Probabilistic Video Prediction From Noisy Data With a Posterior Confidence [pdf]
Yunbo Wang, Jiajun Wu, Mingsheng Long, Joshua B. Tenenbaum
[CVPR 2020] Disentangling Physical Dynamics From Unknown Factors for Unsupervised Video Prediction [pdf]
Vincent Le Guen, Nicolas Thome
Pose Transfer
[NeurIPS 2019] Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction [pdf]
Yunji Kim, Seonghyeon Nam, In Cho, and Seon Joo Kim
- [NeurIPS 2018] Unsupervised learning of object landmarks through conditional image generation [pdf]
[ICCV 2019] Everybody Dance Now [pdf]
Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros
[CVPR 2018] Synthesizing Images of Humans in Unseen Poses [pdf]
Guha Balakrishnan, Amy Zhao, Adrian V. Dalca, Fredo Durand, John Guttag
[CVPR 2019] Animating Arbitrary Objects via Deep Motion Transfer [pdf]
Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe
[NeurIPS 2019] Few-shot Video-to-Video Synthesis [pdf]
Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Jan Kautz, Bryan Catanzaro
[CVPR 2019] Dense Intrinsic Appearance Flow for Human Pose Transfer [pdf]
Yining Li, Chen Huang, Chen Change Loy
[ICCV 2019] Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis [pdf]
Wen Liu, Zhixin Piao, Jie Min, Wenhan Luo, Lin Ma, Shenghua Gao
- The first body mesh recovery module estimates the 3D meshes of \(I_s\) and \(I_r\) and renders their correspondence maps, \(C_s\) and \(C_t\).
- We calculate the transformation flow \(T \in \mathbb{R}^{H \times W \times 2}\) by matching the correspondences between the source correspondence map, with its mesh face coordinates \(f_s\), and the reference correspondence map. Here \(H \times W\) is the size of the image. Consequently, a front image \(I_{ft}\) and a masked background image \(I_{bg}\) are derived by masking the source image \(I_s\) based on \(C_s\). Finally, we warp the source image \(I_s\) by the transformation flow \(T\) and obtain the warped image \(I_{syn}\).
- Correspondences: each pixel of an IUV image refers to a body part index \(I\) and \((U, V)\) coordinates that map to a unique point on the body model surface.
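A minimal sketch of the warping stage described above, assuming PyTorch and that \(T\) stores normalised sampling coordinates in the layout `grid_sample` expects; function and argument names are illustrative, not the authors' code:

```python
import torch
import torch.nn.functional as F

def liquid_warp_sketch(I_s, T, front_mask):
    """Sketch of the warping step only.

    I_s: (B, 3, H, W) source image; T: (B, H, W, 2) transformation flow giving,
    per target pixel, the normalised source coordinates to sample from;
    front_mask: (B, 1, H, W) foreground mask derived from the source
    correspondence map C_s."""
    I_bg = I_s * (1.0 - front_mask)                                      # masked background image
    I_syn = F.grid_sample(I_s, T, mode="bilinear", align_corners=True)   # warped source image
    return I_syn, I_bg
```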
[CVPR 2018] Unsupervised person image synthesis in arbitrary poses [pdf]
Albert Pumarola, Antonio Agudo, Alberto Sanfeliu, Francesc Moreno-Noguer
[ECCV 2018] Dense pose transfer [pdf]
Natalia Neverova, Rıza Alp Güler, and Iasonas Kokkinos
[NeurIPS 2017] Pose guided person image generation [pdf]
Liqian Ma, Xu Jia, Qianru Sun, Bernt Schiele, Tinne Tuytelaars and Luc Van Gool
[CVPR 2018] A variational u-net for conditional appearance and shape generation [pdf]
Patrick Esser, Ekaterina Sutter, and Björn Ommer
[CVPR 2018] Deformable GANs for pose-based human image generation [pdf]
Aliaksandr Siarohin, Enver Sangineto, Stéphane Lathuilière, and Nicu Sebe
[NeurIPS 2018] Soft-gated warping-GAN for pose-guided person image synthesis [pdf]
Haoye Dong, Xiaodan Liang, Ke Gong, Hanjiang Lai, Jia Zhu, and Jian Yin
[NeurIPS 2015] Spatial transformer networks [pdf]
Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu
- The combination of the localisation network, grid generator, and sampler forms a spatial transformer.
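A compact sketch of those three modules, assuming PyTorch (layer sizes are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    """Localisation net -> affine grid generator -> sampler (the three STN modules)."""
    def __init__(self, channels):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(channels, 8, 7), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(8, 6),                      # 2x3 affine parameters
        )
        # Initialise the predicted transform to the identity
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)                          # localisation network
        grid = F.affine_grid(theta, x.size(), align_corners=False)  # grid generator
        return F.grid_sample(x, grid, align_corners=False)          # sampler
```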
[back to top]
Representation learning
[ArXiv 2020] A Simple Framework for Contrastive Learning of Visual Representations [pdf]
Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton
[ICLR 2020] Contrastive Representation Distillation [pdf]
Yonglong Tian, Dilip Krishnan, Phillip Isola
- Examples include distilling a large network into a smaller one, transferring knowledge from one sensory modality to a second, or ensembling a collection of models into a single estimator. Knowledge distillation, the standard approach to these problems, minimizes the KL divergence between the probabilistic outputs of a teacher and student network. We demonstrate that this objective ignores important structural knowledge of the teacher network. This motivates an alternative objective by which we train a student to capture significantly more information in the teacher's representation of the data. We formulate this objective as contrastive learning. Experiments demonstrate that our resulting new objective outperforms knowledge distillation and other cutting-edge distillers on a variety of knowledge transfer tasks, including single model compression, ensemble distillation, and cross-modal transfer.
- Motivation: representational knowledge is structured – the dimensions exhibit complex interdependencies. The original KD objective introduced in (Hinton et al., 2015, Distilling the knowledge in a neural network) treats all dimensions as independent, conditioned on the input.
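For reference, a short sketch of the standard KD objective mentioned in the notes above (KL divergence between temperature-softened teacher and student outputs):

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, tau=4.0):
    """Hinton-style knowledge distillation: KL divergence between the
    temperature-softened teacher and student output distributions."""
    p_teacher = F.softmax(teacher_logits / tau, dim=1)
    log_p_student = F.log_softmax(student_logits / tau, dim=1)
    # tau^2 keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * tau ** 2
```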
[ICLR 2020] Momentum Contrast for Unsupervised Visual Representation Learning [pdf]
Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick
- From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. This suggests that the gap between unsupervised and supervised representation learning has been largely closed in many vision tasks.
- The "keys" (tokens) in the dictionary are sampled from data (e.g., images or patches) and are represented by an encoder network. Unsupervised learning trains encoders to perform dictionary look-up: an encoded "query" should be similar to its matching key and dissimilar to others. Learning is formulated as minimizing a contrastive loss.
- Contrastive learning: Dimensionality reduction by learning an invariant mapping, CVPR 2006
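A sketch of the two mechanisms described above, assuming PyTorch: the momentum (moving-average) update of the key encoder and a queue-based InfoNCE loss (illustrative names, not the released MoCo code):

```python
import torch
import torch.nn.functional as F

def momentum_update(encoder_q, encoder_k, m=0.999):
    """The key encoder follows the query encoder as an exponential moving average."""
    for pq, pk in zip(encoder_q.parameters(), encoder_k.parameters()):
        pk.data.mul_(m).add_(pq.data, alpha=1.0 - m)

def info_nce(q, k_pos, queue, temperature=0.07):
    """Contrastive loss over one positive key and a queue of negatives.

    q, k_pos: (B, D) L2-normalised embeddings; queue: (K, D) negative keys."""
    l_pos = (q * k_pos).sum(dim=1, keepdim=True)          # (B, 1) positive logits
    l_neg = q @ queue.t()                                  # (B, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)  # positive at index 0
    return F.cross_entropy(logits, labels)
```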
[ICLR 2019] Learning deep representations by mutual information estimation and maximization [pdf]
R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Adam Trischler, and Yoshua Bengio
[ICCV 2019] Rethinking ImageNet pre-training [pdf]
Kaiming He, Ross Girshick, Piotr Dollár
[NeurIPS 2019] Learning Representations by Maximizing Mutual Information Across Views [pdf]
Philip Bachman, R Devon Hjelm, William Buchwalter
[arXiv 2019] Data-Efficient Image Recognition with Contrastive Predictive Coding [pdf]
Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord
[ICCV 2019] Local aggregation for unsupervised learning of visual embeddings [pdf]
Chengxu Zhuang, Alex Lin Zhai, Daniel Yamins
[CVPR 2019] Self-Supervised Representation Learning by Rotation Feature Decoupling [pdf]
Zeyu Feng, Chang Xu, Dacheng Tao
[CVPR 2018 spotlight] Unsupervised feature learning via non-parametric instance discrimination [pdf]
Zhirong Wu, Yuanjun Xiong, Stella Yu, and Dahua Lin
- Noise contrastive estimation (NCE) and proximal regularization: \(J_{NCE}(\theta) = -E_{P_d}\big[\log h(i, \mathbf{v}_i^{(t-1)}) - \lambda \|\mathbf{v}_i^{(t-1)} - \mathbf{v}_i^{(t)}\|_2^2\big] - m \cdot E_{P_n}\big[\log (1 - h(i, \mathbf{v'}^{(t-1)}))\big]\). \(P_d\) denotes the data distribution and \(P_n\) the noise distribution (uniform).
[NeurIPS 2014] Discriminative unsupervised feature learning with convolutional neural networks [pdf]
Alexey Dosovitskiy, Philipp Fischer, Jost Tobias Springenberg, Martin Riedmiller, Thomas Brox
[arXiv 2018] Representation learning with contrastive predictive coding [pdf]
Aaron van den Oord, Yazhe Li, and Oriol Vinyals
[back to top]
Others
[ICML 2020] Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data [pdf]
Marc Finzi, Samuel Stanton, Pavel Izmailov, Andrew Gordon Wilson
[ICML 2020] Training Binary Neural Networks through Learning with Noisy Supervision [pdf]
Kai Han, Yunhe Wang, Yixing Xu, Chunjing Xu, Enhua Wu, Chang Xu
[ICML 2020] Learning Flat Latent Manifolds with VAEs [pdf]
Nutan Chen, Alexej Klushyn, Francesco Ferroni, Justin Bayer, Patrick van der Smagt
[CVPR 2018] Neural 3D Mesh Renderer [pdf]
Hiroharu Kato, Yoshitaka Ushiku, Tatsuya Harada
- Therefore, in this work, we propose an approximate gradient for rasterization that enables the integration of rendering into neural networks.
[ICCV 2019] Deep Meta Metric Learning [pdf]
Guangyi Chen, Tianren Zhang, Jiwen Lu, Jie Zhou
[AAAI 2020] Hybrid Graph Neural Networks for Crowd Counting [pdf]
Ao Luo, Fan Yang, Xin Li, Dong Nie, Zhicheng Jiao, Shangchen Zhou, Hong Cheng
[ICLR 2020] Inductive Matrix Completion Based on Graph Neural Networks [pdf]
Muhan Zhang, Yixin Chen
[NeurIPS 2019] Quaternion Knowledge Graph Embeddings [pdf]
Shuai Zhang, Yi Tay, Lina Yao, Qi Liu
[ICML 2019] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks [pdf]
Mingxing Tan, Quoc V. Le
[ICLR 2020] Geom-GCN: Geometric Graph Convolutional Networks [pdf]
Hongbin Pei, Bingzhe Wei, Kevin Chen-Chuan Chang, Yu Lei, Bo Yang
- Message-passing neural networks (MPNNs) have been successfully applied to representation learning on graphs in a variety of real-world applications. However, two fundamental weaknesses of MPNNs' aggregators limit their ability to represent graph-structured data: losing the structural information of nodes in neighborhoods and lacking the ability to capture long-range dependencies in disassortative graphs. Few studies have noticed these weaknesses from different perspectives. From observations on classical neural networks and network geometry, we propose a novel geometric aggregation scheme for graph neural networks to overcome the two weaknesses. The basic idea is that aggregation on a graph can benefit from a continuous space underlying the graph. The proposed aggregation scheme is permutation-invariant and consists of three modules: node embedding, structural neighborhood, and bi-level aggregation.
[NeurIPS 2019] General E(2)-Equivariant Steerable CNNs [pdf]
Maurice Weiler, Gabriele Cesa
[CVPR 2020] DualConvMesh-Net: Joint Geodesic and Euclidean Convolutions on 3D Meshes [pdf]
Jonas Schult, Francis Engelmann, Theodora Kontogianni, Bastian Leibe
[SIGGRAPH 2020] CNNs on Surfaces using Rotation-Equivariant Features [pdf]
Ruben Wiersma, Elmar Eisemann, Klaus Hildebrandt
Accelerating 3D Deep Learning with PyTorch3D [pdf]
Nikhila Ravi, Jeremy Reizenstein, David Novotny, Taylor Gordon, Wan-Yen Lo, Justin Johnson, Georgia Gkioxari
On the Continuity of Rotation Representations in Neural Networks [pdf]
Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, Hao Li
[CVPR 2020] Adversarial Examples Improve Image Recognition [pdf]
Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le
[NeurIPS 2020] A Closer Look at Accuracy vs. Robustness [pdf]
Yao-Yuan Yang, Cyrus Rashtchian, Hongyang Zhang, Ruslan Salakhutdinov, Kamalika Chaudhuri
[back to top]
Transformer
Spatially Sparse Convolution Library [link]
Submanifold Sparse Convolutional Networks [link]
[ICCV 2021] Voxel Transformer for 3D Object Detection [pdf]
Jiageng Mao, Yujing Xue, Minzhe Niu, Haoyue Bai, Jiashi Feng, Xiaodan Liang, Hang Xu, Chunjing Xu
- We propose the sparse voxel module and the submanifold voxel module, which can operate on the empty and non-empty voxel positions effectively.
[ArXiv 2021] A Survey on Vision Transformer [pdf]
Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao
[ICCV 2021] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [pdf]
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo
- Relative position bias
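A sketch of how such a relative position bias enters window attention (a learnable table indexed by the relative offset of every position pair, added to the attention logits before softmax); names and shapes follow the common pattern but are illustrative:

```python
import torch
import torch.nn as nn

class RelativePositionBias(nn.Module):
    """Learnable bias B added to Q K^T / sqrt(d) for every head, indexed by the
    relative (dy, dx) offset between the two positions inside a window."""
    def __init__(self, window_size, num_heads):
        super().__init__()
        Wh, Ww = window_size
        self.table = nn.Parameter(torch.zeros((2 * Wh - 1) * (2 * Ww - 1), num_heads))
        coords = torch.stack(torch.meshgrid(torch.arange(Wh), torch.arange(Ww),
                                            indexing="ij")).flatten(1)     # (2, N)
        rel = coords[:, :, None] - coords[:, None, :]                      # (2, N, N)
        rel = rel.permute(1, 2, 0) + torch.tensor([Wh - 1, Ww - 1])        # shift to >= 0
        index = rel[..., 0] * (2 * Ww - 1) + rel[..., 1]                   # (N, N)
        self.register_buffer("index", index)

    def forward(self, attn_logits):            # attn_logits: (B, heads, N, N)
        bias = self.table[self.index.view(-1)].view(*self.index.shape, -1)  # (N, N, heads)
        return attn_logits + bias.permute(2, 0, 1).unsqueeze(0)
```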
[CVPR 2021] Bottleneck Transformers for Visual Recognition [pdf]
Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani
[ArXiv 2021] 3D Human Pose Estimation with Spatial and Temporal Transformers [pdf]
Ce Zheng, Sijie Zhu, Matias Mendieta, Taojiannan Yang, Chen Chen, Zhengming Ding
[NeurIPS 2021] Post-Training Quantization for Vision Transformer [pdf]
Zhenhua Liu, Yunhe Wang, Kai Han, Siwei Ma, Wen Gao
[CVPR 2020] LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention [pdf]
Junbo Yin, Jianbing Shen, Chenye Guan, Dingfu Zhou, Ruigang Yang
[ArXiv 2021] Mesh Graphormer [pdf]
Kevin Lin, Lijuan Wang, Zicheng Liu
[CVPR 2021] End-to-end human pose and mesh reconstruction with transformers [pdf]
Kevin Lin, Lijuan Wang, Zicheng Liu
[ECCV 2020] Non-autoregressive structured modeling for 3d hand pose estimation [pdf]
Lin Huang, Jianchao Tan, Ji Liu, and Junsong Yuan
[ArXiv 2021] CvT: Introducing Convolutions to Vision Transformers [pdf]
Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang
[ICCV 2021] Point transformer [pdf]
Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, Vladlen Koltun
[CVPR 2021] End-to-end video instance segmentation with transformers [pdf]
Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia
[AAAI 2021] Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting [pdf]
Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, Wancai Zhang
[ECCV 2020] End-to-end object detection with transformers [pdf]
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko
[CVPR 2020] Actor-Transformers for Group Activity Recognition [pdf]
Kirill Gavrilyuk, Ryan Sanford, Mehrsan Javan, Cees G. M. Snoek
[ICLR 2021] An image is worth 16x16 words: Transformers for image recognition at scale [pdf]
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
[ICML 2020] Generative pretraining from pixels [pdf]
Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, and Ilya Sutskever
[ArXiv 2020] Visual Transformers: Token-based Image Representation and Processing for Computer Vision [pdf]
Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Zhicheng Yan, Masayoshi Tomizuka, Joseph Gonzalez, Kurt Keutzer, Peter Vajda
- semantic segmentation with transformer
[ICML 2018] Image transformer [pdf]
Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran
[back to top]