机遇 research之路其修远兮,我将上下而求索






  1. Attention-guided Unified Network for Panoptic Segmentation
    • 作者:Yanwei Li, Xinze Chen, Zheng Zhu, Lingxi Xie, Guan Huang, Dalong Du, Xingang Wang
    • 论文链接:https://arxiv.org/abs/1812.03904
    • 摘要: This paper studies panoptic segmentation, a recently proposed task which segments foreground (FG) objects at the instance level as well as background (BG) contents at the semantic level. Existing methods mostly dealt with these two problems separately, but in this paper, we reveal the underlying relationship between them, in particular, FG objects provide complementary cues to assist BG understanding. Our approach, named the Attention-guided Unified Network (AUNet), is an unified framework with two branches for FG and BG segmentation simultaneously. Two sources of attentions are added to the BG branch, namely, RPN and FG segmentation mask to provide object-level and pixel-level attentions, respectively. Our approach is generalized to different backbones with consistent accuracy gain in both FG and BG segmentation, and also sets new stateof-the-arts in the MS-COCO (46.5% PQ) benchmarks.
    • 任务Panoptic Segment @Panoptic Segment任务介绍
  • 框架图 @基于Attention机制的Panoptic Segment算法框架

  • 小结

    • 任务新
    • 特征是FPN特征,使用两种Proposal Attention Module
    • 在Mask-RCNN中的基础上,对ROIAlign进行改进,丰富了不同抽象层次特征之间的交流
    • 后面是不是可以使用这个作文本生成?
  1. FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation
    • 作者:Paul Voigtlaender, Yuning Chai, Florian Schroff, Hartwig Adam, Bastian Leibe, Liang-Chieh Chen
    • 论文链接:https://arxiv.org/abs/1902.09513
  2. Associatively Segmenting Instances and Semantics in Point Clouds(点云分割,开源)
    • 作者:Xinlong Wang, Shu Liu, Xiaoyong Shen, Chunhua Shen, Jiaya Jia
    • 论文链接:https://arxiv.org/abs/1902.09852
    • 代码链接:https://github.com/WXinlong/ASIS

Pose Estimation

  1. Deep High-Resolution Representation Learning for Human Pose Estimation(目前SOTA,已经开源)
    • 作者:Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang
    • 论文链接:
    • 代码链接:https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
    • Project: https://jingdongwang2017.github.io/Projects/HRNet/index.html
    • 任务:Human Pose Estimation
    • 摘要 In this paper, we are interested in the human pose estimation problem with a focus on learning reliable highresolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the mutliresolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich highresolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through the superior pose estimation results over two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset. In addition, we show the superiority of our network in pose tracking on the PoseTrack dataset.
    • 主要设计 @ @
  2. DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion
    • 作者:Chen Wang, Danfei Xu, Yuke Zhu, Roberto Martín-Martín, Cewu Lu, Li Fei-Fei, Silvio Savarese
    • 论文链接:https://arxiv.org/abs/1901.04780
  3. Improving the Performance of Unimodal Dynamic Hand-Gesture Recognition with Multimodal Training
    • 作者:Mahdi Abavisani, Hamid Reza Vaezi Joze, Vishal M. Patel — Microsoft
    • 链接:https://arxiv.org/abs/1812.06145
    • 摘要 We present an efficient approach for leveraging the knowledge from multiple modalities in training unimodal 3D convolutional neural networks (3D-CNNs) for the task of dynamic hand gesture recognition. Instead of explicitly combining multimodal information, which is commonplace in many state-of-the-art methods, we propose a different framework in which we embed the knowledge of multiple modalities in individual networks so that each unimodal network can achieve an improved performance. In particular, we dedicate separate networks per available modality and enforce them to collaborate and learn to develop networks with common semantics and better representations. We introduce a “spatiotemporal semantic alignment” loss (SSA) to align the content of the features from different networks. In addition, we regularize this loss with our proposed “focal regularization parameter” to avoid negative knowledge transfer. Experimental results show that our framework improves the test time recognition accuracy of unimodal networks, and provides the state-of-the-art performance on various dynamic hand gesture recognition datasets
    • spatiotemporal semantic alignment (SSA) loss
    • dataset
dataset名称 size 描述
VIVA hand gestures 885 visible RGB and depth video sequences (RGB-D) of 19 hand gesture classes, collected from 8 subjects multimodal dynamic hand gesture dataset specifically designed with difficult settings of cluttered background, volatile illumination, and frequent occlusion for studying natural human activities in real-world driving settings. This dataset was captured using a Microsoft Kinect device
EgoGesture This dataset contains 24,161 hand gesture clips of 83 classes of gestures, performed by 50 subjects. Videos in this dataset include both static and dynamic gestures captured with an Intel RealSense SR300 device in RGB-D modalities across multiple indoor and outdoor scenes. a large multimodal hand gesture dataset collected for the task of egocentric gesture recognition.
NVGestures It contains 1532 dynamic hand gestures recorded from 20 subjects inside a car simulator with artificial lighting conditions. This dataset includes 25 classes of hand gestures. The gestures were recorded with SoftKinetic DS325 device as the RGB-D sensor and DUO-3D for the infrared streams. been captured with multiple sensors and from multiple viewpoints for studying human-computer interfaces
  1. 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans
    • 作者:Ji Hou Angela Dai Matthias Nießner
    • 论文链接:https://niessnerlab.org/projects/hou20183dsis.html
    • YouTube视频:https://youtu.be/IH9rNLD1-JE
  2. RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation
    • 作者:Bastian Wandt, Bodo Rosenhahn
    • 论文链接:https://arxiv.org/abs/1902.09868

Visual Question Answering & cross-multi-model data

  1. MUREL: Multimodal Relational Reasoning for Visual Question Answering
    • 作者:Remi Cadene, Hedi Ben-younes, Matthieu Cord, Nicolas Thome
    • 论文链接:https://arxiv.org/abs/1902.09487
  2. An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition
    • 作者:Chenyang Si, Wentao Chen, Wei Wang, Liang Wang, Tieniu Tan
    • 论文链接:https://arxiv.org/abs/1902.09130
  3. End-to-End Multi-Task Learning with Attention
    • 作者:Shikun Liu, Edward Johns, Andrew J. Davison
    • 论文链接:https://arxiv.org/abs/1803.10704
  4. 24.Image-Question-Answer Synergistic Network for Visual Dialog
    • 作者:Dalu Guo, Chang Xu, Dacheng Tao
    • 论文链接:https://arxiv.org/abs/1902.09774
  5. Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation
    • 作者:Xin Wang, Qiuyuan Huang, Asli Celikyilmaz, Jianfeng Gao, Dinghan Shen, Yuan-Fang Wang, William Yang Wang, Lei Zhang
    • 论文链接:https://arxiv.org/abs/1811.10092
  6. MUREL: Multimodal Relational Reasoning for Visual Question Answering
    • 作者:Remi Cadene, Hedi Ben-younes, Matthieu Cord, Nicolas Thome
    • 论文链接:https://arxiv.org/abs/1902.09487
    • github链接:https://github.com/Cadene/murel.bootstrap.pytorch
  7. MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment
    • 作者:Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang, Larry S. Davis
    • 论文链接:https://arxiv.org/pdf/1812.00087.pdf


  1. SpherePHD: Applying CNNs on a Spherical PolyHeDron Representation of 360 degree Images
    • 作者:Yeon Kun Lee, Jaeseok Jeong, Jong Seob Yun, Cho Won June, Kuk-Jin Yoon
    • 论文链接:https://arxiv.org/abs/1811.08196
  2. The Perfect Match: 3D Point Cloud Matching with Smoothed Densities
    • 作者:Zan Gojcic, Caifa Zhou, Jan D. Wegner, Andreas Wieser
    • 论文链接:https://arxiv.org/abs/1811.06879
  3. Disentangled Representation Learning for 3D Face Shape
    • 作者:Baris Gecer, Stylianos Ploumpis, Irene Kotsia, Stefanos Zafeiriou
    • 论文链接:https://arxiv.org/abs/1902.05978
  4. Stereo R-CNN based 3D Object Detection for Autonomous Driving(3D检测)
    • 作者:Peiliang Li, Xiaozhi Chen, Shaojie Shen
    • 论文链接:https://arxiv.org/abs/1902.09738
  5. GANFIT: Generative Adversarial Network Fitting for High Fidelity 3D Face Reconstruction
    • 作者:Baris Gecer, Stylianos Ploumpis, Irene Kotsia, Stefanos Zafeiriou
    • 论文链接:https://arxiv.org/abs/1902.05978
    • github链接:https://github.com/barisgecer/ganfit
  6. Single-Image Piece-wise Planar 3D Reconstruction via Associative Embedding(开源)
    • 作者:Zehao Yu, Jia Zheng, Dongze Lian, Zihan Zhou, Shenghua Gao
    • 论文链接:https://arxiv.org/abs/1902.09777
    • 代码链接:https://github.com/svip-lab/PlanarReconstruction
    • 来源:https://mp.weixin.qq.com/s/mamDhLUw6O9v8gldyIOPUA
  7. Disentangled Representation Learning for 3D Face Shape
    • 作者:Zi-Hang Jiang, Qianyi Wu, Keyu Chen, Juyong Zhang
    • 论文链接:https://arxiv.org/abs/1902.09887


  1. Event-based High Dynamic Range Image and Very High Frame Rate Video Generation using Conditional Generative Adversarial Networks
    • 作者:S. Mohammad Mostafavi I., Lin Wang, Yo-Sung Ho, Kuk-Jin Yoon
    • 论文链接:https://arxiv.org/abs/1811.08230
  2. Mixture Density Generative Adversarial Networks
    • 作者:Hamid Eghbal-zadeh, Werner Zellinger, Gerhard Widmer
    • 论文链接:https://arxiv.org/abs/1811.00152


  1. Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving
    • 作者:Yan Wang, Wei-Lun Chao, Divyansh Garg, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger
    • 论文链接:https://arxiv.org/abs/1812.07179


  1. ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape
    • 作者:Fabian Manhardt, Wadim Kehl, Adrien Gaidon
    • 论文链接:https://arxiv.org/abs/1812.02781


  1. Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration
    • 作者:De-An Huang, Suraj Nair, Danfei Xu, Yuke Zhu, Animesh Garg, Li Fei-Fei, Silvio Savarese, Juan Carlos Niebles
    • 论文链接:https://arxiv.org/abs/1807.03480
  2. A Neurobiological Evaluation Metric for Neural Network Model Search
    • 作者:Nathaniel Blanchard, Jeffery Kinnison, Brandon RichardWebster, Pouya Bashivan, Walter J. Scheirer
    • 论文链接:https://arxiv.org/pdf/1805.10726.pdf
    • 一种Evaluation Metric
  3. Variational Bayesian Dropout
    • 作者:Yuhang Liu, Wenyong Dong, Lei Zhang, Dong Gong, Qinfeng Shi
    • 论文链接:https://arxiv.org/abs/1811.07533
  4. Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression(检测)
    • 作者:Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, Silvio Savarese
    • 论文链接:https://arxiv.org/abs/1902.09630
    • 来源:https://mp.weixin.qq.com/s/mamDhLUw6O9v8gldyIOPUA
  5. Learning a Deep ConvNet for Multi-label Classification with Partial Labels(分类)
    • 作者:Thibaut Durand, Nazanin Mehrasa, Greg Mori
    • 论文链接:https://arxiv.org/abs/1902.09720
    • 来源:https://mp.weixin.qq.com/s/mamDhLUw6O9v8gldyIOPUA
  6. LiFF: Light Field Features in Scale and Depth
    • 作者:Donald G. Dansereau, Bernd Girod, Gordon Wetzstein
    • 论文链接:https://arxiv.org/abs/1901.03916
  7. Classification-Reconstruction Learning for Open-Set Recognition
    • 作者:Ryota Yoshihashi, Wen Shao, Rei Kawakami, Shaodi You, Makoto Iida, Takeshi Naemura
    • 论文链接:https://arxiv.org/abs/1812.04246
  8. Weakly Supervised Deep Image Hashing through Tag Embeddings
    • 作者:Vijetha Gattupalli, Yaoxin Zhuo, Baoxin Li
    • 论文链接:https://arxiv.org/abs/1806.05804
    • 摘要 Many approaches to semantic image hashing have been formulated as supervised learning problems that utilize images and label information to learn the binary hash codes. However, large-scale labelled image data is expensive to obtain, thus imposing a restriction on the usage of such algorithms. On the other hand, unlabelled image data is abundant due to the existence of many Web image repositories. Such Web images may often come with images tags that contains useful information, although raw tags in general do not readily lead to semantic labels. Motivated by this scenario, we formulate the problem of semantic image hashing as a weakly-supervised learning problem. We utilize the information contained in the user-generated tags associated with the images to learn the hash codes. More specifically, we extract the word2vec semantic embeddings of the tags and use the information contained in them for constraining the learning. Accordingly, we name our model Weakly Supervised Deep Hashing using Tag Embeddings (WDHT). WDHT is tested for the task of semantic image retrieval and is compared against several state-of-art models. Results show that our approach sets a new state-of-art in the area of weekly supervised image hashing.
  9. InverseRenderNet: Learning single image inverse rendering
    • 作者:Ye Yu, William A. P. Smith
    • 论文链接:https://arxiv.org/abs/1811.12328
  10. Iterative Residual CNNs for Burst Photography Applications
    • 作者:Filippos Kokkinos   Stamatis Lefkimmiatis
    • 论文链接:https://arxiv.org/abs/1811.12197
  11. Taking A Closer Look at Domain Shift: Category-level Adversaries for Semantics Consistent Domain Adaptation
    • 作者:Yawei Luo, Liang Zheng, Tao Guan, Junqing Yu, Yi Yang
    • 论文链接:https://arxiv.org/abs/1809.09478


cvpr2019 accepted papers list:

链接:https://pan.baidu.com/s/1s4FuLscWcslN5rQQvP92JA 提取码:osvy



下一篇 高通芯片笔记