# DenseNet: Densely Connected CNN

2017-07-23
cwlseu

## Key contribution

In this paper, we propose an architecture that distills this insight into a simple connectivity pattern: to ensure maximum information flow between layers in the network, we connect all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers. Crucially, in contrast to ResNets, we never combine features through summation before they are passed into a layer; instead, we combine features by concatenating them.
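A minimal PyTorch sketch (not the authors' code) of this distinction: summation keeps the channel count fixed, while concatenation grows it with every layer.

```python
import torch

# ResNet combines features by summation:    x_l = H_l(x_{l-1}) + x_{l-1}
# DenseNet combines them by concatenation:  x_l = H_l([x_0, x_1, ..., x_{l-1}])

x_prev = torch.randn(1, 64, 32, 32)   # feature-maps from the preceding layers
h_out  = torch.randn(1, 64, 32, 32)   # output of the current layer H_l

res_out   = x_prev + h_out                     # summation: still 64 channels
dense_out = torch.cat([x_prev, h_out], dim=1)  # concatenation: 128 channels

print(res_out.shape)    # torch.Size([1, 64, 32, 32])
print(dense_out.shape)  # torch.Size([1, 128, 32, 32])
```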


## Key techniques of DenseNet

H_l(·) is a composite function, the combination of three operations:

BN->ReLU->Conv(3x3)
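A minimal PyTorch sketch of this composite function; the helper name `dense_layer` and its arguments are illustrative, not from the paper's reference implementation:

```python
import torch.nn as nn

# Composite function H_l inside a dense block: BN -> ReLU -> Conv(3x3).
# Padding 1 preserves the spatial size, so outputs can later be concatenated.
def dense_layer(in_channels: int, growth_rate: int) -> nn.Sequential:
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False),
    )
```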


Transition layer: DenseBlock feature output -> BN -> Conv(1x1) -> AvgPool(2x2)
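A matching sketch of the transition layer; the compression factor `theta` is the θ of DenseNet-C/BC (θ = 0.5 in the paper; θ = 1 means no compression).

```python
import torch.nn as nn

# Transition layer between two dense blocks: BN -> Conv(1x1) -> AvgPool(2x2).
# The 1x1 conv can compress channels by a factor theta (DenseNet-C / DenseNet-BC).
def transition_layer(in_channels: int, theta: float = 0.5) -> nn.Sequential:
    out_channels = int(in_channels * theta)
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
        nn.AvgPool2d(kernel_size=2, stride=2),
    )
```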


## Growth rate

Each layer H_l produces k feature-maps, where k is the growth rate; because every layer's output is concatenated onto the input of all subsequent layers, the l-th layer in a block receives k0 + k*(l-1) input feature-maps. To keep this growth cheap, the bottleneck variant (DenseNet-B) inserts a 1x1 convolution before the 3x3 convolution:

BN->ReLU->Conv(1x1)->BN->ReLU->Conv(3x3)
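A sketch of the bottleneck layer together with the growth-rate arithmetic; the `4 * growth_rate` intermediate width follows the paper's DenseNet-B design, while `k0 = 16` below is just an illustrative initial channel count.

```python
import torch.nn as nn

# Bottleneck H_l (DenseNet-B): BN -> ReLU -> Conv(1x1) -> BN -> ReLU -> Conv(3x3).
# The 1x1 conv first reduces the ever-growing input to 4*k feature-maps,
# capping the cost of the 3x3 conv that follows.
def bottleneck_layer(in_channels: int, growth_rate: int) -> nn.Sequential:
    inter_channels = 4 * growth_rate
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, inter_channels, kernel_size=1, bias=False),
        nn.BatchNorm2d(inter_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(inter_channels, growth_rate, kernel_size=3, padding=1, bias=False),
    )

# Growth-rate arithmetic: the l-th layer in a block sees k0 + k*(l-1) input channels.
k0, k = 16, 12
for l in range(1, 5):
    print(f"layer {l}: {k0 + k * (l - 1)} input channels")
```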


## Results

### Classification error rates

L denotes the network depth and k the growth rate. Blue entries mark the best results, and + means the original dataset was used with data augmentation. DenseNet achieves lower error rates than ResNet while using fewer parameters.

1. CIFAR: C10 refers to CIFAR-10 and C100 to CIFAR-100. The data augmentation behind the "+" setting: the images are first zero-padded with 4 pixels on each side, then randomly cropped to again produce 32x32 images; half of the images are then horizontally mirrored.
2. SVHN: The Street View House Numbers (SVHN) dataset contains 32x32 colored digit images coming from Google Street View. The task is to classify the central digit into the correct one of the 10 digit classes. There are 73,257 images in the training set, 26,032 images in the test set, and 531,131 additional images for training.
3. ImageNet: The ILSVRC 2012 classification dataset consists of 1.2 million images for training and 50,000 for validation; each image is associated with a label from 1,000 predefined classes.
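A sketch of that CIFAR-style augmentation, assuming the standard `torchvision.transforms` API (not the authors' training code):

```python
import torchvision.transforms as T

# The "+" augmentation: zero-pad 4 px per side, random 32x32 crop,
# and mirror half of the images via a random horizontal flip.
cifar_augment = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
])
```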

### Information-flow analysis of DenseNet

For each convolutional layer l within a block, we compute the average (absolute) weight assigned to connections with layer s. The figure above shows a heatmap for all three dense blocks.

The average absolute weight serves as a surrogate for the dependency of a convolutional layer on its preceding layers.
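A sketch of how this statistic could be computed, assuming each source layer contributes exactly `k` channels to the concatenated input of layer l (a simplification; the paper additionally normalizes the values for the heatmap):

```python
import torch

# weight: the conv kernel of layer l in a dense block,
#         shape (out_channels, in_channels, 3, 3), where in_channels is the
#         concatenation [source 0 | source 1 | ... | source l-1], k channels each.
def average_abs_weights(weight: torch.Tensor, k: int) -> torch.Tensor:
    per_channel = weight.abs().mean(dim=(0, 2, 3))   # one value per input channel
    n_sources = per_channel.numel() // k
    # Average over the k channels of each source layer -> one value per source.
    return per_channel.reshape(n_sources, k).mean(dim=1)
```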

1. All layers spread their weights over many inputs within the same block. This indicates that features extracted by very early layers are, indeed, directly used by deep layers throughout the same dense block.
2. The weights of the transition layers also spread their weight across all layers within the preceding dense block, indicating information flow from the first to the last layers of the DenseNet through few indirections.
3. The layers within the second and third dense block consistently assign the least weight to the outputs of the transition layer (the top row of the triangles), indicating that the transition layer outputs many redundant features (with low weight on average). This is in keeping with the strong results of DenseNet-BC where exactly these outputs are compressed.
4. Although the final classification layer, shown on the very right, also uses weights across the entire dense block, there seems to be a concentration towards final feature-maps, suggesting that there may be some more high-level features produced late in the network.

1. Within a dense block, features extracted by the early layers are directly used by the later layers.
2. The features used by the transition layers come from all layers of the preceding dense block.
3. The top row of the second and third blocks shows that the features output by the previous block contain a lot of redundant information; this is what motivates DenseNet-BC.

DenseNet has the following advantages:

- Effectively alleviates the vanishing-gradient problem
- Strengthens feature propagation
- Encourages feature reuse
- Substantially reduces the number of parameters

## Thoughts

1. Whether in ResNet or DenseNet, the core idea is that of Highway Networks: the skip connection. Certain inputs are passed unaltered into later layers (the skip), integrating the information flow, avoiding the loss of information as it travels between layers, and mitigating the vanishing-gradient problem (it also suppresses some noise).

2. DenseNet blocks push deep networks toward designs that are shallower but much wider.

