Implementing Pooling Layers in Python

Continuing from the previous post, this one implements the pooling layers of a neural network in Python, and along the way walks through how the forward and backward passes are computed.

Pooling layers (average pooling & max pooling)

Backpropagating gradients through the pooling operation

The following text is excerpted from here (will be taken down if it infringes copyright). One non-differentiable step in a convolutional neural network is the pooling operation, because pooling changes the size of the feature map. With 2*2 pooling, if the feature map of layer l+1 has 16 gradients, then layer l has 64 gradients, so the gradients cannot simply be passed back position by position. The idea for solving this is simple: pass the gradient of 1 pixel back to 4 pixels, while keeping the total loss (or gradient) being passed unchanged. Following this principle, the backward passes of mean pooling and max pooling are different.
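
As a quick shape check on the counting above (a minimal sketch, assuming a made-up single-channel 8x8 feature map and 2*2 pooling with stride 2, just to make the 64 -> 16 relation concrete):

import numpy as np

x = np.random.randn(8, 8)                       # layer l: 64 values, so 64 gradients to fill in
out = x.reshape(4, 2, 4, 2).max(axis=(1, 3))    # layer l+1: 4x4 = 16 pooled values / gradients
print(x.size, out.size)                         # 64 16 -> each upstream gradient maps back to a 2x2 window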

average pooling

The forward pass of average pooling takes the mean of the values in a patch. The backward pass therefore splits an output element's gradient into n equal shares and hands them to the previous layer, so that the sum of gradients (residuals) before and after pooling stays the same, as illustrated below:
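
A small worked example of both directions (hypothetical numbers, a single 2x2 window):

import numpy as np

# forward: a 2x2 patch collapses to its mean
patch = np.array([[1., 3.],
                  [2., 6.]])
print(patch.mean())                 # 3.0

# backward: the upstream gradient g for that output is split evenly over the 4 inputs
g = 2.0
dpatch = np.full((2, 2), g / 4)     # each input position receives 0.5
print(dpatch.sum())                 # 2.0 -- same total as g, so the gradient sum is preserved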

max pooling

Max pooling must also satisfy the rule that the gradient sum stays unchanged. Its forward pass passes only the largest value of each patch on to the next layer, and the other pixels are simply discarded. The backward pass therefore hands the gradient directly to that one pixel of the previous layer, while the remaining pixels receive no gradient, i.e. zero. So the difference from mean pooling is that max pooling needs to record which pixel in the patch held the maximum during pooling, the so-called max id; this variable stores the position of the maximum because it is needed in the backward pass. The forward and backward passes then look as illustrated below:
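
Again a small sketch with made-up numbers: the forward pass records the max id, and the backward pass routes the whole gradient to that single position:

import numpy as np

patch = np.array([[1., 3.],
                  [2., 6.]])
max_id = np.unravel_index(patch.argmax(), patch.shape)   # (1, 1), the position of 6
print(patch.max())          # 6.0 is passed on; the other three values are discarded

g = 2.0                     # upstream gradient for this output
dpatch = np.zeros_like(patch)
dpatch[max_id] = g          # only the max position receives the gradient
print(dpatch.sum())         # 2.0 -- the gradient sum is again preserved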

Python implementation

import numpy as np


def average_pooling_forward(x, pool_param):
    # x: input of shape (N, C, H, W)
    N, C, H, W = x.shape
    HH, WW, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
    H_out = (H - HH) // stride + 1
    W_out = (W - WW) // stride + 1
    out = np.zeros((N, C, H_out, W_out))

    for i in range(H_out):
        for j in range(W_out):
            x_mask = x[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]
            out[:, :, i, j] = np.average(x_mask, axis=(2, 3))
    cache = (x, pool_param)
    return out, cache


def average_pooling_backward(dout, cache):
    x, pool_param = cache
    N, C, H, W = x.shape
    HH, WW, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
    H_out = (H - HH) // stride + 1
    W_out = (W - WW) // stride + 1

    dx = np.zeros_like(x)

    for i in range(H_out):
        for j in range(W_out):
            # spread each upstream gradient evenly over its HH*WW window
            dx[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW] += (dout[:, :, i, j])[:, :, None, None] / (HH * WW)

    return dx


def max_pooling_forward(x, pool_param):
    N, C, H, W = x.shape
    HH, WW, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
    H_out = (H - HH) // stride + 1
    W_out = (W - WW) // stride + 1
    out = np.zeros((N, C, H_out, W_out))

    for i in range(H_out):
        for j in range(W_out):
            x_mask = x[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]
            out[:, :, i, j] = np.max(x_mask, axis=(2, 3))  # for median pooling, use np.median instead

    cache = (x, pool_param)
    return out, cache


def max_pooling_backward(dout, cache):
    x, pool_param = cache
    N, C, H, W = x.shape
    HH, WW, stride = pool_param['pool_height'], pool_param['pool_width'], pool_param['stride']
    H_out = (H - HH) // stride + 1
    W_out = (W - WW) // stride + 1
    dx = np.zeros_like(x)

    for i in range(H_out):
        for j in range(W_out):
            x_masked = x[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]
            max_x_masked = np.max(x_masked, axis=(2, 3))
            # binary mask marking the position of the max value in each window (the max id)
            temp_binary_mask = (x_masked == max_x_masked[:, :, None, None])
            dx[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW] += temp_binary_mask * (dout[:, :, i, j])[:, :, None, None]

    return dx
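
A quick way to exercise the functions above and verify the gradient-sum rule (a minimal check with random data, assuming 2*2 pooling with stride 2):

x = np.random.randn(2, 3, 8, 8)
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}

out, cache = max_pooling_forward(x, pool_param)
dout = np.random.randn(*out.shape)
dx = max_pooling_backward(dout, cache)
print(out.shape, dx.shape)                # (2, 3, 4, 4) (2, 3, 8, 8)
print(np.allclose(dout.sum(), dx.sum()))  # True -- the total gradient is unchanged

out, cache = average_pooling_forward(x, pool_param)
dx = average_pooling_backward(dout, cache)
print(np.allclose(dout.sum(), dx.sum()))  # True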

Finally

All in all, the backward pass of a pooling layer only has to respect the principle that the total loss (or gradient) stays unchanged, and it is quite a bit simpler to implement than a convolutional layer.