Implementing Batch Normalization and a Convolution Layer in Python

Yesterday's TX coding interview asked me to hand-write IoU, BN, and CONV, and hearing that I was honestly a bit stunned. IoU was fine since it is fairly simple, but implementing BN and CONV was harder: at the time I only managed the BN forward pass and wrote down a one-dimensional convolution formula. Today I had some free time, so I went ahead and implemented IoU along with the forward and backward passes of BN and the convolution layer.

IoU Implementation

The idea behind IoU is simple, so I won't explain it here. The implementation is as follows:

def IoU(boxA, boxB):  # boxes given as (x1, y1, x2, y2) in inclusive pixel coordinates
    # corners of the intersection rectangle
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])
    # the +1 accounts for inclusive pixel coordinates; clamp at 0 when the boxes do not overlap
    interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)
    boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
    boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)
    iou = interArea / (boxAArea + boxBArea - interArea)
    return iou
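
A quick sanity check (a small sketch; the box values are just made up): with inclusive pixel coordinates, two 10x10 boxes offset by 5 in both directions intersect in a 5x5 region, so the IoU should be 25 / 175 ≈ 0.143.

boxA = [0, 0, 9, 9]      # 10x10 box
boxB = [5, 5, 14, 14]    # 10x10 box shifted by 5 in x and y
print(IoU(boxA, boxB))   # 25 / (100 + 100 - 25) = 0.142857...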

The C++ version below is essentially the same idea, except the boxes are given as (x, y, width, height) and the coordinates are treated as continuous, so there are no +1 terms:

#include <algorithm>

struct Rect
{
    int x; int y;
    int width; int height;
};

float IoU(Rect Box1, Rect Box2)
{
    int XA = std::max(Box1.x, Box2.x);
    int YA = std::max(Box1.y, Box2.y);
    int XB = std::min(Box1.x + Box1.width, Box2.x + Box2.width);
    int YB = std::min(Box1.y + Box1.height, Box2.y + Box2.height);
    int interArea = std::max(0, XB - XA) * std::max(0, YB - YA);
    int box1Area = Box1.width * Box1.height;
    int box2Area = Box2.width * Box2.height;
    float iou = float(interArea) / (box1Area + box2Area - interArea);
    return iou;
}

Implementing BN in Python

The forward pass of BN only requires implementing the following formulas.
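
For a mini-batch $\{x_1,\dots,x_m\}$, these are the standard equations from the BN paper:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i - \mu_B)^2$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma\,\hat{x}_i + \beta$$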

For the backward pass, we need the gradients with respect to x, γ, and β. The original paper gives the derivative formulas but skips some intermediate steps; I worked through the derivation myself, and the results are written out below.
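
With $\hat{x}_i$ and $y_i$ as defined above:

$$\frac{\partial L}{\partial \hat{x}_i} = \frac{\partial L}{\partial y_i}\cdot\gamma$$

$$\frac{\partial L}{\partial \sigma_B^2} = \sum_{i=1}^{m}\frac{\partial L}{\partial \hat{x}_i}\,(x_i-\mu_B)\cdot\left(-\tfrac{1}{2}\right)(\sigma_B^2+\epsilon)^{-3/2}$$

$$\frac{\partial L}{\partial \mu_B} = -\sum_{i=1}^{m}\frac{\partial L}{\partial \hat{x}_i}\,\frac{1}{\sqrt{\sigma_B^2+\epsilon}} + \frac{\partial L}{\partial \sigma_B^2}\cdot\frac{1}{m}\sum_{i=1}^{m}-2(x_i-\mu_B)$$

$$\frac{\partial L}{\partial x_i} = \frac{\partial L}{\partial \hat{x}_i}\,\frac{1}{\sqrt{\sigma_B^2+\epsilon}} + \frac{\partial L}{\partial \sigma_B^2}\cdot\frac{2(x_i-\mu_B)}{m} + \frac{\partial L}{\partial \mu_B}\cdot\frac{1}{m}$$

$$\frac{\partial L}{\partial \gamma} = \sum_{i=1}^{m}\frac{\partial L}{\partial y_i}\,\hat{x}_i, \qquad \frac{\partial L}{\partial \beta} = \sum_{i=1}^{m}\frac{\partial L}{\partial y_i}$$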


The implementation is as follows:

import numpy as np

def bn_forward(x, gamma, beta, eps=1e-5):  # eps keeps the division numerically stable when the variance is tiny
    # batch mean
    sample_mean = np.mean(x, axis=0)
    # batch variance
    sample_var = np.var(x, axis=0)
    # normalize
    out_ = (x - sample_mean) / np.sqrt(sample_var + eps)
    # scale and shift
    out = gamma * out_ + beta
    # cache intermediates for the backward pass
    cache = (x, out_, gamma, beta, sample_mean, sample_var, eps)

    return out, cache


def bn_backward(dout, cache):

    N, D = dout.shape

    x, out_, gamma, beta, sample_mean, sample_var, eps = cache

    dbeta = np.sum(dout, axis=0)

    dgamma = np.sum(out_ * dout, axis=0)

    dxhat = gamma * dout

    dvar = np.sum(dxhat * (x - sample_mean), axis=0) * (-1 / 2.) * (sample_var + eps) ** (-3 / 2.)

    dmean = -np.sum(dxhat, axis=0) / np.sqrt(sample_var + eps) + dvar * np.mean(-2 * (x - sample_mean), axis=0)

    dx = dxhat / np.sqrt(sample_var + eps) + dvar * 2 * (x - sample_mean) / N + dmean / N

    return dx, dgamma, dbeta
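
A minimal numerical check of the pair above (the 4x5 shape and the scalar loss np.sum(out * dout) are arbitrary test choices): perturb each entry of x, rerun the forward pass, and compare the finite-difference estimate against the analytic dx.

np.random.seed(0)
x = np.random.randn(4, 5)
gamma = np.random.randn(5)
beta = np.random.randn(5)
dout = np.random.randn(4, 5)

out, cache = bn_forward(x, gamma, beta)
dx, dgamma, dbeta = bn_backward(dout, cache)

h = 1e-5
dx_num = np.zeros_like(x)
for idx in np.ndindex(*x.shape):
    xp, xm = x.copy(), x.copy()
    xp[idx] += h
    xm[idx] -= h
    fp = np.sum(bn_forward(xp, gamma, beta)[0] * dout)
    fm = np.sum(bn_forward(xm, gamma, beta)[0] * dout)
    dx_num[idx] = (fp - fm) / (2 * h)

print(np.max(np.abs(dx - dx_num)))   # should be tiny (~1e-8) if bn_backward is correct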

Implementing a Convolution Layer in Python

This question threw me off completely in the interview: I had never implemented it by hand before and only knew the principle, so I simply gave up on it. Better late than never; autumn recruiting has only just started, and to avoid being caught out like this again it is worth implementing it myself. The principle is familiar, so I will skip it; a few formulas are more useful for the implementation.

Convolution Layer Forward Pass

For the forward pass of the convolution layer, first ignore the activation, i.e. take f(x) = x. The forward formula for the pure convolution layer is then:
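
In index notation, for every sample n, filter f, and output position (i, j):

$$\mathrm{out}[n,f,i,j]=\sum_{c=0}^{C-1}\sum_{p=0}^{HH-1}\sum_{q=0}^{WW-1} XP[n,c,\,i\cdot S+p,\,j\cdot S+q]\;w[f,c,p,q]\;+\;b[f]$$

$$H_o = 1+\frac{H+2\,\mathrm{pad}-HH}{S},\qquad W_o = 1+\frac{W+2\,\mathrm{pad}-WW}{S}$$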

  • n is the number of inputs; e.g. with 100 input images, n = 100.
  • C is the number of input channels; e.g. for RGB images, C = 3.
  • S is the stride: with stride 1 every position is scanned, with stride 2 every other one.
  • XP is the zero-padded input; with no padding, XP = X.
  • F is the number of filters; each filter has height HH and width WW. Ho and Wo are the output height and width.

From the formula above we can write a naive implementation of the convolution forward pass. For now, don't worry about how many nested for loops there are; that can be optimized later.

def conv_forward_naive(x, w, b, conv_param):
    """
    Input:
    - x: Input data of shape (N, C, H, W)
    - w: Filter weights of shape (F, C, HH, WW)
    - b: Biases, of shape (F,)
    - conv_param: A dictionary with the following keys:
      - 'stride': The number of pixels between adjacent receptive fields in the
        horizontal and vertical directions.
      - 'pad': The number of pixels that will be used to zero-pad the input.
    Returns a tuple of:
    - out: Output data, of shape (N, F, H', W') where H' and W' are given by
      H' = 1 + (H + 2 * pad - HH) / stride
      W' = 1 + (W + 2 * pad - WW) / stride
    - cache: (x, w, b, conv_param)
    """
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    S, P = conv_param['stride'], conv_param['pad']
    Ho = int(1 + (H + 2 * P - HH) / S)
    Wo = int(1 + (W + 2 * P - WW) / S)
    out = np.zeros((N, F, Ho, Wo))

    # zero-pad the input on the height and width dimensions
    x_pad = np.zeros((N, C, H + 2 * P, W + 2 * P))
    x_pad[:, :, P:P + H, P:P + W] = x

    for i in range(Ho):
        for j in range(Wo):
            x_pad_mask = x_pad[:, :, i * S:i * S + HH, j * S:j * S + WW]
            for f in range(F):
                # element-wise product, then sum over channel, height and width
                out[:, f, i, j] = np.sum(x_pad_mask * w[f, :, :, :], axis=(1, 2, 3))

    out = out + b[None, :, None, None]

    cache = (x, w, b, conv_param)

    return out, cache

The forward pass is fairly easy to understand and not particularly hard to implement.
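
A quick shape check of conv_forward_naive (a small sketch; all sizes here are arbitrary): with 8x8 inputs, 3x3 filters, pad 1 and stride 1, the spatial size should be preserved.

np.random.seed(0)
x = np.random.randn(2, 3, 8, 8)        # N=2 images, C=3 channels, 8x8 each
w = np.random.randn(4, 3, 3, 3)        # F=4 filters of size 3x3
b = np.random.randn(4)
conv_param = {'stride': 1, 'pad': 1}

out, cache = conv_forward_naive(x, w, b, conv_param)
print(out.shape)   # (2, 4, 8, 8), since Ho = 1 + (8 + 2*1 - 3) / 1 = 8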

Convolution Layer Backward Pass

The derivation for the backward pass is straightforward, nowhere near as hard as the BN derivation; the bookkeeping is just a bit fiddly and easy to get confused by. Assume the gradient flowing back into the convolution layer is the gradient of the loss with respect to this layer's output, dout. Then:
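
$$\frac{\partial L}{\partial b[f]}=\sum_{n,i,j}\mathrm{dout}[n,f,i,j]$$

$$\frac{\partial L}{\partial w[f,c,p,q]}=\sum_{n,i,j}\mathrm{dout}[n,f,i,j]\;XP[n,c,\,i\cdot S+p,\,j\cdot S+q]$$

$$\frac{\partial L}{\partial XP[n,c,\,i\cdot S+p,\,j\cdot S+q]}\;{+}{=}\;\sum_{f}\mathrm{dout}[n,f,i,j]\;w[f,c,p,q]$$

Cropping the padded gradient back to the original HxW window then gives the gradient with respect to x, exactly as the code does at the end.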

The code is as follows:

def conv_backward_naive(dout, cache):
    """
    Inputs:
    - dout: Upstream derivatives.
    - cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive
    Outputs:
    - dx: Gradient with respect to x
    - dw: Gradient with respect to w
    - db: Gradient with respect to b
    """
    N, F, H1, W1 = dout.shape
    x, w, b, conv_param = cache
    N, C, H, W = x.shape
    HH = w.shape[2]
    WW = w.shape[3]
    S = conv_param['stride']
    P = conv_param['pad']

    dx, dw, db = np.zeros_like(x), np.zeros_like(w), np.zeros_like(b)

    # rebuild the padded input used in the forward pass
    x_pad = np.zeros((N, C, H + 2 * P, W + 2 * P))
    x_pad[:, :, P:P + H, P:P + W] = x
    dx_pad = np.zeros_like(x_pad)

    # the bias is added at every output position, so its gradient is a plain sum
    db = np.sum(dout, axis=(0, 2, 3))

    for i in range(H1):
        for j in range(W1):
            x_pad_mask = x_pad[:, :, i * S:i * S + HH, j * S:j * S + WW]

            for f in range(F):
                # each output position contributes its input window, weighted by dout
                dw[f, :, :, :] += np.sum(x_pad_mask * (dout[:, f, i, j])[:, None, None, None], axis=0)

            for n in range(N):  # accumulate dx_pad window by window
                dx_pad[n, :, i * S:i * S + HH, j * S:j * S + WW] += np.sum(
                    w[:, :, :, :] * (dout[n, :, i, j])[:, None, None, None], axis=0)

    # strip the padding to recover the gradient w.r.t. the original input
    dx = dx_pad[:, :, P:P + H, P:P + W]
    return dx, dw, db
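
As with BN, a centered finite-difference check is a cheap way to verify the backward pass. The sketch below only checks db (one entry per filter, so it stays fast); the same pattern works for dw and dx. All shapes are arbitrary test values.

np.random.seed(1)
x = np.random.randn(2, 3, 7, 7)
w = np.random.randn(4, 3, 3, 3)
b = np.random.randn(4)
conv_param = {'stride': 2, 'pad': 1}
dout = np.random.randn(2, 4, 4, 4)      # matches the (N, F, Ho, Wo) output shape

out, cache = conv_forward_naive(x, w, b, conv_param)
dx, dw, db = conv_backward_naive(dout, cache)

h = 1e-5
db_num = np.zeros_like(b)
for f in range(b.shape[0]):
    bp, bm = b.copy(), b.copy()
    bp[f] += h
    bm[f] -= h
    fp = np.sum(conv_forward_naive(x, w, bp, conv_param)[0] * dout)
    fm = np.sum(conv_forward_naive(x, w, bm, conv_param)[0] * dout)
    db_num[f] = (fp - fm) / (2 * h)

print(np.max(np.abs(db - db_num)))      # should be tiny (~1e-8) if the gradients are right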

Finally

The convolution layer implementation above mainly follows the cs231n assignment. Implementing convolution is only one part of that assignment, which also covers max pooling, dropout, and more; see here for details.