Implementing Batch Normalization and a Convolution Layer in Python

Yesterday's TX coding interview asked me to hand-write IoU, BN, and CONV, and hearing that I was honestly a bit stunned. IoU was fine since it is fairly simple, but implementing BN and CONV was harder: at the time I only managed the BN forward pass and wrote down a one-dimensional convolution formula. Today I had some free time, so I went ahead and implemented IoU along with the forward and backward passes of BN and the convolution layer.

IoU Implementation

The idea behind IoU is simple, so I won't explain it here. The implementation is as follows:

def IoU(boxA, boxB):  # boxes given as (x1, y1, x2, y2) in inclusive pixel coordinates
    # corners of the intersection rectangle
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])
    # the +1 accounts for inclusive pixel coordinates; clamp at 0 when the boxes do not overlap
    interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)
    boxAArea = (boxA[2] - boxA[0] + 1) * (boxA[3] - boxA[1] + 1)
    boxBArea = (boxB[2] - boxB[0] + 1) * (boxB[3] - boxB[1] + 1)
    iou = interArea / (boxAArea + boxBArea - interArea)
    return iou
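
A quick sanity check (a small sketch; the box values are just made up): with inclusive pixel coordinates, two 10x10 boxes offset by 5 in both directions intersect in a 5x5 region, so the IoU should be 25 / 175 ≈ 0.143.

boxA = [0, 0, 9, 9]      # 10x10 box
boxB = [5, 5, 14, 14]    # 10x10 box shifted by 5 in x and y
print(IoU(boxA, boxB))   # 25 / (100 + 100 - 25) = 0.142857...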

The C++ version below is essentially the same idea, except the boxes are given as (x, y, width, height) and the coordinates are treated as continuous, so there are no +1 terms:

#include <algorithm>

struct Rect
{
    int x; int y;
    int width; int height;
};

float IoU(Rect Box1, Rect Box2)
{
    int XA = std::max(Box1.x, Box2.x);
    int YA = std::max(Box1.y, Box2.y);
    int XB = std::min(Box1.x + Box1.width, Box2.x + Box2.width);
    int YB = std::min(Box1.y + Box1.height, Box2.y + Box2.height);
    int interArea = std::max(0, XB - XA) * std::max(0, YB - YA);
    int box1Area = Box1.width * Box1.height;
    int box2Area = Box2.width * Box2.height;
    float iou = float(interArea) / (box1Area + box2Area - interArea);
    return iou;
}

Implementing BN in Python

The forward pass of BN only requires implementing the following formulas.
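
For a mini-batch $\{x_1,\dots,x_m\}$, these are the standard equations from the BN paper:

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i - \mu_B)^2$$

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma\,\hat{x}_i + \beta$$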

For the backward pass, we need the gradients with respect to x, γ, and β. The original paper gives the derivative formulas but skips some intermediate steps; I worked through the derivation myself, and the results are written out below.
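
With $\hat{x}_i$ and $y_i$ as defined above:

$$\frac{\partial L}{\partial \hat{x}_i} = \frac{\partial L}{\partial y_i}\cdot\gamma$$

$$\frac{\partial L}{\partial \sigma_B^2} = \sum_{i=1}^{m}\frac{\partial L}{\partial \hat{x}_i}\,(x_i-\mu_B)\cdot\left(-\tfrac{1}{2}\right)(\sigma_B^2+\epsilon)^{-3/2}$$

$$\frac{\partial L}{\partial \mu_B} = -\sum_{i=1}^{m}\frac{\partial L}{\partial \hat{x}_i}\,\frac{1}{\sqrt{\sigma_B^2+\epsilon}} + \frac{\partial L}{\partial \sigma_B^2}\cdot\frac{1}{m}\sum_{i=1}^{m}-2(x_i-\mu_B)$$

$$\frac{\partial L}{\partial x_i} = \frac{\partial L}{\partial \hat{x}_i}\,\frac{1}{\sqrt{\sigma_B^2+\epsilon}} + \frac{\partial L}{\partial \sigma_B^2}\cdot\frac{2(x_i-\mu_B)}{m} + \frac{\partial L}{\partial \mu_B}\cdot\frac{1}{m}$$

$$\frac{\partial L}{\partial \gamma} = \sum_{i=1}^{m}\frac{\partial L}{\partial y_i}\,\hat{x}_i, \qquad \frac{\partial L}{\partial \beta} = \sum_{i=1}^{m}\frac{\partial L}{\partial y_i}$$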


The implementation is as follows:

import numpy as np

def bn_forward(x, gamma, beta, eps=1e-5):  # eps keeps the division numerically stable when the variance is tiny
    # batch mean
    sample_mean = np.mean(x, axis=0)
    # batch variance
    sample_var = np.var(x, axis=0)
    # normalize
    out_ = (x - sample_mean) / np.sqrt(sample_var + eps)
    # scale and shift
    out = gamma * out_ + beta
    # cache intermediates for the backward pass
    cache = (x, out_, gamma, beta, sample_mean, sample_var, eps)

    return out, cache


def bn_backward(dout, cache):

    N, D = dout.shape

    x, out_, gamma, beta, sample_mean, sample_var, eps = cache

    dbeta = np.sum(dout, axis=0)

    dgamma = np.sum(out_ * dout, axis=0)

    dxhat = gamma * dout

    dvar = np.sum(dxhat * (x - sample_mean), axis=0) * (-1 / 2.) * (sample_var + eps) ** (-3 / 2.)

    dmean = -np.sum(dxhat, axis=0) / np.sqrt(sample_var + eps) + dvar * np.mean(-2 * (x - sample_mean), axis=0)

    dx = dxhat / np.sqrt(sample_var + eps) + dvar * 2 * (x - sample_mean) / N + dmean / N

    return dx, dgamma, dbeta
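
A minimal numerical check of the pair above (the 4x5 shape and the scalar loss np.sum(out * dout) are arbitrary test choices): perturb each entry of x, rerun the forward pass, and compare the finite-difference estimate against the analytic dx.

np.random.seed(0)
x = np.random.randn(4, 5)
gamma = np.random.randn(5)
beta = np.random.randn(5)
dout = np.random.randn(4, 5)

out, cache = bn_forward(x, gamma, beta)
dx, dgamma, dbeta = bn_backward(dout, cache)

h = 1e-5
dx_num = np.zeros_like(x)
for idx in np.ndindex(*x.shape):
    xp, xm = x.copy(), x.copy()
    xp[idx] += h
    xm[idx] -= h
    fp = np.sum(bn_forward(xp, gamma, beta)[0] * dout)
    fm = np.sum(bn_forward(xm, gamma, beta)[0] * dout)
    dx_num[idx] = (fp - fm) / (2 * h)

print(np.max(np.abs(dx - dx_num)))   # should be tiny (~1e-8) if bn_backward is correct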

Implementing a Convolution Layer in Python

This question threw me off completely in the interview: I had never implemented it by hand before and only knew the principle, so I simply gave up on it. Better late than never; autumn recruiting has only just started, and to avoid being caught out like this again it is worth implementing it myself. The principle is familiar, so I will skip it; a few formulas are more useful for the implementation.

Convolution Layer Forward Pass

For the forward pass of the convolution layer, first ignore the activation, i.e. take f(x) = x. The forward formula for the pure convolution layer is then:
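
In index notation, for every sample n, filter f, and output position (i, j):

$$\mathrm{out}[n,f,i,j]=\sum_{c=0}^{C-1}\sum_{p=0}^{HH-1}\sum_{q=0}^{WW-1} XP[n,c,\,i\cdot S+p,\,j\cdot S+q]\;w[f,c,p,q]\;+\;b[f]$$

$$H_o = 1+\frac{H+2\,\mathrm{pad}-HH}{S},\qquad W_o = 1+\frac{W+2\,\mathrm{pad}-WW}{S}$$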

  • n is the number of inputs; e.g. with 100 input images, n = 100.
  • C is the number of input channels; e.g. for RGB images, C = 3.
  • S is the stride: with stride 1 every position is scanned, with stride 2 every other one.
  • XP is the zero-padded input; with no padding, XP = X.
  • F is the number of filters; each filter has height HH and width WW. Ho and Wo are the output height and width.

From the formula above we can write a naive implementation of the convolution forward pass. For now, don't worry about how many nested for loops there are; that can be optimized later.

def conv_forward_naive(x, w, b, conv_param):
    """
    Input:
    - x: Input data of shape (N, C, H, W)
    - w: Filter weights of shape (F, C, HH, WW)
    - b: Biases, of shape (F,)
    - conv_param: A dictionary with the following keys:
      - 'stride': The number of pixels between adjacent receptive fields in the
        horizontal and vertical directions.
      - 'pad': The number of pixels that will be used to zero-pad the input.
    Returns a tuple of:
    - out: Output data, of shape (N, F, H', W') where H' and W' are given by
      H' = 1 + (H + 2 * pad - HH) / stride
      W' = 1 + (W + 2 * pad - WW) / stride
    - cache: (x, w, b, conv_param)
    """
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    S, P = conv_param['stride'], conv_param['pad']
    Ho = int(1 + (H + 2 * P - HH) / S)
    Wo = int(1 + (W + 2 * P - WW) / S)
    out = np.zeros((N, F, Ho, Wo))

    # zero-pad the input on the height and width dimensions
    x_pad = np.zeros((N, C, H + 2 * P, W + 2 * P))
    x_pad[:, :, P:P + H, P:P + W] = x

    for i in range(Ho):
        for j in range(Wo):
            x_pad_mask = x_pad[:, :, i * S:i * S + HH, j * S:j * S + WW]
            for f in range(F):
                # element-wise product, then sum over channel, height and width
                out[:, f, i, j] = np.sum(x_pad_mask * w[f, :, :, :], axis=(1, 2, 3))

    out = out + b[None, :, None, None]

    cache = (x, w, b, conv_param)

    return out, cache

The forward pass is fairly easy to understand and not particularly hard to implement.
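
A quick shape check of conv_forward_naive (a small sketch; all sizes here are arbitrary): with 8x8 inputs, 3x3 filters, pad 1 and stride 1, the spatial size should be preserved.

np.random.seed(0)
x = np.random.randn(2, 3, 8, 8)        # N=2 images, C=3 channels, 8x8 each
w = np.random.randn(4, 3, 3, 3)        # F=4 filters of size 3x3
b = np.random.randn(4)
conv_param = {'stride': 1, 'pad': 1}

out, cache = conv_forward_naive(x, w, b, conv_param)
print(out.shape)   # (2, 4, 8, 8), since Ho = 1 + (8 + 2*1 - 3) / 1 = 8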

Convolution Layer Backward Pass

The derivation for the backward pass is straightforward, nowhere near as hard as the BN derivation; the bookkeeping is just a bit fiddly and easy to get confused by. Assume the gradient flowing back into the convolution layer is the gradient of the loss with respect to this layer's output, dout. Then:
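
$$\frac{\partial L}{\partial b[f]}=\sum_{n,i,j}\mathrm{dout}[n,f,i,j]$$

$$\frac{\partial L}{\partial w[f,c,p,q]}=\sum_{n,i,j}\mathrm{dout}[n,f,i,j]\;XP[n,c,\,i\cdot S+p,\,j\cdot S+q]$$

$$\frac{\partial L}{\partial XP[n,c,\,i\cdot S+p,\,j\cdot S+q]}\;{+}{=}\;\sum_{f}\mathrm{dout}[n,f,i,j]\;w[f,c,p,q]$$

Cropping the padded gradient back to the original HxW window then gives the gradient with respect to x, exactly as the code does at the end.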

The code is as follows:

def conv_backward_naive(dout, cache):
    """
    Inputs:
    - dout: Upstream derivatives.
    - cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive
    Outputs:
    - dx: Gradient with respect to x
    - dw: Gradient with respect to w
    - db: Gradient with respect to b
    """
    N, F, H1, W1 = dout.shape
    x, w, b, conv_param = cache
    N, C, H, W = x.shape
    HH = w.shape[2]
    WW = w.shape[3]
    S = conv_param['stride']
    P = conv_param['pad']

    dx, dw, db = np.zeros_like(x), np.zeros_like(w), np.zeros_like(b)

    # rebuild the padded input used in the forward pass
    x_pad = np.zeros((N, C, H + 2 * P, W + 2 * P))
    x_pad[:, :, P:P + H, P:P + W] = x
    dx_pad = np.zeros_like(x_pad)

    # the bias is added at every output position, so its gradient is a plain sum
    db = np.sum(dout, axis=(0, 2, 3))

    for i in range(H1):
        for j in range(W1):
            x_pad_mask = x_pad[:, :, i * S:i * S + HH, j * S:j * S + WW]

            for f in range(F):
                # each output position contributes its input window, weighted by dout
                dw[f, :, :, :] += np.sum(x_pad_mask * (dout[:, f, i, j])[:, None, None, None], axis=0)

            for n in range(N):  # accumulate dx_pad window by window
                dx_pad[n, :, i * S:i * S + HH, j * S:j * S + WW] += np.sum(
                    w[:, :, :, :] * (dout[n, :, i, j])[:, None, None, None], axis=0)

    # strip the padding to recover the gradient w.r.t. the original input
    dx = dx_pad[:, :, P:P + H, P:P + W]
    return dx, dw, db
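
As with BN, a centered finite-difference check is a cheap way to verify the backward pass. The sketch below only checks db (one entry per filter, so it stays fast); the same pattern works for dw and dx. All shapes are arbitrary test values.

np.random.seed(1)
x = np.random.randn(2, 3, 7, 7)
w = np.random.randn(4, 3, 3, 3)
b = np.random.randn(4)
conv_param = {'stride': 2, 'pad': 1}
dout = np.random.randn(2, 4, 4, 4)      # matches the (N, F, Ho, Wo) output shape

out, cache = conv_forward_naive(x, w, b, conv_param)
dx, dw, db = conv_backward_naive(dout, cache)

h = 1e-5
db_num = np.zeros_like(b)
for f in range(b.shape[0]):
    bp, bm = b.copy(), b.copy()
    bp[f] += h
    bm[f] -= h
    fp = np.sum(conv_forward_naive(x, w, bp, conv_param)[0] * dout)
    fm = np.sum(conv_forward_naive(x, w, bm, conv_param)[0] * dout)
    db_num[f] = (fp - fm) / (2 * h)

print(np.max(np.abs(db - db_num)))      # should be tiny (~1e-8) if the gradients are right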

Finally

The convolution layer implementation above mainly follows the cs231n assignment. Implementing convolution is only one part of that assignment, which also covers max pooling, dropout, and more; see here for details.