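AlexNet won the 2012 ImageNet competition. That was a long time ago, but it remains a classic deep learning model: studying it helps us understand many of the techniques it uses, and reimplementing it is a good way to become more fluent with a deep learning toolkit. I had only just started with deep learning and badly needed a small exercise to get familiar with TensorFlow, and this little project is the result.
Here is my code first: https://github.com/hjptriplebee/AlexNet_with_tensorflow
If you want to run the code, the detailed setup requirements are all in the README at the link above. This article assumes some TensorFlow background and will not explain the finer points.
There is plenty of material online about the model architecture, so I will not repeat it here; I will explain it in the code as we go.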
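One point deserves attention: AlexNet splits the network into an upper and a lower half. In the paper the two halves have the same structure and differ only in that they are trained on different GPUs. In most convolutional layers each half only sees its own feature maps (the exceptions are conv3 and the fully connected layers, which connect to both halves), so the split is essentially a way to speed up training. Many AlexNet reimplementations merge the two halves because they run on a single GPU. I also run on a single GPU, but I wanted to reproduce the original network structure, so my code keeps the halves separate.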
Model definition
# Written against the TensorFlow 1.x API (tf.variable_scope, tf.get_variable)
import numpy as np
import tensorflow as tf

def maxPoolLayer(x, kHeight, kWidth, strideX, strideY, name, padding = "SAME"):
    """max-pooling"""
    return tf.nn.max_pool(x, ksize = [1, kHeight, kWidth, 1],
                          strides = [1, strideY, strideX, 1],
                          padding = padding, name = name)

def dropout(x, keepPro, name = None):
    """dropout"""
    return tf.nn.dropout(x, keep_prob = keepPro, name = name)

def LRN(x, R, alpha, beta, name = None, bias = 1.0):
    """LRN"""
    return tf.nn.local_response_normalization(x, depth_radius = R, alpha = alpha,
                                              beta = beta, bias = bias, name = name)

def fcLayer(x, inputD, outputD, reluFlag, name):
    """fully-connect"""
    with tf.variable_scope(name) as scope:
        w = tf.get_variable("w", shape = [inputD, outputD], dtype = "float")
        b = tf.get_variable("b", [outputD], dtype = "float")
        out = tf.nn.xw_plus_b(x, w, b, name = scope.name)
        if reluFlag:
            return tf.nn.relu(out)
        else:
            return out

def convLayer(x, kHeight, kWidth, strideX, strideY,
              featureNum, name, padding = "SAME", groups = 1):
    """convolutional; groups = 2 reproduces AlexNet's upper/lower split"""
    channel = int(x.get_shape()[-1])  # number of input channels
    # anonymous function that applies one convolution
    conv = lambda a, b: tf.nn.conv2d(a, b, strides = [1, strideY, strideX, 1],
                                     padding = padding)
    with tf.variable_scope(name) as scope:
        # integer division: each group only sees channel // groups input channels
        w = tf.get_variable("w", shape = [kHeight, kWidth, channel // groups, featureNum])
        b = tf.get_variable("b", shape = [featureNum])

        # split the input and the weights along the channel axis
        xNew = tf.split(value = x, num_or_size_splits = groups, axis = 3)
        wNew = tf.split(value = w, num_or_size_splits = groups, axis = 3)

        # compute the feature maps of each group separately
        featureMap = [conv(t1, t2) for t1, t2 in zip(xNew, wNew)]
        # merge the feature maps of all groups
        mergeFeatureMap = tf.concat(axis = 3, values = featureMap)
        out = tf.nn.bias_add(mergeFeatureMap, b)
        # result after ReLU
        return tf.nn.relu(tf.reshape(out, mergeFeatureMap.get_shape().as_list()),
                          name = scope.name)
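This defines five building blocks: convolution, pooling, LRN, dropout, and fully connected layers. The convolution block is the most involved because it keeps the upper and lower halves of the network separate. Next, we define AlexNet.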
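To make the effect of the groups argument concrete, here is a minimal sketch in plain Python (no TensorFlow required) that counts the parameters of a convolution layer with and without grouping. The helper name conv_param_count is made up for illustration and is not part of the repository; the numbers assume the conv2 layer of this network (5x5 kernels, 96 input channels, 256 filters).

```python
def conv_param_count(k_height, k_width, in_channels, out_channels, groups=1):
    """Count weights and biases of a (grouped) convolution layer.

    Hypothetical helper for illustration only.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channels must be divisible by groups")
    # With grouping, each filter only sees in_channels // groups input channels.
    weights = k_height * k_width * (in_channels // groups) * out_channels
    biases = out_channels
    return weights + biases


# Assumed conv2 sizes: 5x5 kernels, 96 input channels, 256 filters.
merged = conv_param_count(5, 5, 96, 256, groups=1)
split = conv_param_count(5, 5, 96, 256, groups=2)
print(merged, split)  # 614656 307456
```

With groups = 2 the weight tensor is half the size, because every filter is connected to only one half of the input channels. That is the same structure as running the two halves on separate GPUs.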
class AlexNet(object):
    """AlexNet model"""
    def __init__(self, x, keepPro, classNum, skip, modelPath = "bvlc_AlexNet.npy"):
        self.X = x
        self.KEEPPRO = keepPro
        self.CLASSNUM = classNum
        self.SKIP = skip
        self.MODELPATH = modelPath
        # build CNN
        self.buildCNN()

    def buildCNN(self):
        """build model"""
        conv1 = convLayer(self.X, 11, 11, 4, 4, 96, "conv1", "VALID")
        pool1 = maxPoolLayer(conv1, 3, 3, 2, 2, "pool1", "VALID")
        lrn1 = LRN(pool1, 2, 2e-05, 0.75, "norm1")

        conv2 = convLayer(lrn1, 5, 5, 1, 1, 256, "conv2", groups = 2)
        pool2 = maxPoolLayer(conv2, 3, 3, 2, 2, "pool2", "VALID")
        lrn2 = LRN(pool2, 2, 2e-05, 0.75, "lrn2")

        conv3 = convLayer(lrn2, 3, 3, 1, 1, 384, "conv3")
        conv4 = convLayer(conv3, 3, 3, 1, 1, 384, "conv4", groups = 2)
        conv5 = convLayer(conv4, 3, 3, 1, 1, 256, "conv5", groups = 2)
        pool5 = maxPoolLayer(conv5, 3, 3, 2, 2, "pool5", "VALID")

        fcIn = tf.reshape(pool5, [-1, 256 * 6 * 6])
        fc1 = fcLayer(fcIn, 256 * 6 * 6, 4096, True, "fc6")
        dropout1 = dropout(fc1, self.KEEPPRO)

        fc2 = fcLayer(dropout1, 4096, 4096, True, "fc7")
        dropout2 = dropout(fc2, self.KEEPPRO)

        # the last layer outputs raw class scores, so no ReLU here
        self.fc3 = fcLayer(dropout2, 4096, self.CLASSNUM, False, "fc8")

    def loadModel(self, sess):
        """load model"""
        # allow_pickle is needed by recent NumPy versions to load a pickled dict
        wDict = np.load(self.MODELPATH, encoding = "bytes", allow_pickle = True).item()
        # for each layer in the model file
        for name in wDict:
            if name not in self.SKIP:
                with tf.variable_scope(name, reuse = True):
                    for p in wDict[name]:
                        if len(p.shape) == 1:
                            # a bias has only one dimension
                            sess.run(tf.get_variable('b', trainable = False).assign(p))
                        else:
                            # weights
                            sess.run(tf.get_variable('w', trainable = False).assign(p))
The buildCNN function builds the network layer by layer, following the AlexNet structure.
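As a sanity check on the 256 * 6 * 6 input size of fc6, the spatial size can be traced through the layers in plain Python. This is a rough sketch: the helper out_size is hypothetical, and the kernel sizes and strides in the table are the standard AlexNet values, which I am assuming match this code. The LRN layers are left out because they do not change the spatial size.

```python
import math


def out_size(size, kernel, stride, padding):
    """Spatial output size under TensorFlow's "VALID" / "SAME" padding rules."""
    if padding == "VALID":
        return math.ceil((size - kernel + 1) / stride)
    if padding == "SAME":
        return math.ceil(size / stride)
    raise ValueError("unknown padding: " + padding)


# (name, kernel, stride, padding); assumed standard AlexNet hyperparameters
layers = [
    ("conv1", 11, 4, "VALID"),
    ("pool1", 3, 2, "VALID"),
    ("conv2", 5, 1, "SAME"),
    ("pool2", 3, 2, "VALID"),
    ("conv3", 3, 1, "SAME"),
    ("conv4", 3, 1, "SAME"),
    ("conv5", 3, 1, "SAME"),
    ("pool5", 3, 2, "VALID"),
]

size = 227  # input images are 227 x 227
sizes = {}
for name, kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    sizes[name] = size
    print(name, size)

# conv5 has 256 filters, so the flattened fc6 input has 256 * 6 * 6 values
print("fc6 input length:", 256 * size * size)
```

The trace goes 227 → 55 → 27 → 27 → 13 → 13 → 13 → 13 → 6, which is where the 6 in 256 * 6 * 6 comes from.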
The loadModel function reads the parameters from the model file; see the README on GitHub for the model file used.
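The loading logic can be illustrated without TensorFlow or the real model file. The sketch below assumes, as the loop in loadModel suggests, that the file holds a dictionary mapping each layer name to a list of arrays, where a one-dimensional array is a bias and anything else is a weight tensor. The function plan_assignments and the stand-in data are hypothetical; only the shapes matter here.

```python
from types import SimpleNamespace


def plan_assignments(w_dict, skip):
    """Return (layer, variable) pairs that a loader like loadModel would assign."""
    plan = []
    for layer, params in w_dict.items():
        if layer in skip:
            continue  # skipped layers keep their freshly initialized values
        for p in params:
            # A 1-D array is a bias vector; anything else is a weight tensor.
            variable = "b" if len(p.shape) == 1 else "w"
            plan.append((layer, variable))
    return plan


# Stand-in for the loaded dictionary (hypothetical, shapes only).
fake_weights = {
    "fc7": [SimpleNamespace(shape=(4096, 4096)), SimpleNamespace(shape=(4096,))],
    "fc8": [SimpleNamespace(shape=(4096, 1000)), SimpleNamespace(shape=(1000,))],
}

print(plan_assignments(fake_weights, skip=["fc8"]))
# [('fc7', 'w'), ('fc7', 'b')]
```

Putting a layer name such as "fc8" into skip is how one would leave the final classifier untouched, for example when retraining it for a different number of classes.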
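At this point the model is fully defined, so we can start testing it.
Model testing
AlexNet trained on ImageNet distinguishes 1000 classes covering a wide range of common objects, so we can test it with a few pictures picked at random from the web. For example, I simply used a dump truck picture from an earlier project:
Then write the test code: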
import os
import cv2
import numpy as np
import tensorflow as tf
import AlexNet
import caffe_classes

# some params
dropoutPro = 1  # keep probability 1: dropout is disabled at test time
classNum = 1000
skip = []

# get test images
testPath = "testModel"
testImg = []
for f in os.listdir(testPath):
    testImg.append(cv2.imread(testPath + "/" + f))

imgMean = np.array([104, 117, 124], np.float32)
x = tf.placeholder("float", [1, 227, 227, 3])

model = AlexNet.AlexNet(x, dropoutPro, classNum, skip)
score = model.fc3
softmax = tf.nn.softmax(score)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    model.loadModel(sess)  # load the pretrained parameters

    for i, img in enumerate(testImg):
        # image preprocessing
        test = cv2.resize(img.astype(np.float32), (227, 227))  # resize to the network input size
        test -= imgMean  # subtract the mean
        test = test.reshape((1, 227, 227, 3))  # add the batch dimension
        # index of the class with the highest probability
        maxx = np.argmax(sess.run(softmax, feed_dict = {x: test}))
        res = caffe_classes.class_names[maxx]
        # print(res)
        font = cv2.FONT_HERSHEY_SIMPLEX
        # draw the class name; the text origin is given as (x, y)
        cv2.putText(img, res, (int(img.shape[1] / 3), int(img.shape[0] / 3)),
                    font, 1, (0, 255, 0), 2)
        cv2.imshow("demo", img)
        cv2.waitKey(5000)  # show for 5 seconds
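The last step inside the loop, turning the network's raw scores into a class name, can be sketched on its own in plain Python. The scores and the three labels below are made up for illustration; the real network produces 1000 scores and looks the label up in caffe_classes.class_names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores and labels, for illustration only.
class_names = ["tabby cat", "dump truck", "sports car"]
scores = [1.2, 4.7, 0.3]

probs = softmax(scores)
# argmax: index of the class with the highest probability
best = max(range(len(probs)), key=probs.__getitem__)
print(class_names[best], round(probs[best], 3))
```

Because softmax preserves the ordering of the scores, taking the argmax of the probabilities picks the same class as taking the argmax of the raw scores; the softmax only matters if you also want to report a probability.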
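In the test code, we first set a few parameters, then read the test images from the given directory, then initialize the model, and finally run the actual test. The test results are as follows: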