Captcha Recognition
I built this captcha recognizer because my crawler needed to solve captchas and I didn't want to pay a captcha-solving service, so I had to write my own. I first tried off-the-shelf OCR, but the results were poor, so I turned to machine learning and built a CNN with PyTorch.
I scraped a bit over a thousand captcha images from a website and hand-labeled three to four hundred of them to train an initial model. I then used that model to label the remaining captchas and corrected its mistakes by hand, which gave me over a thousand labeled samples for training. The final model reaches about 96% accuracy on the test set (three to four hundred images).
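As a rough plausibility check on that number (my own back-of-the-envelope, assuming per-character errors are independent): a four-character captcha only counts as correct when all four characters are right, so whole-captcha accuracy is roughly per-character accuracy to the fourth power.

```python
# Back-of-the-envelope: if each character is read correctly ~99% of the
# time and errors are independent, the whole 4-char captcha comes out at
# about 0.99 ** 4, close to the observed ~96% test accuracy.
per_char = 0.99
print(round(per_char ** 4, 4))  # -> 0.9606
```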
1. Create the character list
import torch

# Every character that can appear in a captcha, and the captcha length
captcha_array = list("1234567890qwertyuiopasdfghjklzxcvbnm")
captcha_size = 4
2. Define conversions between strings and one-hot encodings
def text2Vec(text):
    """Convert a captcha string to a (captcha_size, 36) one-hot tensor."""
    one_hot = torch.zeros(captcha_size, len(captcha_array))
    for i in range(len(text)):
        one_hot[i, captcha_array.index(text[i])] = 1
    return one_hot

def Vec2text(vec):
    """Convert a (captcha_size, 36) score or one-hot tensor back to a string."""
    vec = torch.argmax(vec, 1)
    text = ""
    for i in vec:
        text += captcha_array[i]
    return text
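To illustrate the encode/decode round trip without any tensors, here is the same idea in plain Python (a sketch, not the article's code):

```python
# Plain-Python sketch of the same one-hot round trip used above.
captcha_array = list("1234567890qwertyuiopasdfghjklzxcvbnm")

def encode(text):
    # one row per character; a 1 marks that character's alphabet index
    return [[1 if j == captcha_array.index(c) else 0
             for j in range(len(captcha_array))] for c in text]

def decode(rows):
    # argmax of each row points back into the alphabet
    return "".join(captcha_array[row.index(max(row))] for row in rows)

print(decode(encode("1a2b")))  # -> 1a2b
```

Any string over the 36-character alphabet survives the round trip unchanged, which is exactly what `text2Vec`/`Vec2text` rely on.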
3. Define the CNN model
from torch import nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # Four conv + ReLU + 2x2 max-pool stages; each pool halves the spatial size
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        self.layer4 = nn.Sequential(
            nn.Conv2d(in_channels=256, out_channels=512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2)
        )
        # 60x160 input -> 3x10 after four pools, so 512 * 3 * 10 = 15360 features
        self.layer5 = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features=15360, out_features=4096),
            nn.Dropout(0.2),
            nn.ReLU(),
            nn.Linear(in_features=4096, out_features=36 * 4)  # 36 classes x 4 chars
        )

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.layer5(x)
        return x

    def predict(self, x):
        # Run the forward pass and decode the flat 144-vector into text
        with torch.no_grad():
            x = self.forward(x)
        return Vec2text(x.view(-1, 36))
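The 15360 in layer5 is not arbitrary: with a 60x160 grayscale input, each of the four `MaxPool2d(2)` stages floors the spatial dimensions by half, which can be checked quickly:

```python
# Verify nn.Linear(in_features=15360): 60x160 input, four 2x2 max-pools,
# 512 channels after layer4. MaxPool2d floors odd sizes (15 -> 7).
h, w = 60, 160
for _ in range(4):
    h, w = h // 2, w // 2
print(h, w, 512 * h * w)  # -> 3 10 15360
```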
4. Data preprocessing
import os
from torch.utils.data import Dataset
from torchvision import transforms
from PIL import Image

class MyData(Dataset):
    def __init__(self, root_dir):
        self.root_dir = root_dir
        self.img_path = [os.path.join(self.root_dir, img) for img in os.listdir(root_dir)]
        self.transforms = transforms.Compose(
            [
                transforms.ToTensor(),
                transforms.Resize((60, 160)),
                transforms.Grayscale()
            ]
        )

    def __len__(self):
        return len(self.img_path)

    def __getitem__(self, index):
        image_path = self.img_path[index]
        image = self.transforms(Image.open(image_path))
        # The label is encoded in the filename: "<label>_<id>.jpg".
        # os.path.basename works on both Windows and POSIX paths,
        # unlike the original split("\\").
        label = os.path.basename(image_path).split("_")[0]
        label_tensor = torch.flatten(text2Vec(label))
        return image, label_tensor
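The dataset class assumes the label is baked into the filename as `<label>_<id>.jpg` (as in the example image used later). A quick check of that parsing:

```python
import os

# Filename convention assumed by MyData: "<label>_<anything>.jpg"
path = "./datasets/test/098e_165830112.jpg"   # example filename from the article
label = os.path.basename(path).split("_")[0]
print(label)  # -> 098e
```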
5. Train and save the model
from torch.utils.data import DataLoader
from torch import optim

train_data = DataLoader(MyData("./datasets/train"), batch_size=64, shuffle=True, num_workers=0, pin_memory=True)
test_dataset = MyData("./datasets/test")
test_data = DataLoader(test_dataset, batch_size=1, shuffle=True, num_workers=0, pin_memory=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Model().to(device)
loss_fn = nn.MultiLabelSoftMarginLoss().to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
epoch = 200

for i in range(1, epoch + 1):
    print("------------- Epoch {} training start ------------".format(i))
    # Training step
    model.train()
    for imgs, labels in train_data:
        imgs, labels = imgs.to(device), labels.to(device)
        outputs = model(imgs)
        loss = loss_fn(outputs, labels)
        # Optimize the model
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print("Epoch: {}, loss: {}".format(i, loss.item()))
    # Evaluation step
    model.eval()
    total_test_loss = 0
    total_accuracy = 0
    with torch.no_grad():
        for imgs, labels in test_data:
            imgs, labels = imgs.to(device), labels.to(device)
            outputs = model(imgs)
            loss = loss_fn(outputs, labels)
            total_test_loss += loss.item()
            # Count a hit only when all four characters are right
            if Vec2text(outputs.view(-1, 36)) == Vec2text(labels.view(-1, 36)):
                total_accuracy += 1
    print("Loss on the whole test set: {}".format(total_test_loss))
    print("Accuracy on the whole test set: {}".format(total_accuracy / len(test_dataset)))
    if (total_accuracy / len(test_dataset)) >= 0.96 or i == epoch:
        torch.save(model.state_dict(), "./models/model.pth")
        print("Model saved")
        break
Output:
------------- Epoch 1 training start ------------
Epoch: 1, loss: 0.14288298785686493
Loss on the whole test set: 39.58032390475273
Accuracy on the whole test set: 0.0
------------- Epoch 2 training start ------------
Epoch: 2, loss: 0.1313132643699646
Loss on the whole test set: 38.98908883333206
Accuracy on the whole test set: 0.0
…
------------- Epoch 22 training start ------------
Epoch: 22, loss: 0.08956490457057953
Loss on the whole test set: 27.89856981858611
Accuracy on the whole test set: 0.013333333333333334
------------- Epoch 23 training start ------------
Epoch: 23, loss: 0.06463437527418137
Loss on the whole test set: 19.876742523163557
Accuracy on the whole test set: 0.08
------------- Epoch 24 training start ------------
Epoch: 24, loss: 0.04172850400209427
Loss on the whole test set: 14.986714116297662
Accuracy on the whole test set: 0.25
------------- Epoch 25 training start ------------
Epoch: 25, loss: 0.028918465599417686
Loss on the whole test set: 10.024020065553486
Accuracy on the whole test set: 0.43
------------- Epoch 26 training start ------------
Epoch: 26, loss: 0.015734059736132622
Loss on the whole test set: 8.309810376027599
Accuracy on the whole test set: 0.5933333333333334
------------- Epoch 27 training start ------------
Epoch: 27, loss: 0.01344425231218338
Loss on the whole test set: 7.2159516087849624
Accuracy on the whole test set: 0.67
------------- Epoch 28 training start ------------
Epoch: 28, loss: 0.00688259769231081
Loss on the whole test set: 6.06079014801071
Accuracy on the whole test set: 0.7166666666666667
…
------------- Epoch 158 training start ------------
Epoch: 158, loss: 0.00015873285883571953
Loss on the whole test set: 2.9681727227329793
Accuracy on the whole test set: 0.9533333333333334
------------- Epoch 159 training start ------------
Epoch: 159, loss: 2.2554470433533425e-06
Loss on the whole test set: 3.739820039788348
Accuracy on the whole test set: 0.9366666666666666
------------- Epoch 160 training start ------------
Epoch: 160, loss: 1.1268558409938123e-05
Loss on the whole test set: 2.750196551042272
Accuracy on the whole test set: 0.9633333333333334
Model saved
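The accuracy check in the loop relies on reshaping the flat 144-score output into four rows of 36 and taking each row's argmax. The decoding step alone looks like this in plain Python (a torch-free sketch with made-up logits):

```python
# Torch-free sketch of the view(-1, 36) + argmax decoding used in the loop.
captcha_array = list("1234567890qwertyuiopasdfghjklzxcvbnm")

def decode_flat(scores):
    # scores: 144 floats -> 4 rows of 36 -> argmax of each row
    rows = [scores[i * 36:(i + 1) * 36] for i in range(4)]
    return "".join(captcha_array[r.index(max(r))] for r in rows)

# fabricated score vector whose per-row argmax spells "0q9m"
fake = [0.0] * 144
for row, idx in enumerate([9, 10, 8, 35]):   # alphabet indices of 0, q, 9, m
    fake[row * 36 + idx] = 1.0
print(decode_flat(fake))  # -> 0q9m
```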
6. Load the model for prediction
model = Model()
model_status = torch.load("./models/model.pth")
model.load_state_dict(model_status)
model.eval()  # disable dropout for inference
from PIL import Image
Image.open("./datasets/test/098e_165830112.jpg")
Output: (the captcha image itself is displayed here)
img = Image.open("./datasets/test/098e_165830112.jpg")
transform = transforms.Compose(  # renamed so it does not shadow the transforms module
    [
        transforms.ToTensor(),
        transforms.Resize((60, 160)),
        transforms.Grayscale()
    ]
)
img = transform(img)
img = img.view(-1, 1, 60, 160)  # add batch and channel dims
img
Output:
tensor([[[[0.6859, 0.6865, 0.6884, …, 0.6926, 0.6926, 0.6926],
[0.6859, 0.6865, 0.6884, …, 0.6926, 0.6926, 0.6926],
[0.6859, 0.6864, 0.6881, …, 0.6926, 0.6926, 0.6926],
…,
[0.6857, 0.6860, 0.6870, …, 0.6926, 0.6926, 0.6926],
[0.6857, 0.6860, 0.6870, …, 0.6926, 0.6926, 0.6926],
[0.6856, 0.6860, 0.6870, …, 0.6926, 0.6926, 0.6926]]]])
Prediction:
model.predict(img)
Output:
'098e'
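Since only the state_dict is saved, loading always requires instantiating the same Model class first and then restoring its parameters. The pattern in miniature, using a stand-in `nn.Linear` and a hypothetical file name `demo.pth`:

```python
import torch
from torch import nn

net = nn.Linear(4, 2)                     # stand-in for the article's Model
torch.save(net.state_dict(), "demo.pth")  # saves parameters only, not the class

restored = nn.Linear(4, 2)                # must rebuild the same architecture
restored.load_state_dict(torch.load("demo.pth"))
print(torch.equal(net.weight, restored.weight))  # -> True
```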
Original article: https://www.jb51.cc/wenti/3281806.html