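How to preprocess images for a TensorFlow model the same way as the PyTorch preprocessing
I have a PyTorch resnet101 encoder model whose input images go through this preprocessing: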
import torch
import torchvision as tv
from PIL import Image

data_transforms = tv.transforms.Compose([
    tv.transforms.Resize((224, 224)),
    tv.transforms.ToTensor(),
    tv.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
img = Image.open(img_path)
img = img.convert('RGB')
img = data_transforms(img)
img = torch.FloatTensor(img)
img = img.unsqueeze(0)
print(img)
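In this case the input to the encoder has shape [1,3,224,224], and the image is normalized with the ImageNet mean and std. Now I have exported the model to TensorFlow, so how do I apply the same image preprocessing for the TF model?
I tried to do something like this: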
import tensorflow as tf
from PIL import Image
img = Image.open(img_path)
img = img.convert('RGB')
img = tf.keras.preprocessing.image.img_to_array(img)
img = tf.image.resize(img,(224,224))
img = tf.keras.applications.resnet.preprocess_input(img)# now shape is [224,224,3]
img = tf.reshape(img,[1,3,224])
print(img)
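But I am sure I am doing something wrong, because for the same image the torch tensor and the TensorFlow tensor look very different, and the outputs of what should be the same encoder model are completely different.
Can anyone help? What should I fix in the TF preprocessing?
Solution
This: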
img = Image.open(img_path)
img = img.convert('RGB')
can be replaced with
image = tf.io.read_file(filename=filepath)
image = tf.image.decode_jpeg(image,channels=3) #or decode_png
Also, the TensorFlow counterpart of unsqueeze is expand_dims (squeeze maps to tf.squeeze):
img = tf.expand_dims(img,axis=0)
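Adding the batch axis is only half of the layout problem: the attempt in the question also uses a reshape to go from channels-last to channels-first. The sketch below uses NumPy as a stand-in (np.expand_dims and np.transpose behave like tf.expand_dims and tf.transpose here) on a small hypothetical array, to show that a transpose moves the channel axis while a reshape to the same target shape scrambles the pixels.

```python
import numpy as np

# Hypothetical stand-in for one decoded image in channels-last layout (H, W, C).
hwc = np.arange(4 * 5 * 3, dtype=np.float32).reshape(4, 5, 3)

# Add the batch axis; this mirrors tf.expand_dims and torch's unsqueeze(0).
nhwc = np.expand_dims(hwc, axis=0)            # shape (1, 4, 5, 3)

# Move channels in front of height and width: (N, H, W, C) -> (N, C, H, W).
nchw = np.transpose(nhwc, (0, 3, 1, 2))       # shape (1, 3, 4, 5)

# A reshape to the same target shape keeps the memory order, so values
# from different channels end up mixed together.
reshaped = nhwc.reshape(1, 3, 4, 5)

print(nchw.shape, reshaped.shape)
print(np.array_equal(nchw, reshaped))
```

Whether the exported TensorFlow model expects channels-first or channels-last input depends on how it was exported, so the target layout here is an assumption taken from the [1,3,224,224] shape in the question.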
Everything else looks fine; just make sure that
tf.keras.applications.resnet.preprocess_input(img) and data_transforms(img)
produce the same required transformations.
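As for the pixel values, I am fairly sure the mismatch comes from the division by 255.0: ToTensor in the PyTorch pipeline scales the pixels to [0, 1], while the TensorFlow pipeline above never divides by 255.0.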
In fact, if you dig into the Keras backend, you can see that calling the preprocessing function ends up calling this function:
def _preprocess_numpy_input(x, data_format, mode):
    """Preprocesses a Numpy array encoding a batch of images.

    Arguments:
        x: Input array, 3D or 4D.
        data_format: Data format of the image array.
        mode: One of "caffe", "tf" or "torch".
            - caffe: will convert the images from RGB to BGR,
                then will zero-center each color channel with
                respect to the ImageNet dataset, without scaling.
            - tf: will scale pixels between -1 and 1, sample-wise.
            - torch: will scale pixels between 0 and 1 and then
                will normalize each channel with respect to the
                ImageNet dataset.

    Returns:
        Preprocessed Numpy array.
    """
    if not issubclass(x.dtype.type, np.floating):
        x = x.astype(backend.floatx(), copy=False)

    if mode == 'tf':
        x /= 127.5
        x -= 1.
        return x
    elif mode == 'torch':
        x /= 255.
        mean = [0.485, 0.456, 0.406]
        std = [0.229, 0.224, 0.225]
    else:
        if data_format == 'channels_first':
            # 'RGB'->'BGR'
            if x.ndim == 3:
                x = x[::-1, ...]
            else:
                x = x[:, ::-1, ...]
        else:
            # 'RGB'->'BGR'
            x = x[..., ::-1]
        mean = [103.939, 116.779, 123.68]
        std = None

    # Zero-center by mean pixel
    if data_format == 'channels_first':
        if x.ndim == 3:
            x[0, :, :] -= mean[0]
            x[1, :, :] -= mean[1]
            x[2, :, :] -= mean[2]
            if std is not None:
                x[0, :, :] /= std[0]
                x[1, :, :] /= std[1]
                x[2, :, :] /= std[2]
        else:
            x[:, 0, :, :] -= mean[0]
            x[:, 1, :, :] -= mean[1]
            x[:, 2, :, :] -= mean[2]
            if std is not None:
                x[:, 0, :, :] /= std[0]
                x[:, 1, :, :] /= std[1]
                x[:, 2, :, :] /= std[2]
    else:
        x[..., 0] -= mean[0]
        x[..., 1] -= mean[1]
        x[..., 2] -= mean[2]
        if std is not None:
            x[..., 0] /= std[0]
            x[..., 1] /= std[1]
            x[..., 2] /= std[2]
    return x
Surprisingly, the default mode argument for ResNet50 preprocessing in Keras and TensorFlow is not tf but caffe.
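To make the difference concrete, here is a minimal NumPy sketch that applies the caffe and torch branches of the function above to a single hypothetical RGB pixel. The helper names caffe_mode and torch_mode are made up for this illustration and are not part of Keras; only the constants come from the listing.

```python
import numpy as np

# One hypothetical RGB pixel with values in [0, 255], shaped (H, W, C) = (1, 1, 3).
pixel = np.array([[[255.0, 128.0, 0.0]]], dtype=np.float32)

def caffe_mode(x):
    """RGB -> BGR, then subtract the ImageNet BGR means; no scaling."""
    x = x[..., ::-1].copy()
    x -= np.array([103.939, 116.779, 123.68], dtype=np.float32)
    return x

def torch_mode(x):
    """Scale to [0, 1], then normalize each RGB channel by mean and std."""
    x = x / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    return (x - mean) / std

caffe_out = caffe_mode(pixel)
torch_out = torch_mode(pixel)
print("caffe:", caffe_out.ravel())
print("torch:", torch_out.ravel())
```

The caffe result stays on the 0 to 255 scale and is in BGR order, while the torch result is a few units wide and in RGB order, which is consistent with the two tensors in the question looking nothing alike.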
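So the preprocessing applied to your image takes the else branch (I repeat the else branch and the code that follows it below, so that you can follow the transformations and see what is missing):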
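    else:
        if data_format == 'channels_first':
            # 'RGB'->'BGR'
            if x.ndim == 3:
                x = x[::-1, ...]
            else:
                x = x[:, ::-1, ...]
        else:
            # 'RGB'->'BGR'
            x = x[..., ::-1]
        mean = [103.939, 116.779, 123.68]
        std = None

    # Zero-center by mean pixel
    if data_format == 'channels_first':
        if x.ndim == 3:
            x[0, :, :] -= mean[0]
            x[1, :, :] -= mean[1]
            x[2, :, :] -= mean[2]
            # std is None in this mode, so the division by std is skipped
    # (the remaining layout branches subtract the same means)
    return x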
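The description is:
caffe: will convert the images from RGB to BGR, then will zero-center each color channel with respect to the ImageNet dataset, without scaling.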
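Putting the pieces together, one way to reproduce the PyTorch pipeline is to apply the torch-style steps by hand. The following is a minimal NumPy sketch, assuming the image has already been decoded and resized to 224x224 in RGB order; torch_style_preprocess is a hypothetical helper name, and the random array only stands in for a real photo.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def torch_style_preprocess(rgb_uint8):
    """Mimic ToTensor + Normalize + unsqueeze(0) for an already resized image.

    rgb_uint8: array of shape (H, W, 3), RGB order, values in [0, 255].
    Returns a float32 array of shape (1, 3, H, W).
    """
    x = rgb_uint8.astype(np.float32) / 255.0      # ToTensor: scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD        # Normalize per RGB channel
    x = np.transpose(x, (2, 0, 1))                # (H, W, C) -> (C, H, W)
    return np.expand_dims(x, axis=0)              # add the batch axis

# Hypothetical input: a random 224x224 RGB image instead of a real photo.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
batch = torch_style_preprocess(image)
print(batch.shape, batch.dtype)
```

Within TensorFlow, as far as I know, tf.keras.applications.imagenet_utils.preprocess_input accepts mode='torch' and selects the torch branch of the function quoted above, so it can replace the first two steps. Small differences may still remain from resizing, since PIL-based resizing and tf.image.resize do not necessarily use the same interpolation and antialiasing settings.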