How to fix a data-dimensionality problem when building a 3D CNN for binary classification of grayscale MRI data
I am trying to build a 3D CNN for binary classification of grayscale MRI data. I'm new here, so don't pile on, I'm here to learn! I have a subsample of 20 3D files, each with dimensions (189, 233, 197). I added a dimension with np.reshape to act as the channel, giving (189, 233, 197, 1). I used tf.shape to get the shape of the dataset, which is
<tf.Tensor: shape=(5,), dtype=int32, numpy=array([ 20, 189, 233, 197, 1], dtype=int32)>
and the same on the label data:
<tf.Tensor: shape=(1,), dtype=int32, numpy=array([20], dtype=int32)>
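For reference, one way to arrive at that (samples, depth, height, width, channels) layout is to stack the volumes first and then add the trailing channel axis with np.expand_dims. This is a minimal sketch using tiny hypothetical stand-in volumes (not the question's actual loader; the real volumes are (189, 233, 197)):

```python
import numpy as np

# Hypothetical stand-in data: 20 volumes with small spatial dimensions
# so the example runs quickly.
volumes = [np.zeros((4, 5, 6), dtype=np.float32) for _ in range(20)]

# Stack the list into one array, then append a trailing channel axis.
data = np.stack(volumes)              # shape (20, 4, 5, 6)
data = np.expand_dims(data, axis=-1)  # shape (20, 4, 5, 6, 1)
print(data.shape)
```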
Below is the full code I am using:
import numpy as np
import glob
import os
import tensorflow as tf
import pandas as pd
import SimpleITK as sitk
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import plot_model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from google.colab import drive
drive.mount('/content/gdrive')
datapath = '/content/gdrive/My Drive/DirectoryTest/All Data/'
patients = os.listdir(datapath)
labels_df = pd.read_csv('/content/Data_Index.csv',index_col = 0 )
FullDataSet = []
for patient in patients:
    a = sitk.ReadImage(datapath + patient)
    b = sitk.GetArrayFromImage(a)
    c = np.reshape(b, (189, 233, 197, 1))  # add a trailing channel dimension
    FullDataSet.append(c)
labelset = []
for i in patients:
    label = labels_df.loc[i, 'Group']
    if label == 'AD':  # use `==` instead of `is` to compare strings
        labelset.append(0.)
    elif label == 'CN':
        labelset.append(1.)
    else:
        raise ValueError("Oops, unknown label")  # raise an exception object, not a bare string

labelset = np.array(labelset)
x_train, x_valid, y_train, y_valid = train_test_split(FullDataSet, labelset, train_size=0.75)
## 3D CNN
CNN_model = tf.keras.Sequential(
    [
        tf.keras.layers.Input(shape=[189, 233, 197, 1]),
        tf.keras.layers.Conv3D(kernel_size=(7, 7, 7), filters=32, activation='relu',
                               padding='same', strides=(3, 3, 3)),
        #tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPool3D(pool_size=(3, 3, 3), padding='same'),
        tf.keras.layers.Dropout(0.20),
        tf.keras.layers.Conv3D(kernel_size=(5, 5, 5), filters=64, activation='relu', padding='same'),
        tf.keras.layers.MaxPool3D(pool_size=(2, 2, 2), padding='same'),
        tf.keras.layers.Conv3D(kernel_size=(3, 3, 3), filters=128, activation='relu',
                               padding='same', strides=(1, 1, 1)),
        # last activation could be either sigmoid or softmax; need to look into this more.
        # Sigmoid for binary output, softmax for multi-class output.
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
# Compile the model
CNN_model.compile(optimizer=tf.keras.optimizers.Adam(lr=0.00001), loss='binary_crossentropy', metrics=['accuracy'])
# print model layers
CNN_model.summary()
CNN_history = CNN_model.fit(x_train, y_train, epochs=10, validation_data=[x_valid, y_valid])
When I try to fit the model, the dimensions don't seem to line up and I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-48-c698c45a4d36> in <module>()
1 #running of the model
2 #CNN_history = CNN_model.fit(dataset_train,epochs=100,validation_data =dataset_test,validation_steps=1)
----> 3 CNN_history = CNN_model.fit(x_train, y_train, validation_data=[x_valid, y_valid], batch_size=1)
4
5
3 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in _method_wrapper(self,*args,**kwargs)
    106   def _method_wrapper(self, *args, **kwargs):
    107     if not self._in_multi_worker_mode():  # pylint: disable=protected-access
--> 108       return method(self, *args, **kwargs)
109
110 # Running inside `run_distribute_coordinator` already.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in fit(self,x,y,batch_size,epochs,verbose,callbacks,validation_split,validation_data,shuffle,class_weight,sample_weight,initial_epoch,steps_per_epoch,validation_steps,validation_batch_size,validation_freq,max_queue_size,workers,use_multiprocessing)
   1061         use_multiprocessing=use_multiprocessing,
   1062         model=self,
-> 1063         steps_per_execution=self._steps_per_execution)
1064
1065 # Container that configures and calls `tf.keras.Callback`s.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/data_adapter.py in __init__(self,use_multiprocessing,model,steps_per_execution)
   1115         use_multiprocessing=use_multiprocessing,
   1116         distribution_strategy=ds_context.get_strategy(),
-> 1117         model=model)
1118
1119 strategy = ds_context.get_strategy()
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/data_adapter.py in __init__(self,sample_weights,sample_weight_modes,steps,**kwargs)
280 label,",".join(str(i.shape[0]) for i in nest.flatten(data)))
281 msg += "Please provide data which shares the same first dimension."
--> 282 raise ValueError(msg)
283 num_samples = num_samples.pop()
284
ValueError: Data cardinality is ambiguous:
x sizes: 189,189
y sizes: 15
Please provide data which shares the same first dimension.
The train split is set to 0.75, so 15 of the 20. I'm confused why this isn't working, and can't figure out why this is the input the model is receiving. I previously got some help, and creating a dummy set with the following code made the model run:
train_size = 20
val_size = 5
X_train = np.random.random([train_size, 189, 233, 197, 1]).astype(np.float32)
X_valid = np.random.random([val_size, 189, 233, 197, 1]).astype(np.float32)
y_train = np.random.randint(2,size=train_size).astype(np.float32)
y_valid = np.random.randint(2,size=val_size).astype(np.float32)
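One thing worth checking in this situation (an assumption, not confirmed by the question) is that train_test_split preserves the input type: a Python list of arrays in gives a Python list of 15 arrays out, and Keras then reads each element's first dimension instead of a single sample axis, which matches the "x sizes: 189, ..." in the error. A toy sketch with small hypothetical stand-in volumes:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in: 20 small "volumes" with a trailing channel axis.
full_dataset = [np.zeros((4, 5, 6, 1), dtype=np.float32) for _ in range(20)]
labels = np.zeros(20, dtype=np.float32)

x_train, x_valid, y_train, y_valid = train_test_split(
    full_dataset, labels, train_size=0.75)

# A list went in, so a list of 15 separate arrays comes out.
print(type(x_train))  # <class 'list'>

# Converting to a single ndarray gives fit() an unambiguous sample axis.
x_train = np.asarray(x_train)
print(x_train.shape)  # (15, 4, 5, 6, 1)
```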
I've been banging my head against the wall on this for a while. Any help would be greatly appreciated.
Solution
I don't currently have commenting privileges, otherwise, since this may not be a complete answer, I would have said:
When I tried creating a toy 4-dimensional dataset and then appending it to a list (adding the channel -- which is what I believe you have done?), the resulting shape was not (dim1, dim2, dim3, dim4, channel), but (channel, dim1, dim2, dim3, dim4). I've included a working example below:
import numpy as np
df = np.arange(0, 625).reshape(5, 5, 5, 5)
print(df.shape)  # returns (5, 5, 5, 5)

lst = []
lst.append(df)
print(np.asarray(lst).shape)  # returns (1, 5, 5, 5, 5)
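Continuing the toy example, the front-loaded axis can be moved to the end with np.moveaxis to get the channels-last layout Keras expects by default (or the channel can be added at the end directly with np.expand_dims). A small sketch:

```python
import numpy as np

df = np.arange(625).reshape(5, 5, 5, 5)

# Appending to a list and converting puts the new axis in *front*.
front = np.asarray([df])
print(front.shape)  # (1, 5, 5, 5, 5)

# Moving that axis to the end yields channels-last;
# np.expand_dims(df, axis=-1) reaches the same layout directly.
channels_last = np.moveaxis(front, 0, -1)
print(channels_last.shape)  # (5, 5, 5, 5, 1)
```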
Based on this, could it be that your data's shape is actually (1, 189, 233, 197) rather than the (189, 233, 197, 1) you want?
Also, the error message you show seems to imply that you aren't passing the same number of X and y samples:
ValueError: Data cardinality is ambiguous:
x sizes: 189,189,189
y sizes: 15
Please provide data which shares the same first dimension.
Generally, the inputs to a network will share the same first dimension (borrowing your own toy dataset, for example, and running):
print(X_train.shape, y_train.shape, X_valid.shape, y_valid.shape)
# returns: (20, 189, 233, 197, 1) (20,) (5, 189, 233, 197, 1) (5,)
These match because it effectively means that every sample corresponds to one label, and vice versa. To me, the error message says that the first dimensions of your X and y inputs are 189 and 15, respectively. Could you double-check the shapes before feeding them into the network?
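That double-check can be made into a small guard run before fit(). This is a sketch (the helper name and message are my own, not from any library) that converts list inputs to arrays and fails fast when the sample counts disagree:

```python
import numpy as np

def check_cardinality(x, y):
    """Raise early if x and y don't share the same number of samples."""
    x = np.asarray(x)  # also collapses a list of arrays into one array
    y = np.asarray(y)
    if x.shape[0] != y.shape[0]:
        raise ValueError(
            f"x has {x.shape[0]} samples but y has {y.shape[0]}")
    return x, y

# 15 toy samples with 15 matching labels pass the check.
x, y = check_cardinality(np.zeros((15, 4, 5, 6, 1)), np.zeros(15))
print(x.shape[0], y.shape[0])  # 15 15
```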