How to classify time series of xy spatial coordinates with deep learning - python
I've run into a deep learning classification problem. I'll attach a short example of the training data to help describe it.
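The data is a time series of xy points made up of smaller sub-sequences, identified by event, so each unique event is independent. Below I have two unique sequences (10, 20) of equal length in time. Within a given sequence, each individual point has its own unique identifier, user_id. The xy trajectories of these points vary slightly within a given sequence, and the time step is given in interval. I also have a separate xy point used as a reference (center_x, center_y), which gives the approximate middle/center of all the points.
Finally, target_label classifies the position of these points relative to each other. Using center_x, center_y as the reference, there are 5 classes: Middle, Top, Bottom, Right, Left. Each unique event can only have one label.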
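Questions:
- Obviously the dataset is small, but I care about accuracy. I think I need to incorporate the reference point (center_x, center_y).
- I get all of these warnings on every test iteration. I thought it was related to converting to tensors, but that didn't help:
WARNING:tensorflow:7 calls
triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.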
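One way to incorporate the reference point is to express every point relative to its event's center before building the input tensor. The following is a minimal sketch, assuming the column names from the example df below. The function name add_relative_coords and the new columns dx, dy are hypothetical.

```python
import pandas as pd


def add_relative_coords(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with each point's offset from its event's center.

    Assumes columns x, y, center_x, center_y as in the example data; the new
    columns dx, dy are hypothetical names chosen for this sketch.
    """
    out = df.copy()
    # cast to float because the example builds x, y as object-dtype columns
    out["dx"] = out["x"].astype(float) - out["center_x"].astype(float)
    out["dy"] = out["y"].astype(float) - out["center_y"].astype(float)
    return out


if __name__ == "__main__":
    demo = pd.DataFrame({
        "x": [5.3, 4.8, 15.0], "y": [5.2, 8.2, 11.6],
        "center_x": [5, 5, 15], "center_y": [5, 5, 15],
    })
    print(add_relative_coords(demo)[["dx", "dy"]])
```

The labels are defined relative to the center. Feeding dx, dy instead of (or alongside) raw x, y therefore makes events centered at different places, such as (5, 5) and (15, 15), directly comparable.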
Example df:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# number of intervals
n = 10
# center locations for points
locs_1 = {'A': (5,5),'B': (5,8),'C': (5,2),'D': (8,5)}
# initialize data
data_1 = pd.DataFrame(index=range(n*len(locs_1)),columns=['x','y','user_id'])
for i,group in enumerate(locs_1.keys()):
    # n noisy samples around this point's center location
    data_1.loc[i*n:((i+1)*n)-1,['x','y']] = np.random.normal(locs_1[group],[0.2,0.2],[n,2])
    data_1.loc[i*n:((i+1)*n)-1,['user_id']] = group
# generate time intervals
data_1['interval'] = data_1.groupby('user_id').cumcount() + 1
# assign unique id to differentiate sequences
data_1['event'] = 10
# center of all points for unique sequence 1
data_1['center_x'] = 5
data_1['center_y'] = 5
# classify labels
data_1['target_label'] = ['Middle' if ele == 'A' else 'Top' if ele == 'B' else 'Bottom' if ele == 'C' else 'Right' for ele in data_1['user_id']]
# center locations for points
locs_2 = {'A': (14,15),'B': (16,15),'C': (15,12),'D': (19,15)}
# initialize data
data_2 = pd.DataFrame(index=range(n*len(locs_2)),columns=['x','y','user_id'])
for i,group in enumerate(locs_2.keys()):
    data_2.loc[i*n:((i+1)*n)-1,['x','y']] = np.random.normal(locs_2[group],[0.2,0.2],[n,2])
    data_2.loc[i*n:((i+1)*n)-1,['user_id']] = group
# generate time intervals
data_2['interval'] = data_2.groupby('user_id').cumcount() + 1
# assign unique id to differentiate sequences
data_2['event'] = 20
# center of all points for unique sequence 2
data_2['center_x'] = 15
data_2['center_y'] = 15
# classify labels
data_2['target_label'] = ['Middle' if ele == 'A' else 'Middle' if ele == 'B' else 'Bottom' if ele == 'C' else 'Right' for ele in data_2['user_id']]
df = pd.concat([data_1,data_2])
df = df.sort_values(by = ['event','interval','user_id']).reset_index(drop = True)
df:
x y user_id interval event center_x center_y target_label
0 5.288275 5.211246 A 1 10 5 5 Middle
1 4.765987 8.200895 B 1 10 5 5 Top
2 4.943518 1.645249 C 1 10 5 5 Bottom
3 7.930763 4.965233 D 1 10 5 5 Right
4 4.866746 4.980674 A 2 10 5 5 Middle
.. ... ... ... ... ... ... ... ...
75 18.929254 15.297437 D 9 20 15 15 Right
76 13.701538 15.049276 A 10 20 15 15 Middle
77 16.028816 14.985672 B 10 20 15 15 Middle
78 15.044336 11.631358 C 10 20 15 15 Bottom
79 18.95508 15.217064 D 10 20 15 15 Right
Model:
from sklearn.preprocessing import label_binarize
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Dense, Dropout, Flatten, MaxPooling1D
labels = df['target_label'].dropna().sort_values().unique()
n_samples = df.groupby(['user_id','event']).ngroups
n_ints = 10
# group rows per point (event, user_id) in time order so the reshape yields one sequence per point
df_seq = df.sort_values(by = ['event','user_id','interval'])
X = df_seq[['x','y']].values.reshape(n_samples,n_ints,2).astype('float32')
# one label per (event, user_id), in the same order as X
y = df_seq.drop_duplicates(subset = ['event','user_id'])['target_label'].values
y = label_binarize(y,classes = labels)
# load the dataset, returns train and test X and y elements
def load_dataset():
    # test,train split
    trainX,testX,trainy,testy = train_test_split(X,y,test_size = 0.2)
    return trainX,trainy,testX,testy
# fit and evaluate a model
def evaluate_model(trainX,trainy,testX,testy):
    verbose,epochs,batch_size = 0,10,32
    n_timesteps,n_features,n_outputs = trainX.shape[1],trainX.shape[2],trainy.shape[1]
    model = Sequential()
    model.add(Conv1D(filters=64,kernel_size=3,activation='relu',input_shape=(n_timesteps,n_features)))
    model.add(Conv1D(filters=64,kernel_size=3,activation='relu'))
    model.add(Dropout(0.5))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(100,activation='relu'))
    model.add(Dense(n_outputs,activation='softmax'))
    model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
    # fit network
    model.fit(trainX,trainy,epochs=epochs,batch_size=batch_size,verbose=verbose)
    # evaluate model
    _,accuracy = model.evaluate(testX,testy,verbose=0)
    return accuracy
# summarize scores
def summarize_results(scores):
    print(scores)
    m,s = np.mean(scores),np.std(scores)
    print('Accuracy: %.3f%% (+/-%.3f)' % (m,s))
# run an experiment
def run_experiment(repeats=10):
    # load data
    trainX,trainy,testX,testy = load_dataset()
    # repeat experiment
    scores = list()
    for r in range(repeats):
        #r = tf.convert_to_tensor(r,dtype=tf.int32)
        score = evaluate_model(trainX,trainy,testX,testy)
        score = score * 100.0
        print('>#%d: %.3f' % (r+1,score))
        scores.append(score)
    # summarize results
    summarize_results(scores)
# run the experiment
run_experiment()
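Reshaping X into (n_samples, n_ints, 2) produces one sequence per point only if the rows are grouped by (event, user_id) and ordered by interval. The df shown above is sorted by interval first, so consecutive rows belong to different points. Below is a minimal sketch of a grouping-safe alternative. The helper to_sequences is hypothetical, and it assumes every point has the same number of intervals.

```python
import numpy as np
import pandas as pd


def to_sequences(df: pd.DataFrame, features=("x", "y")):
    """Stack each (event, user_id) trajectory into shape (n_samples, n_intervals, n_features).

    Hypothetical helper: assumes every point has the same number of intervals.
    Returns the 3-D array and the matching per-sample target labels.
    """
    ordered = df.sort_values(["event", "user_id", "interval"])
    seqs, labels = [], []
    # sort=False keeps the (event, user_id) order produced by sort_values
    for _, g in ordered.groupby(["event", "user_id"], sort=False):
        seqs.append(g[list(features)].to_numpy(dtype="float32"))
        labels.append(g["target_label"].iloc[0])
    return np.stack(seqs), np.array(labels)


if __name__ == "__main__":
    # rows sorted by interval first, like the example df
    demo = pd.DataFrame({
        "x": [0.0, 10.0, 1.0, 11.0, 2.0, 12.0],
        "y": [0.0, 10.0, 0.0, 10.0, 0.0, 10.0],
        "user_id": ["A", "B", "A", "B", "A", "B"],
        "interval": [1, 1, 2, 2, 3, 3],
        "event": [10] * 6,
        "target_label": ["Middle", "Top"] * 3,
    })
    X_demo, y_demo = to_sequences(demo)
    print(X_demo.shape)     # (2, 3, 2)
    print(X_demo[0, :, 0])  # [0. 1. 2.]
```

Returning the labels from the same loop keeps X and y aligned by construction, rather than relying on two separate orderings agreeing.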
Solution
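You are trying to do time series classification on 2D time series of length 10. You seem to have only a handful of examples per class, which is far too few to train any neural network. Even with hundreds of examples, I would recommend a method that copes better with little data. One example is k-nearest neighbours with a time-series-specific distance measure such as dynamic time warping.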
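Below is a minimal sketch of that suggestion, written from scratch in NumPy rather than with any particular time-series library. It is a k-nearest-neighbour classifier whose distance is dynamic time warping (DTW) over multivariate (x, y) sequences. The function names dtw_distance and knn_predict, the Euclidean per-step cost, and the default k=1 are assumptions for illustration.

```python
from collections import Counter

import numpy as np


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic time warping distance between sequences a (n, d) and b (m, d).

    The per-step cost is the Euclidean distance between points; the result is
    the minimal cumulative cost over all monotone alignments of a and b.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # a advances, b repeats
                                 cost[i, j - 1],      # b advances, a repeats
                                 cost[i - 1, j - 1])  # both advance
    return float(cost[n, m])


def knn_predict(train_X, train_y, test_X, k=1):
    """Label each test sequence by majority vote among its k DTW-nearest training sequences."""
    preds = []
    for q in test_X:
        dists = np.array([dtw_distance(q, s) for s in train_X])
        nearest = np.argsort(dists)[:k]
        votes = Counter(train_y[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # two toy classes: trajectories near (0, 0) versus near (0, 3)
    train_X = np.stack([rng.normal(c, 0.2, size=(10, 2)) for c in [(0, 0), (0, 0), (0, 3), (0, 3)]])
    train_y = np.array(["Middle", "Middle", "Top", "Top"])
    test_X = np.stack([rng.normal((0, 3), 0.2, size=(10, 2))])
    print(knn_predict(train_X, train_y, test_X))
```

With k=1 and only a few sequences per class, this amounts to matching each new trajectory against stored templates. The O(n·m) cost per pair is negligible for sequences of length 10.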