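How to iterate through a pandas DataFrame and check logical conditions across different indices
I'm trying to iterate through a pandas DataFrame and check only the duplicates. If there is a duplicate "ID" field, I compare the "BeginTime" fields of the duplicates and assign a new time based on the result of several if/elif/else comparisons. The trouble I'm having is that I don't know how to write a logical condition that compares "ID" at different indices of the DataFrame. When I run the code, the output of the duplicate check is correct, but no new time is ever assigned. This is what I have so far...
The desired output for the duplicates is: 9999, 1250, 1130, 1130 or 1250, 1250, 9999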
import pandas as pd
# dataframe initialized
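# NOTE: as posted, 'ID' has 9 values but 'BeginTime' only 5, so pandas raises
# ValueError (all arrays must be of the same length); the full data was not included
dfmwf = pd.DataFrame({'ID': [97330, 97330, 95232, 91293, 92471, 91616, 97297, 94305, 94305], 'BeginTime': [1135, 1255, 1135, 1415, 1415]})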
# set counter for testing purposes
count = 0
# iterate through the dataframe rows
for index, row in dfmwf.iterrows():
    # check if a duplicate, this seems to be working fine
    if dfmwf.loc[index, 'ID'] == dfmwf.shift(+1).loc[index, 'ID']:
        print(count, 'yes')
        count += 1
        # check multiple conditions of duplicates, this block of code is not working at all
        if dfmwf.loc[index, 'BeginTime'] == 1255 and dfmwf.shift(-1).loc[index, 'ID'] == 1135:
            print('New time = 1250')
        elif dfmwf.loc[index, 'BeginTime'] == 1415 and dfmwf.shift(-1).loc[index, 'ID'] == 1135:
            print('New time = 1130')
        elif dfmwf.loc[index, 'ID'] == 1255:
            print('New time = either')
        elif dfmwf.loc[index, 'BeginTime'] == 1135 and dfmwf.shift(-1).loc[index, 'BeginTime'] == 1255 and dfmwf.shift(-2).loc[index, 'ID'] == 1415:
            print('New time = 9999')
    else:
        print(count, 'no,New time = 9999')
        count += 1
Solution
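The first thing I noticed is that you use shift(+1) when checking for duplicates, but shift(-1) and shift(-2) in the conditions on the duplicates. This means you are looking at the second and third duplicated IDs and then at the following rows, whose IDs are not necessarily the same.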
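To see the mismatch concretely: `shift(1)` moves every row down by one, so row i of the shifted frame holds row i-1's values, while `shift(-1)` holds row i+1's values. A minimal sketch on a made-up four-row frame (not the question's data):

```python
import pandas as pd

# Made-up frame (not the question's data): ID 1 appears twice.
df = pd.DataFrame({"ID": [1, 1, 2, 3], "BeginTime": [1135, 1255, 1135, 1415]})

prev_row = df.shift(1)   # row i now holds row i-1's values; row 0 becomes NaN
next_row = df.shift(-1)  # row i now holds row i+1's values; the last row becomes NaN

# Comparing with shift(1) flags the SECOND row of the duplicated pair ...
is_repeat = df["ID"] == prev_row["ID"]
print(is_repeat.tolist())            # [False, True, False, False]

# ... but from that flagged row (index 1), shift(-1) looks at the NEXT row,
# which belongs to a different ID, while shift(1) still sees the same ID.
print(next_row.loc[1, "ID"])         # 2.0
print(prev_row.loc[1, "BeginTime"])  # 1135.0
```

So once a row has been identified as a repeat via `shift(1)`, its partner is found with `shift(1)` as well, not `shift(-1)`.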
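I think the function you are really after is iloc, which can be used as .iloc[index-1]. Here is your code modified to use iloc, with a few typos fixed (several conditions compared 'ID' instead of 'BeginTime' against a time):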
for index, row in dfmwf.iterrows():
    # check if a duplicate, this seems to be working fine
    if dfmwf.loc[index, 'ID'] == dfmwf.iloc[index-1]['ID']:
        print(count, 'yes')
        count += 1
        # check multiple conditions of duplicates
        if dfmwf.loc[index, 'BeginTime'] == 1255 and dfmwf.iloc[index-1]['BeginTime'] == 1135:
            print('New time = 1250')
        elif dfmwf.loc[index, 'BeginTime'] == 1415 and dfmwf.iloc[index-1]['BeginTime'] == 1135:
            print('New time = 1130')
        elif dfmwf.loc[index, 'BeginTime'] == 1415 and dfmwf.iloc[index-1]['BeginTime'] == 1255:
            print('New time = either')
        elif (dfmwf.loc[index, 'BeginTime'] == 1135 and dfmwf.iloc[index-1]['BeginTime'] == 1255
              and dfmwf.iloc[index-2]['BeginTime'] == 1415):
            print('New time = 9999')
    else:
        print(count, 'no,New time = 9999')
        count += 1
This gives the following output:
0 no,New time = 9999
1 yes
New time = 1250
2 no,New time = 9999
3 yes
New time = 1130
4 no,New time = 9999
5 yes
New time = either
6 no,New time = 9999
7 yes
8 yes
9 no,New time = 9999
10 no,New time = 9999
11 yes
12 no,New time = 9999
13 yes
New time = 1250
14 yes
New time = either
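One thing to keep in mind with `.iloc[index-1]`: `iloc` is positional, so at the first row `index - 1` is `-1`, which silently selects the last row instead of raising an error. The positions also only line up with the labels that `iterrows()` yields when the frame has the default 0..n-1 index. A minimal sketch on a made-up four-row frame (not the question's data), where the first and last rows share an ID on purpose:

```python
import pandas as pd

# Made-up frame: the first and last rows deliberately share ID 7.
df = pd.DataFrame({"ID": [7, 1, 1, 7], "BeginTime": [1135, 1135, 1255, 1415]})

naive_flags = []
guarded_flags = []
for pos in range(len(df)):
    # Compare each row's ID with the row one position earlier.
    # At pos == 0, pos - 1 == -1, so iloc wraps around to the LAST row.
    same_as_prev = bool(df.iloc[pos - 1]["ID"] == df.iloc[pos]["ID"])
    naive_flags.append(same_as_prev)
    # Requiring pos > 0 stops row 0 from being compared with the last row.
    guarded_flags.append(pos > 0 and same_as_prev)

print(naive_flags)    # [True, False, True, False]
print(guarded_flags)  # [False, False, True, False]
```

On a frame with a non-default index, calling `reset_index(drop=True)` first keeps labels and positions in sync.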
Also, I would avoid iterating over the entire DataFrame and work on one ID at a time instead:
for id_ in dfmwf.ID.unique():  # iterate through all unique IDs
    this_id = dfmwf.loc[dfmwf.ID == id_]
    if len(this_id.index) <= 1:  # if there is only one entry of this ID
        print(count, 'no,New time = 9999')
        continue
    # check the conditions on the subset 'this_id'
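Going one step further, the rules from the iloc loop above can be evaluated for all rows at once. This is an alternative technique, not the answer's code: it uses `groupby('ID')['BeginTime'].shift()` so that the "previous time" only ever comes from a row with the same ID (the first row of each ID gets NaN, so there is no wrap-around), and `numpy.select` to apply the conditions in order. It assumes the rows of each ID are already in chronological order and uses made-up data; the labels "either" and "no rule" are placeholders for whatever value should be assigned in those cases.

```python
import numpy as np
import pandas as pd

# Made-up data in the question's format (the full original data was not posted).
df = pd.DataFrame({
    "ID":        [97330, 97330, 95232, 91293, 91293, 91616, 91616],
    "BeginTime": [1135, 1255, 1135, 1135, 1415, 1255, 1415],
})

cur = df["BeginTime"]
# Previous / second-previous BeginTime within the same ID (NaN where none exists).
prev1 = df.groupby("ID")["BeginTime"].shift(1)
prev2 = df.groupby("ID")["BeginTime"].shift(2)

# Same order as the if/elif chain: np.select takes the first condition that holds.
conditions = [
    prev1.isna(),                                       # first or only row of an ID
    (cur == 1255) & (prev1 == 1135),
    (cur == 1415) & (prev1 == 1135),
    (cur == 1415) & (prev1 == 1255),
    (cur == 1135) & (prev1 == 1255) & (prev2 == 1415),
]
choices = ["9999", "1250", "1130", "either", "9999"]

# "no rule" marks repeated rows that match none of the conditions.
df["NewTime"] = np.select(conditions, choices, default="no rule")
print(df)
```

Compared with filtering the frame once per unique ID, which builds a boolean mask over every row for each ID, `groupby` groups the rows once and keeps every comparison inside a single ID.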