Iterating through a Pandas DataFrame to check logical conditions at different indexes

How to iterate through a Pandas DataFrame and check logical conditions at different indexes

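I am trying to iterate through a pandas DataFrame and simply check for duplicates. If a duplicate 'ID' field exists, I want to compare the 'BeginTime' fields of the duplicates and assign a new time based on the result of several if/elif/else comparisons. The trouble is that I don't know how to test logical conditions on 'ID' at different indexes of the DataFrame. When I run the code, the output of the duplicate check is correct, but no new time is assigned. This is what I have so far...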

The desired output for the duplicates is: 9999, 1250, 1130, 1130 or 1250, 1250, 9999

import pandas as pd

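# dataframe initialized
# NOTE: as shown, 'ID' has 9 values and 'BeginTime' has 5, so pandas raises a ValueError;
# both lists must have the same length for this line to run.
dfmwf = pd.DataFrame({'ID': [97330, 97330, 95232, 91293, 92471, 91616, 97297, 94305, 94305], 'BeginTime': [1135, 1255, 1135, 1415, 1415]})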
# set counter for testing purposes
count = 0

# iterate through the dataframe rows
for index,row in dfmwf.iterrows():

# check if a duplicate,this seems to be working fine   
  if dfmwf.loc[index,'ID'] == dfmwf.shift(+1).loc[index,'ID']:
       print(count,'yes')
       count += 1

# check multiple conditions of duplicates,this block of code is not working at all           
       if dfmwf.loc[index,'BeginTime'] == 1255 and dfmwf.shift(-1).loc[index,'ID'] == 1135:
               print('New time = 1250')
       elif dfmwf.loc[index,'BeginTime'] == 1415 and dfmwf.shift(-1).loc[index,'ID'] == 1135:
               print('New time = 1130')
       elif dfmwf.loc[index,'ID'] == 1255:
               print('New time = either')
       elif dfmwf.loc[index,'BeginTime'] == 1135 and dfmwf.shift(-1).loc[index,'BeginTime'] == 1255 and dfmwf.shift(-2).loc[index,'ID'] == 1415:   
               print('New time = 9999')  
  else:
      print(count,'no,New time = 9999') 
      count += 1

Solution

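The first thing I noticed is that you use shift(+1) when checking for duplicates, but shift(-1) and shift(-2) in the conditions on the duplicates. shift(+1) looks at the previous row, while shift(-1) and shift(-2) look at the rows that follow. So once you are on the second or third row of a duplicated ID, you are comparing against the following rows, which do not necessarily have the same ID.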
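
To make the direction of shift concrete, here is a minimal sketch on made-up data (the `ids` Series is hypothetical, not the data from the question). It shows that `shift(1)` lines each row up with the previous row's value and `shift(-1)` with the next row's value:

```python
import pandas as pd

# Hypothetical toy data, not the data from the question.
ids = pd.Series([1, 1, 2, 3, 3])

prev_ids = ids.shift(1)    # value from the previous row; first entry is NaN
next_ids = ids.shift(-1)   # value from the next row; last entry is NaN

same_as_prev = ids.eq(prev_ids)   # True on the second row of each duplicate pair
same_as_next = ids.eq(next_ids)   # True on the first row of each duplicate pair

print(same_as_prev.tolist())  # [False, True, False, False, True]
print(same_as_next.tolist())  # [True, False, False, True, False]
```

The two masks flag different rows, which is why mixing the two directions inside one loop body gives inconsistent comparisons.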

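I think what you are really after is iloc, which can be used as .iloc[index-1]. Here is your code modified to use iloc, with a few typos fixed (some conditions compared 'ID' against time values where 'BeginTime' was meant):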

# iterate through the dataframe rows
for index, row in dfmwf.iterrows():

    # check if this row has the same ID as the previous row
    if dfmwf.loc[index, 'ID'] == dfmwf.iloc[index-1]['ID']:
        print(count, 'yes')
        count += 1

        # compare this row's BeginTime with the previous row(s)
        if dfmwf.loc[index, 'BeginTime'] == 1255 and dfmwf.iloc[index-1]['BeginTime'] == 1135:
            print('New time = 1250')
        elif dfmwf.loc[index, 'BeginTime'] == 1415 and dfmwf.iloc[index-1]['BeginTime'] == 1135:
            print('New time = 1130')
        elif dfmwf.loc[index, 'BeginTime'] == 1415 and dfmwf.iloc[index-1]['BeginTime'] == 1255:
            print('New time = either')
        elif dfmwf.loc[index, 'BeginTime'] == 1135 and dfmwf.iloc[index-1]['BeginTime'] == 1255 and dfmwf.iloc[index-2]['BeginTime'] == 1415:
            print('New time = 9999')
    else:
        print(count, 'no,New time = 9999')
        count += 1

This gives the following output:

0 no,New time = 9999
1 yes
New time = 1250
2 no,New time = 9999
3 yes
New time = 1130
4 no,New time = 9999
5 yes
New time = either
6 no,New time = 9999
7 yes
8 yes
9 no,New time = 9999
10 no,New time = 9999
11 yes
12 no,New time = 9999
13 yes
New time = 1250
14 yes
New time = either
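
One caveat with `.iloc[index-1]`: on the first row it becomes `.iloc[-1]`, which silently selects the last row instead of raising an error. The sketch below uses a made-up frame (`toy` is hypothetical) whose first and last rows share an ID on purpose, and contrasts that with a comparison against `shift(1)`, which pairs the first row with NaN instead:

```python
import pandas as pd

# Hypothetical toy frame: first and last rows share an ID on purpose.
toy = pd.DataFrame({'ID': [7, 8, 7], 'BeginTime': [1135, 1255, 1415]})

# Positional index -1 wraps around to the last row.
wrapped = toy.iloc[0 - 1]['ID']
first_row_flagged = bool(toy.loc[0, 'ID'] == wrapped)
print(first_row_flagged)  # True: row 0 looks like a duplicate of the last row

# Comparing against shift(1) avoids the wraparound: row 0 is compared with NaN.
dup_of_prev = toy['ID'].eq(toy['ID'].shift(1))
print(dup_of_prev.tolist())  # [False, False, False]
```

Mixing `.loc[index]` with `.iloc[index-1]` also assumes the default integer index, where labels and positions coincide.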

In addition, I would avoid iterating over the whole DataFrame row by row and loop over the unique IDs instead.

for uid in dfmwf.ID.unique():  # iterate through all unique IDs
    this_id = dfmwf.loc[dfmwf.ID == uid]  # all rows with this ID
    if len(this_id.index) <= 1:  # there is only one entry for this ID
        print(count, 'no,New time = 9999')
        continue
    # check the conditions on the subset 'this_id'
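
As a rough sketch of how the per-ID check could be filled in without an explicit row loop, the example below uses `groupby('ID')` with `shift(1)` so that each row only sees the previous row of the same ID. The `toy` data, the `new_time` helper, and the `PrevTime` and `NewTime` columns are all hypothetical. Only the two-row rules are covered, and falling back to 9999 for unmatched pairs is an assumption of this sketch:

```python
import pandas as pd

# Hypothetical toy data; the question's full data are not reproduced here.
toy = pd.DataFrame({
    'ID': [1, 1, 2, 3, 3],
    'BeginTime': [1135, 1255, 1135, 1135, 1415],
})

def new_time(curr, prev):
    """Map a (current, previous) BeginTime pair to a new time label."""
    if pd.isna(prev):
        return '9999'      # first (or only) row of an ID
    if curr == 1255 and prev == 1135:
        return '1250'
    if curr == 1415 and prev == 1135:
        return '1130'
    if curr == 1415 and prev == 1255:
        return 'either'
    return '9999'          # assumption: unmatched pairs fall back to 9999

# Previous BeginTime within the same ID; NaN for the first row of each group.
toy['PrevTime'] = toy.groupby('ID')['BeginTime'].shift(1)
toy['NewTime'] = [new_time(c, p) for c, p in zip(toy['BeginTime'], toy['PrevTime'])]

print(toy['NewTime'].tolist())  # ['9999', '1250', '9999', '9999', '1130']
```

Because the shift happens inside each group, a row is never compared with a row belonging to a different ID.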
