Iterating through a DataFrame - 编程之家

How to solve: iterating through a DataFrame

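My code pulls a dataframe object and I'd like to mask the dataframe. If a value is <= 15, it should become 1; otherwise it should become 0.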

import pandas as pd
XTrain = pd.read_excel('C:\\blahblahblah.xlsx')

for each in XTrain:
  if each <= 15:
    each = 1
  else:
    each = 0

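I come from VBA and .NET, so I know this isn't Pythonic, but it seemed super easy to me... The code raises an error because it iterates over the df headers. So I tried checking the type: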

for each in XTrain:
  if isinstance(each,str) is False:
    if each <= 15:
      each = 1
    else:
      each = 0

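This time it got to the last header but never went into the dataframe. That makes me think I'm not iterating through the dataframe correctly? I've been stuck for hours; could someone give me a little help?

Thanks!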

Solution

for each in XTrain always loops over the column names, and only the column names. That's how pandas is designed.
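
To see this for yourself, here is a minimal sketch. The small frame is a hypothetical stand-in for XTrain, since the real spreadsheet isn't available:

```python
import pandas as pd

# Hypothetical stand-in for XTrain; the column names and values are made up.
frame = pd.DataFrame({"A": [5, 20], "B": [15, 16]})

# Iterating a DataFrame directly yields its column labels, not its cells.
labels = [each for each in frame]
print(labels)  # prints ['A', 'B']

# Comparing a label (a str) with an int is what raises the error in the question.
try:
    "A" <= 15
except TypeError as exc:
    print("TypeError:", exc)
```

That also explains the second attempt: every item the loop sees is a string label, so the isinstance check skips all of them and no cell is ever visited.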

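pandas supports comparison and arithmetic operations against numbers directly, applied to the whole dataframe at once. So you want: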

 # le is less than or equal to
 XTrain.le(15).astype(int)

 # same as
 # (XTrain <= 15).astype(int)
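
A small sketch of the two spellings side by side, again on made-up data rather than the real XTrain. Note that neither form changes the frame in place; you need to assign the result back if you want to keep it:

```python
import pandas as pd

# Hypothetical data standing in for XTrain.
frame = pd.DataFrame({"A": [5, 20], "B": [15, 16]})

masked = frame.le(15).astype(int)   # method form
same = (frame <= 15).astype(int)    # operator form

print(masked)

# Both return a new DataFrame; reassign to keep the result.
frame = masked
```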

If you really want to iterate (don't), remember that a dataframe is two-dimensional. Something like this:

for index, row in df.iterrows():
    for cell in row:
        if cell <= 15:
            # do something
            # cell = 1 only rebinds the local name; it does not modify the original dataframe
            # this is a python thing and you will get used to it
            pass
        else:
            # do something else
            pass
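
If you do go down the loop route and want the new values to stick, one possible approach is to write into a copy by label with .at. This is only a sketch on hypothetical data, and it will be much slower than the vectorized version on anything large:

```python
import pandas as pd

# Hypothetical data; the names here are for illustration only.
frame = pd.DataFrame({"A": [5, 20], "B": [15, 16]})

# Write into a copy so the frame being iterated is never modified.
result = frame.copy()

for index, row in frame.iterrows():
    for column, cell in row.items():
        # Assigning to `cell` would change nothing; write through .at instead.
        result.at[index, column] = 1 if cell <= 15 else 0

print(result)
```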

Setup

df = pd.DataFrame({'A' : range(0,20,2),'B' : list(range(10,19)) + ['a']})
print(df)

    A   B
0   0  10
1   2  11
2   4  12
3   6  13
4   8  14
5  10  15
6  12  16
7  14  17
8  16  18
9  18   a

Solution: pd.to_numeric to avoid problems with str values, then DataFrame.le

df.apply(lambda x: pd.to_numeric(x,errors='coerce')).le(15).astype(int)

Output

   A  B
0  1  1
1  1  1
2  1  1
3  1  1
4  1  1
5  1  1
6  1  0
7  1  0
8  0  0
9  0  0
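
To see why the string entry ends up as 0 here, it may help to look at what pd.to_numeric does to a single column. A minimal sketch on a hypothetical mixed column:

```python
import pandas as pd

# Hypothetical column mixing numbers and a string.
column = pd.Series([14, 16, "a"])

# errors='coerce' turns anything that cannot be parsed as a number into NaN.
numeric = pd.to_numeric(column, errors="coerce")
print(numeric.isna().tolist())  # prints [False, False, True]

# Any comparison with NaN is False, so the string entry becomes 0.
flags = numeric.le(15).astype(int)
print(flags.tolist())  # prints [1, 0, 0]
```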

If you want to keep the string values:

df2 = df.apply(lambda x: pd.to_numeric(x,errors='coerce'))
new_df = df2.where(lambda x: x.isna(),df2.le(15).astype(int)).fillna(df)
print(new_df)


   A  B
0  1  1
1  1  1
2  1  1
3  1  1
4  1  1
5  1  1
6  1  0
7  1  0
8  0  0
9  0  a

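Use applymap to apply a function to every element of the dataframe, writing the function as a lambda. (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.)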

df.applymap(lambda x: x if isinstance(x, str) else 1 if x <= 15 else 0)
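
The nested conditional in that lambda is a bit hard to read, so here is the same idea sketched with a named function. The to_flag helper and the version check are my own additions for illustration; the frame is the one from the setup above:

```python
import pandas as pd

df = pd.DataFrame({"A": range(0, 20, 2), "B": list(range(10, 19)) + ["a"]})


def to_flag(x):
    """Leave strings untouched; map numbers to 1 (<= 15) or 0."""
    if isinstance(x, str):
        return x
    return 1 if x <= 15 else 0


# DataFrame.map exists in pandas 2.1+; fall back to applymap on older versions.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
result = elementwise(to_flag)
print(result)
```

This calls a Python function once per cell, so for purely numeric frames the vectorized le(15).astype(int) form is likely the better choice.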