How to fill missing data in one DataFrame from another DataFrame when they share a common key

I have two DataFrames; see the example below. Where df['GrossRate'] == 0, how can I fill it with the GrossRate from dfB that has the same ProductID?

Basically, the GrossRate column in df should end up as 150, 40, 238, 32.

import numpy as np
import pandas as pd
# Every column needs the same number of values (4 rows)
dataA = {'date': ['20210101','20210102','20210103','20210104'],'quanitity': [22000,25000,27000,35000],'NetRate': ['nan','nan','nan','nan'],'GrossRate': [150,0,238,0],'ProductID': [9613,7974,1714,5302],}

df = pd.DataFrame(dataA,columns = ['date','quanitity','NetRate','GrossRate','ProductID' ])

    date  quanitity NetRate  GrossRate  ProductID
0  20210101      22000     nan        150       9613
1  20210102      25000     nan          0       7974
2  20210103      27000     nan        238       1714
3  20210104      35000     nan          0       5302

dataB = {
        'ProductID': ['9613.T','7974.T','1714.T','5302.T'],'GrossRate': [10,40,28,32],}

dfB = pd.DataFrame(dataB,columns = ['ProductID','GrossRate' ])
# Strip the literal ".T" suffix (regex=False, so "." is not treated as a wildcard)
dfB.ProductID = dfB.ProductID.str.replace('.T','',regex=False)

print (dfB)

  ProductID  GrossRate
0      9613         10
1      7974         40
2      1714         28
3      5302         32
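One detail worth checking before any key-based fill: after the cleanup above, dfB.ProductID still holds strings ('9613'), while df.ProductID holds integers (9613), and the two never compare equal. The minimal sketch below uses hypothetical two-row frames to show that a lookup keyed on strings returns only NaN, while casting the keys to int makes it match. It strips the suffix with an anchored regex (r'\.T$'), which is one reasonable choice, not the only one.

```python
import pandas as pd

# Hypothetical two-row frames mirroring the question's layout
df = pd.DataFrame({'GrossRate': [150, 0], 'ProductID': [9613, 7974]})
dfB = pd.DataFrame({'ProductID': ['9613.T', '7974.T'], 'GrossRate': [10, 40]})

# Remove only a trailing literal ".T" (dot escaped, pattern anchored with $)
keys_str = dfB['ProductID'].str.replace(r'\.T$', '', regex=True)

# String keys never equal the integer ProductIDs in df, so every lookup misses
lookup_str = pd.Series(dfB['GrossRate'].values, index=keys_str)
print(df['ProductID'].map(lookup_str).tolist())  # [nan, nan]

# Casting the keys to int makes the dtypes agree and the lookup succeeds
lookup_int = pd.Series(dfB['GrossRate'].values, index=keys_str.astype(int))
print(df['ProductID'].map(lookup_int).tolist())  # [10, 40]
```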

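Solution

Try this list comprehension. It pairs rows by position, so it assumes both DataFrames list the same ProductIDs in the same order: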

df['GrossRate'] = [x if x != 0 else y for x,y in zip(df['GrossRate'],dfB['GrossRate'])]
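A self-contained sketch of the list comprehension above, using the question's values with dfB's keys already converted to int. The assertion that the ProductID columns line up is an addition for illustration: it is a cheap way to confirm the positional assumption holds before filling.

```python
import pandas as pd

df = pd.DataFrame({'GrossRate': [150, 0, 238, 0],
                   'ProductID': [9613, 7974, 1714, 5302]})
# Assumes dfB lists the same products in the same row order as df
dfB = pd.DataFrame({'ProductID': [9613, 7974, 1714, 5302],
                    'GrossRate': [10, 40, 28, 32]})

# Guard: positional pairing is only correct if the keys line up row by row
assert (df['ProductID'].values == dfB['ProductID'].values).all()

# Keep x unless it is 0; otherwise take y from the same row of dfB
df['GrossRate'] = [x if x != 0 else y
                   for x, y in zip(df['GrossRate'], dfB['GrossRate'])]
print(df['GrossRate'].tolist())  # [150, 40, 238, 32]
```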

If both DataFrames have the same number of rows with ProductID in the same order, no matching by ProductID is needed, and you can use numpy.where:

df['GrossRate'] = np.where(df['GrossRate'] == 0,dfB['GrossRate'],df['GrossRate'])

print (df)
       date  quanitity NetRate  GrossRate  ProductID
0  20210101      22000     nan        150       9613
1  20210102      25000     nan         40       7974
2  20210103      27000     nan        238       1714
3  20210104      35000     nan         32       5302
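To see why the same-order condition matters, the sketch below reorders dfB (a hypothetical shuffle of the question's data). The positional np.where fill silently picks up the wrong rates, while a lookup by ProductID still gives the expected result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'GrossRate': [150, 0, 238, 0],
                   'ProductID': [9613, 7974, 1714, 5302]})
# Same products as the question, but in a different (hypothetical) row order
dfB = pd.DataFrame({'ProductID': [5302, 1714, 7974, 9613],
                    'GrossRate': [32, 28, 40, 10]})

# Positional fill: row i of df takes row i of dfB, whatever its ProductID
positional = np.where(df['GrossRate'] == 0, dfB['GrossRate'], df['GrossRate'])
print(positional.tolist())  # [150, 28, 238, 10]  (wrong rates)

# Key-based fill: look up each row's ProductID in dfB instead
rates = df['ProductID'].map(dfB.set_index('ProductID')['GrossRate'])
by_key = np.where(df['GrossRate'] == 0, rates, df['GrossRate'])
print(by_key.tolist())  # [150, 40, 238, 32]
```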

If you need to match by ProductID, use:

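# Strip the literal ".T" and cast to int so the keys match the integer df['ProductID']
dfB.ProductID = dfB.ProductID.str.replace('.T','',regex=False).astype(int)

# Look up each row's ProductID in dfB, and use that rate only where GrossRate is 0
df['GrossRate'] = np.where(df['GrossRate'] == 0,df['ProductID'].map(dfB.set_index('ProductID')['GrossRate']),df['GrossRate'])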
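For reference, here is an end-to-end sketch built on the question's data. It swaps np.where for pandas' Series.mask, which keeps the result as a Series on df's index. This is an alternative to the answer above, not the answer's own method. The name lookup is introduced for illustration only.

```python
import pandas as pd

dataA = {'date': ['20210101', '20210102', '20210103', '20210104'],
         'quanitity': [22000, 25000, 27000, 35000],
         'NetRate': ['nan', 'nan', 'nan', 'nan'],
         'GrossRate': [150, 0, 238, 0],
         'ProductID': [9613, 7974, 1714, 5302]}
df = pd.DataFrame(dataA)

dataB = {'ProductID': ['9613.T', '7974.T', '1714.T', '5302.T'],
         'GrossRate': [10, 40, 28, 32]}
dfB = pd.DataFrame(dataB)

# ProductID -> GrossRate lookup with integer keys, matching df['ProductID']
keys = dfB['ProductID'].str.replace(r'\.T$', '', regex=True).astype(int)
lookup = pd.Series(dfB['GrossRate'].values, index=keys)

# mask() replaces values where the condition is True with the aligned values from `other`
df['GrossRate'] = df['GrossRate'].mask(df['GrossRate'].eq(0),
                                       df['ProductID'].map(lookup))
print(df)
```

Two caveats apply to both this version and the np.where version. A zero-rate row whose ProductID is missing from dfB becomes NaN, which also turns the column into floats. And map expects the ProductID values in dfB to be unique, so deduplicate dfB first if they might not be.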