Python df: add rows by date so that every group ends on the same date, filling in the remaining rows

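To use animation frames in a geographic plot, I want all of my groups to end on the same date. That avoids some countries turning grey in the last frame. Currently, the most recent data point by date is `Timestamp('2021-05-13 00:00:00')`.

So, as the next step, I want to add new rows for every country so that each of them has rows up to the latest date in the df. The 'people_vaccinated_per_hundred' and 'people_fully_vaccinated_per_hundred' columns can be filled with a forward fill (ffill).

Data:

Ideally, if for example Norway is 1 day short of the latest data point '2021-05-13', a new row should be added for it as shown below. The same should be done for every other country in the DF.

Example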

       country  iso_code  date        people_vaccinated_per_hundred  people_fully_vaccinated_per_hundred
12028  norway   nor       2021-05-02  0.00                           NaN
12029  norway   nor       2021-05-03  0.00                           NaN
12188  norway   nor       ...         ...                            ...
12188  norway   nor       2021-05-11  27.81                          9.55
12189  norway   nor       2021-05-12  28.49                          10.42

Add new row:
12189  norway   nor       2021-05-13  28.49                          10.42

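Solution

One straightforward approach is to build the Cartesian product of countries and dates, then join your data against that product. Every date/country combination that is missing from the original data then appears as a row with null values.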

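# every unique (country, iso_code) pair and every unique date
countries = df.loc[:,['country','iso_code']].drop_duplicates()
dates = df.loc[:,'date'].drop_duplicates()
# cross join (how='cross' needs pandas >= 1.2): one row per country per date
all_countries_dates = countries.merge(dates,how='cross')

# right join keeps every country/date pair; pairs missing from df get NaN values
df = df.merge(all_countries_dates,how='right',on=['country','iso_code','date'])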

With a dataset like this:

country       iso_code  date        people_vaccinated   people_fully_vaccinated
Norway        NOR       2021-05-09  0.00                1.00
Norway        NOR       2021-05-10  0.00                3.00
Norway        NOR       2021-05-11  27.81               9.55
Norway        NOR       2021-05-12  28.49               10.42
Norway        NOR       2021-05-13  28.49               10.42
United States USA       2021-05-09  23.00               3.00
United States USA       2021-05-10  23.00               3.00

this transformation gives you:

country       iso_code  date        people_vaccinated   people_fully_vaccinated
Norway        NOR       2021-05-09  0.00                1.00
Norway        NOR       2021-05-10  0.00                3.00
Norway        NOR       2021-05-11  27.81               9.55
Norway        NOR       2021-05-12  28.49               10.42
Norway        NOR       2021-05-13  28.49               10.42
United States USA       2021-05-09  23.00               3.00
United States USA       2021-05-10  23.00               3.00
United States USA       2021-05-11  NaN                 NaN
United States USA       2021-05-12  NaN                 NaN
United States USA       2021-05-13  NaN                 NaN
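
The two tables above can be reproduced with a short script. This is a minimal sketch, assuming pandas 1.2 or later (for `how='cross'`) and the column names used in the sample; the names `full` and `added` are only illustrative.

```python
import pandas as pd

# Sample data copied from the input table above.
df = pd.DataFrame({
    "country": ["Norway"] * 5 + ["United States"] * 2,
    "iso_code": ["NOR"] * 5 + ["USA"] * 2,
    "date": pd.to_datetime([
        "2021-05-09", "2021-05-10", "2021-05-11", "2021-05-12", "2021-05-13",
        "2021-05-09", "2021-05-10",
    ]),
    "people_vaccinated": [0.00, 0.00, 27.81, 28.49, 28.49, 23.00, 23.00],
    "people_fully_vaccinated": [1.00, 3.00, 9.55, 10.42, 10.42, 3.00, 3.00],
})

# Unique countries and unique dates, then every country paired with every date.
countries = df[["country", "iso_code"]].drop_duplicates()
dates = df[["date"]].drop_duplicates()
all_countries_dates = countries.merge(dates, how="cross")

# Right join: keep every country/date pair, NaN where df had no row.
full = df.merge(all_countries_dates, how="right",
                on=["country", "iso_code", "date"])

# Rows that did not exist in the original data.
added = full[full["people_vaccinated"].isna()]
print(len(full))                    # 2 countries x 5 dates = 10 rows
print(added[["country", "date"]])   # the three added United States rows
```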

After that, you can use fillna (or a per-country forward fill) to replace the null values in the added rows.
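
Since the question asks for a forward fill, one way to do this step is a group-wise `ffill`, so that values are carried forward within a country and never leak from one country into the next. The sketch below assumes a frame shaped like the result table above; `merged` and `value_cols` are illustrative names.

```python
import pandas as pd

# Frame shaped like the result table above: the last three
# United States rows were added by the join and hold NaN.
merged = pd.DataFrame({
    "country": ["Norway"] * 5 + ["United States"] * 5,
    "iso_code": ["NOR"] * 5 + ["USA"] * 5,
    "date": list(pd.date_range("2021-05-09", "2021-05-13")) * 2,
    "people_vaccinated": [0.00, 0.00, 27.81, 28.49, 28.49,
                          23.00, 23.00, None, None, None],
    "people_fully_vaccinated": [1.00, 3.00, 9.55, 10.42, 10.42,
                                3.00, 3.00, None, None, None],
})

value_cols = ["people_vaccinated", "people_fully_vaccinated"]

# Put each country's rows in chronological order first; forward fill
# copies the previous row's value, so the row order matters.
merged = merged.sort_values(["country", "date"]).reset_index(drop=True)

# Forward-fill inside each country only.
merged[value_cols] = merged.groupby("country")[value_cols].ffill()

print(merged[merged["country"] == "United States"])
```

Dates before a country's first data point stay NaN with this approach, because there is no earlier value to carry forward.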

Cross-join code for pandas versions before 1.2 (for example 1.1.5), where merge does not support how='cross'

import pandas as pd

#creating a df with all unique countries and iso_codes
countries = animation_covid_df.loc[:,['country','iso_code']].drop_duplicates()
#creating a new table with all the dates in the original dataframe
dates_df = animation_covid_df.loc[:,['date']].drop_duplicates().reset_index(drop=True)

#creating a key called row_number (0..n-1) to later merge the dates table with the countries table on
dates_df['row_number'] = dates_df.index

number_of_dates = len(dates_df) #the number of distinct dates, i.e. rows in the dates table

#creating as many rows for each country as there are dates in dates_df
indexed_country = pd.concat([countries]*number_of_dates,ignore_index=True)
indexed_country = indexed_country.sort_values(['country','iso_code'],ascending=True)
#numbering each country's rows 0..n-1 so they line up with 'row_number' in dates_df
indexed_country['row_number'] = indexed_country.groupby(['country','iso_code']).cumcount()

#merging all the indexed countries with all the possible dates on the row number
indexed_country_date_df = indexed_country.merge(dates_df,on='row_number',how='left',suffixes=('_1','_2'))

#setting the 'date' column in both tables to datetime so they can be merged on
animation_covid_df['date'] = pd.to_datetime(animation_covid_df['date'])
indexed_country_date_df['date'] = pd.to_datetime(indexed_country_date_df['date'])
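
A shorter idiom that is often used for a cross join on older pandas is to merge both tables on a constant helper column. This is only a sketch of that alternative, not the code above: the helper column `_key` and the name `completed` are made up for illustration, and the sample rows are a subset of the dataset shown earlier.

```python
import pandas as pd

# Subset of the sample data: Norway has three dates, United States two.
animation_covid_df = pd.DataFrame({
    "country": ["Norway", "Norway", "Norway", "United States", "United States"],
    "iso_code": ["NOR", "NOR", "NOR", "USA", "USA"],
    "date": pd.to_datetime([
        "2021-05-09", "2021-05-10", "2021-05-11",
        "2021-05-09", "2021-05-10",
    ]),
    "people_vaccinated": [0.00, 0.00, 27.81, 23.00, 23.00],
})

countries = animation_covid_df[["country", "iso_code"]].drop_duplicates()
dates_df = animation_covid_df[["date"]].drop_duplicates()

# Every row gets the same key, so an ordinary merge on that key
# pairs every country with every date (a Cartesian product).
indexed_country_date_df = (
    countries.assign(_key=1)
    .merge(dates_df.assign(_key=1), on="_key")
    .drop(columns="_key")
)

# Same right join as in the pandas >= 1.2 version.
completed = animation_covid_df.merge(
    indexed_country_date_df, how="right",
    on=["country", "iso_code", "date"],
)
print(completed)
```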