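How to add missing dates in pandas based on 2 variables
I have time series for 3 different products that were sold in 4 different stores over a period of time. I want to fill in the missing rows so that I end up with a complete dataset. Every missing value should be replaced with 0.
Here is the code that generates the dataset. The randomtimes function is copied from @abernert [https://stackoverflow.com/questions/50165501/generate-random-list-of-timestamps-in-python][1]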
import datetime
import random
import pandas as pd
import numpy as np

random.seed(42)
np.random.seed(42)

def randomtimes(start, end, n):
    # return n random timestamps between start and end (format dd-mm-YYYY)
    stime = datetime.datetime.strptime(start, '%d-%m-%Y')
    etime = datetime.datetime.strptime(end, '%d-%m-%Y')
    td = etime - stime
    print(td)
    dates = [round(random.random(), 1) * td + stime for _ in range(n)]
    return dates

# set vars
nsp = 5  # nr of sampled rows
nd = 3   # nr of days
ns = 3   # nr of stores
npr = 2  # nr of products

# generate data
total = nd * ns * npr
# note: s ends up in the 'product' column and p in the 'store' column below
s = random.sample('1' * nd * ns + '2' * nd * ns + '3' * nd * ns, total)  # shuffled ids '1', '2', '3'
p = random.sample("a" * nd * ns + "b" * nd * ns, total)  # shuffled ids 'a', 'b'
so = list(np.random.choice(range(20, 100), total))  # units sold
stime = '01-02-2000'
etime = '03-02-2000'
date = np.array(randomtimes(stime, etime, nsp)).astype('datetime64[D]')

product = []
store = []
sold = []
for x in range(1, len(date) + 1):
    product.append(s.pop())
    store.append(p.pop())
    sold.append(so.pop())

data = {'date': date, 'product': product, 'sold': sold, 'store': store}
df = pd.DataFrame(data)
df
        date product  sold store
0 2000-02-02       3    95     b
1 2000-02-01       1    88     a
2 2000-02-02       1    81     a
3 2000-02-03       1    66     a
4 2000-02-02       3    88     a
The result should look like this:
 0 2000-02-01       1    88     a
 1 2000-02-01       2     0     a
 2 2000-02-01       3     0     a
 3 2000-02-01       1     0     b
 4 2000-02-01       2     0     b
 5 2000-02-01       3    95     b
 6 2000-02-02       1    81     a
 7 2000-02-02       2     0     a
 8 2000-02-02       3    88     a
 9 2000-02-02       1     0     b
10 2000-02-02       2     0     b
11 2000-02-02       3     0     b
12 2000-02-03       1    66     a
13 2000-02-03       2     0     a
14 2000-02-03       3     0     a
15 2000-02-03       1     0     b
16 2000-02-03       2     0     b
17 2000-02-03       3     0     b
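Thanks for your help.
Solution
I suggest creating a second data frame that contains every combination of the known values (date, product, store), with all "sold" values set to zero. You can then iterate over the existing "sold" data and copy each value into the new data frame.
Below is a working example that illustrates this process.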
import datetime
import pandas as pd
import numpy as np

np.random.seed(0)

n_days = 5                     # number of data frame rows
products = [1, 2, 3]           # list of product ids
stores = ["a", "b", "c", "d"]  # list of store ids

def simulate_data():
    data = {
        "date": random_days(n_days),
        "product": [np.random.choice(products) for row in range(n_days)],
        "sold": [np.random.choice(range(100)) for row in range(n_days)],
        "store": [np.random.choice(stores) for row in range(n_days)],
    }
    df = pd.DataFrame(data)
    return df

def random_days(n, daydelta=7):
    """
    return: list of random days
    n: number of days
    daydelta: number of days from today on to choose from
    """
    daydeltas = range(daydelta)
    random_daydeltas = [datetime.timedelta(days=int(np.random.choice(daydeltas))) for _ in range(n)]
    today = datetime.date.today()
    random_days = [today + random_daydelta for random_daydelta in random_daydeltas]
    return random_days

def get_dfMask(df):
    # + 1 so that the last observed day is part of the grid as well
    duration = range((max(df.date) - min(df.date)).days + 1)
    unique_dates = [min(df.date) + datetime.timedelta(days=days) for days in duration]
    data = {
        "date": get_filledDates(unique_dates),
        "product": products * len(stores) * len(unique_dates),
        "sold": [0] * len(products) * len(stores) * len(unique_dates),
        "store": get_filledStores(unique_dates),
    }
    df_filled = pd.DataFrame(data)
    return df_filled

def get_filledDates(unique_dates):
    # repeat each date once per (product, store) combination
    dates_filled = []
    for unique_date in unique_dates:
        dates_filled.extend([unique_date] * len(products) * len(stores))
    return dates_filled

def get_filledStores(unique_dates):
    # repeat each store once per product, then repeat that pattern for every date
    stores_filled = []
    for store in stores:
        stores_filled.extend([store] * len(products))
    stores_filled *= len(unique_dates)
    return stores_filled

def copy_soldValues(source_df, destination_df):
    # overwrite the zero in each matching (date, product, store) row with the known value
    for row in range(len(source_df)):
        position = (destination_df.loc[:, "date"] == source_df.loc[row, "date"]) \
            & (destination_df.loc[:, "product"] == source_df.loc[row, "product"]) \
            & (destination_df.loc[:, "store"] == source_df.loc[row, "store"])
        destination_df.loc[position, "sold"] = source_df.loc[row, "sold"]
    return destination_df

def main():
    df = simulate_data()
    filled_df = get_dfMask(df)
    filled_df = copy_soldValues(df, filled_df)
    print(df)
    print(filled_df)

if __name__ == "__main__":
    main()
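Your second question is hard for me to answer, because I don't know what you expect from the simulation. In my example, however, I used a slightly modified implementation.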
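The same idea (build the full grid of combinations, default "sold" to zero, keep the known sales) can probably also be written without explicit loops. The following is only a minimal sketch, not part of the original answer: the helper name fill_missing_combinations is made up for illustration, the lists of products and stores are assumed to be passed in explicitly, and duplicate (date, store, product) rows are assumed to be summed, which the question does not specify. It uses the five sample rows from the question as input.

```python
import pandas as pd


def fill_missing_combinations(df, products, stores):
    """Return one row per (date, store, product); sold is 0 where no sale exists.

    Hypothetical helper for illustration only.
    """
    # Every calendar day between the first and last observed date, inclusive.
    all_dates = pd.date_range(df["date"].min(), df["date"].max(), freq="D")
    full_index = pd.MultiIndex.from_product(
        [all_dates, stores, products], names=["date", "store", "product"]
    )
    # Assumption: duplicate (date, store, product) rows are summed.
    sold = df.groupby(["date", "store", "product"])["sold"].sum()
    filled = sold.reindex(full_index, fill_value=0).reset_index()
    # Same column order as in the question.
    return filled[["date", "product", "sold", "store"]]


# The five sample rows shown in the question.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2000-02-02", "2000-02-01", "2000-02-02", "2000-02-03", "2000-02-02"]
    ),
    "product": ["3", "1", "1", "1", "3"],
    "sold": [95, 88, 81, 66, 88],
    "store": ["b", "a", "a", "a", "a"],
})

filled = fill_missing_combinations(df, products=["1", "2", "3"], stores=["a", "b"])
print(filled)
```

Because reindex aligns on the full MultiIndex, the result has one row for every date, store and product combination, sorted by date, then store, then product. If products or stores that never appear in the data should not be part of the grid, the explicit lists could be replaced by the unique values found in df.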