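How can I find how many months someone worked in a given year with Pandas?
Consider this DataFrame of worker contracts. I want to list how many months a given worker worked in a given year.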
import pandas as pd

df = pd.DataFrame({
    'id': [19019, 17160, 21593, 3146, 21593, 3146, 22737, 25311, 25740, 3289,
           26312, 28028, 17017, 27742, 26884, 31174, 31889, 33319, 35178, 35464],
    'start_date': pd.to_datetime([
        '2016-06-01', '2016-09-01', '2016-11-01', '2017-01-01', '2017-03-01',
        '2017-08-01', '2018-09-01', '2018-09-01', '2018-10-01', '1999-11-01',
        '2018-10-01', '2019-01-01', '2009-11-01', '2019-09-01', '2020-03-01',
        '2020-03-01', '2020-04-14', '2020-10-01', '2021-03-01', '2021-03-08']),
    'end_date': pd.to_datetime([
        '2017-01-31', '2018-07-31', '2017-02-28', '2017-07-31', '2017-12-31',
        '2017-12-31', '2021-12-31', '2019-08-16', '2019-11-30', '2022-12-31',
        '2020-09-30', '2021-02-28', '2022-10-31', '2022-02-28', '2022-02-28',
        '2022-02-28', '2021-06-30', '2022-09-30', '2022-02-28', '2022-03-07']),
})
So, for the year 2020:
year = 2020
after = df.index[df.start_date.dt.year >= year] # Started late in that year
before = df.index[df.end_date.dt.year <= year] # Left early in that year
df['after'] = df.iloc[after].start_date.dt.month
df['before'] = df.iloc[before].end_date.dt.month
df = df.fillna(0)
df['months'] = 12
df['months'] -= df['after']
# Caution: df[mask]['months'] is chained indexing on a copy, so this subtraction never reaches df
df[df.before > 0]['months'] -= 12 - df['before']
df = df.drop(['before','after'],axis=1)
dm = df[(df.start_date.dt.year <= year) & (df.end_date.dt.year >= year)]
dm
I get the list of workers with the number of months each worked in 2020:
       id start_date   end_date  months
13  22737 2018-09-01 2021-12-31    12.0
16   3289 1999-11-01 2022-12-31    12.0
17  26312 2018-10-01 2020-09-30    12.0
18  28028 2019-01-01 2021-02-28    12.0
19  17017 2009-11-01 2022-10-31    12.0
20  27742 2019-09-01 2022-02-28    12.0
21  26884 2020-03-01 2022-02-28     9.0
22  31174 2020-03-01 2022-02-28     9.0
23  31889 2020-04-14 2021-06-30     8.0
24  33319 2020-10-01 2022-09-30     2.0
Is there a better, more pandas-idiomatic way to achieve the same goal?
(Feel free to rename the question; I'm sure the title is poor.)
Solution
One way, using np.select:
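import numpy as np

year = 2020
# Conditions are checked in order; the first one that matches picks the value
condlist = [
    (df.start_date.dt.year < year) & (df.end_date.dt.year > year),    # spans the whole year
    (df.start_date.dt.year == year) & (df.end_date.dt.year == year),  # starts and ends in the year
    df.start_date.dt.year == year,                                    # starts in the year
    df.end_date.dt.year == year,                                      # ends in the year
]
choicelist = [
    12,
    df.end_date.dt.month - df.start_date.dt.month,
    12 - df.start_date.dt.month,
    df.end_date.dt.month,
]
# Rows matching no condition get np.select's default value, 0
df['work_hours'] = np.select(condlist, choicelist)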
Note: if needed, drop the rows where work_hours is 0 (despite its name, this column holds a month count).
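As a minimal sketch of that filtering step, the snippet below applies the same np.select logic to three contracts taken from the question's data and then keeps only the rows with a non-zero count. The names `sample` and `worked` are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Three contracts from the question's data; `sample` is an illustrative name.
sample = pd.DataFrame({
    "id": [19019, 22737, 26884],
    "start_date": pd.to_datetime(["2016-06-01", "2018-09-01", "2020-03-01"]),
    "end_date": pd.to_datetime(["2017-01-31", "2021-12-31", "2022-02-28"]),
})

year = 2020
start, end = sample["start_date"].dt, sample["end_date"].dt

# Same conditions and choices as in the answer above.
condlist = [
    (start.year < year) & (end.year > year),
    (start.year == year) & (end.year == year),
    start.year == year,
    end.year == year,
]
choicelist = [12, end.month - start.month, 12 - start.month, end.month]

# Rows that match no condition receive the default value 0.
sample["work_hours"] = np.select(condlist, choicelist)

# Keep only the contracts that overlap the target year.
worked = sample[sample["work_hours"] != 0]
print(worked)
```

Contract 19019 ended in 2017, so it matches no condition and is filtered out; the other two remain.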
Here is another way:
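start = '2020-01-01'
end = '2020-12-31'
# Clip each contract to the target year, then count whole 30-day blocks as months
s = (df['end_date'].clip(upper=pd.to_datetime(end)) -
     df['start_date'].clip(lower=pd.to_datetime(start))).floordiv(pd.to_timedelta(30, 'd'))
# Contracts that do not overlap the year give a negative count, which becomes NaN
df = df.assign(months=s.where(s.gt(0)))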
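To make the arithmetic of the clip approach visible, here is a small worked sketch on three contracts from the question's data. It exposes the intermediate day counts through `.dt.days` instead of dividing timedeltas directly, which is an equivalent formulation chosen here for readability; the names `sample`, `days`, and `months` are illustrative.

```python
import pandas as pd

# Three contracts from the question's data; `sample` is an illustrative name.
sample = pd.DataFrame({
    "id": [26884, 31889, 19019],
    "start_date": pd.to_datetime(["2020-03-01", "2020-04-14", "2016-06-01"]),
    "end_date": pd.to_datetime(["2022-02-28", "2021-06-30", "2017-01-31"]),
})

year_start = pd.Timestamp("2020-01-01")
year_end = pd.Timestamp("2020-12-31")

# Restrict each contract to the part that can fall inside 2020.
clipped_start = sample["start_date"].clip(lower=year_start)
clipped_end = sample["end_date"].clip(upper=year_end)

# Day difference of the clipped span; negative when the contract misses 2020.
days = (clipped_end - clipped_start).dt.days

# Approximate a month as 30 days and blank out non-overlapping contracts.
months = (days // 30).where(days > 0)
print(pd.DataFrame({"id": sample["id"], "days": days, "months": months}))
```

Because a month is treated as a flat 30 days, a contract starting mid-month (such as 31889, which starts on April 14) is rounded down to whole 30-day blocks.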
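You can define two date ranges: the first is the target period in 2020 at monthly frequency, and the second is each row's period from start_date to end_date, also at monthly frequency. Then find the months they have in common with np.intersect1d(), and take the length of that array as the number of matching months: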
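import numpy as np

# Month-end dates of the target year ('M' is the month-end alias; pandas 2.2+ prefers 'ME')
rng2020 = pd.date_range(start='2020-01-01', end='2020-12-31', freq='M')
# For each row, count the month-end dates that its contract shares with 2020
df['months'] = df.apply(
    lambda x: len(np.intersect1d(pd.date_range(start=x['start_date'], end=x['end_date'], freq='M'), rng2020)),
    axis=1)
df.loc[df['months'] != 0]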
Result:
       id start_date   end_date  months
6   22737 2018-09-01 2021-12-31      12
9    3289 1999-11-01 2022-12-31      12
10  26312 2018-10-01 2020-09-30       9
11  28028 2019-01-01 2021-02-28      12
12  17017 2009-11-01 2022-10-31      12
13  27742 2019-09-01 2022-02-28      12
14  26884 2020-03-01 2022-02-28      10
15  31174 2020-03-01 2022-02-28      10
16  31889 2020-04-14 2021-06-30       9
17  33319 2020-10-01 2022-09-30       3
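If the row-wise apply becomes slow on a large frame, one possible alternative (not taken from the answers above) is to do the month counting with plain integer arithmetic. The sketch below maps every date to a running month index, clips it to the target year, and counts inclusively. The helper name `months_in_year` is hypothetical. Note one assumption that differs from the month-end approach: this version counts every calendar month a contract touches, so a contract ending mid-month still counts that month.

```python
import pandas as pd

def months_in_year(frame, year):
    """Count the calendar months of `year` touched by each contract (hypothetical helper)."""
    # Running month index: year * 12 + (month - 1).
    start_idx = frame["start_date"].dt.year * 12 + frame["start_date"].dt.month - 1
    end_idx = frame["end_date"].dt.year * 12 + frame["end_date"].dt.month - 1
    # Clip both ends to January..December of the target year.
    first = start_idx.clip(lower=year * 12)
    last = end_idx.clip(upper=year * 12 + 11)
    # Inclusive count; contracts outside the year come out negative and are set to 0.
    return (last - first + 1).clip(lower=0)

# Four contracts from the question's data; `contracts` is an illustrative name.
contracts = pd.DataFrame({
    "id": [22737, 26312, 31889, 19019],
    "start_date": pd.to_datetime(["2018-09-01", "2018-10-01", "2020-04-14", "2016-06-01"]),
    "end_date": pd.to_datetime(["2021-12-31", "2020-09-30", "2021-06-30", "2017-01-31"]),
})
contracts["months"] = months_in_year(contracts, 2020)
print(contracts[contracts["months"] != 0])
```

For the three overlapping contracts in this sample, the counts agree with the result table above (12, 9 and 9).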