微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

那一年有人在 Pandas 呆了几个月?

如何解决那一年有人在 Pandas 呆了几个月?

我们考虑这个代表工人合同的数据框。我想列出某个工人在某一年工作了多少个月。

df = pd.DataFrame{'id': {0: 19019,1: 17160,2: 21593,3: 3146,4: 21593,5: 3146,6: 22737,7: 25311,8: 25740,9: 3289,10: 26312,11: 28028,12: 17017,13: 27742,14: 26884,15: 31174,16: 31889,17: 33319,18: 35178,19: 35464},'start_date': {0: Timestamp('2016-06-01 00:00:00'),1: Timestamp('2016-09-01 00:00:00'),2: Timestamp('2016-11-01 00:00:00'),3: Timestamp('2017-01-01 00:00:00'),4: Timestamp('2017-03-01 00:00:00'),5: Timestamp('2017-08-01 00:00:00'),6: Timestamp('2018-09-01 00:00:00'),7: Timestamp('2018-09-01 00:00:00'),8: Timestamp('2018-10-01 00:00:00'),9: Timestamp('1999-11-01 00:00:00'),10: Timestamp('2018-10-01 00:00:00'),11: Timestamp('2019-01-01 00:00:00'),12: Timestamp('2009-11-01 00:00:00'),13: Timestamp('2019-09-01 00:00:00'),14: Timestamp('2020-03-01 00:00:00'),15: Timestamp('2020-03-01 00:00:00'),16: Timestamp('2020-04-14 00:00:00'),17: Timestamp('2020-10-01 00:00:00'),18: Timestamp('2021-03-01 00:00:00'),19: Timestamp('2021-03-08 00:00:00')},'end_date': {0: Timestamp('2017-01-31 00:00:00'),1: Timestamp('2018-07-31 00:00:00'),2: Timestamp('2017-02-28 00:00:00'),3: Timestamp('2017-07-31 00:00:00'),4: Timestamp('2017-12-31 00:00:00'),5: Timestamp('2017-12-31 00:00:00'),6: Timestamp('2021-12-31 00:00:00'),7: Timestamp('2019-08-16 00:00:00'),8: Timestamp('2019-11-30 00:00:00'),9: Timestamp('2022-12-31 00:00:00'),10: Timestamp('2020-09-30 00:00:00'),11: Timestamp('2021-02-28 00:00:00'),12: Timestamp('2022-10-31 00:00:00'),13: Timestamp('2022-02-28 00:00:00'),14: Timestamp('2022-02-28 00:00:00'),15: Timestamp('2022-02-28 00:00:00'),16: Timestamp('2021-06-30 00:00:00'),17: Timestamp('2022-09-30 00:00:00'),18: Timestamp('2022-02-28 00:00:00'),19: Timestamp('2022-03-07 00:00:00')}})

因此,如果我们考虑 2020 年:

year = 2020
after = df.index[df.start_date.dt.year >= year] # Started late in that year
before = df.index[df.end_date.dt.year <= year] # Left early in that year

df['after'] = df.iloc[after].start_date.dt.month
df['before'] = df.iloc[before].end_date.dt.month
df = df.fillna(0)
df['months'] = 12
df['months'] -= df['after']
df[df.before > 0]['months'] -= 12 - df['before']
df = df.drop(['before','after'],axis=1)

dm = df[(df.start_date.dt.year <= year) & (df.end_date.dt.year >= year)]
dm

我得到了 2020 年工作了 n 个月的工人名单:

    id      start_date  end_date    months
13  22737   2018-09-01  2021-12-31  12.0
16  3289    1999-11-01  2022-12-31  12.0
17  26312   2018-10-01  2020-09-30  12.0
18  28028   2019-01-01  2021-02-28  12.0
19  17017   2009-11-01  2022-10-31  12.0
20  27742   2019-09-01  2022-02-28  12.0
21  26884   2020-03-01  2022-02-28  9.0
22  31174   2020-03-01  2022-02-28  9.0
23  31889   2020-04-14  2021-06-30  8.0
24  33319   2020-10-01  2022-09-30  2.0

是否有更好的 pandaish 方法来实现相同的目标?

(随意重命名问题,我相信它的名字很糟糕)

解决方法

通过 np.select 的一种方式:

year = 2020

condlist = [
    (df.start_date.dt.year < year) & (df.end_date.dt.year > year),(df.start_date.dt.year == year) & (df.end_date.dt.year == year),df.start_date.dt.year == year,df.end_date.dt.year == year,]

choicelist = [
    12,df.end_date.dt.month - df.start_date.dt.month,12 - df.start_date.dt.month,df.end_date.dt.month,]

df['work_hours'] = np.select(condlist,choicelist)

注意:如果需要,删除 work_hours 为 0 的行。

,

这是一种方式:

start = '1/1/2020'
end = '12/31/2020'

s = (df['end_date'].clip(upper = pd.to_datetime(end)) - 
     df['start_date'].clip(lower = pd.to_datetime(start))).floordiv(pd.to_timedelta(30,'d'))
df = df.assign(months = s.where(s.gt(0)))
,

您可以定义 2 个日期范围:第一个是 2020 中的目标周期,月频率,第二个是每行从 start_dateend_date 的周期,月频率。然后通过np.intersect1d()找到它们的共同交月,并通过共同月份数组的长度找到匹配的月份数:

rng2020 = pd.date_range(start='2020-01-01',end='2020-12-31',freq='M')
df['months'] = df.apply(lambda x: len(np.intersect1d(pd.date_range(start=x['start_date'],end=x['end_date'],freq='M'),rng2020)),axis=1)
df.loc[df['months'] !=0]

结果:

       id start_date   end_date  months
6   22737 2018-09-01 2021-12-31      12
9    3289 1999-11-01 2022-12-31      12
10  26312 2018-10-01 2020-09-30       9
11  28028 2019-01-01 2021-02-28      12
12  17017 2009-11-01 2022-10-31      12
13  27742 2019-09-01 2022-02-28      12
14  26884 2020-03-01 2022-02-28      10
15  31174 2020-03-01 2022-02-28      10
16  31889 2020-04-14 2021-06-30       9
17  33319 2020-10-01 2022-09-30       3

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。