Splitting rows into time slots based on conditions

I have a DataFrame with 100k+ rows. I need to go through it and count each row against the time slot(s) it falls in. A sample of the DataFrame looks like this:

   Call Sign    Entry_Time                 Exit_Time        Sector
0   EA213    2020-10-01 22:24:00      2020-10-01 22:50:55   north
1   NGF23    2020-10-01 22:32:00      2020-10-01 22:53:00   West
2   USR24    2020-10-01 22:44:00      2020-10-01 23:01:53   Central
3   EF36D    2020-10-01 22:50:55      2020-10-01 23:04:07   north
4   NGF23    2020-10-01 22:53:00      2020-10-01 23:03:54   north
5   USR24    2020-10-01 23:01:53      2020-10-01 23:13:44   West
6   EF36D    2020-10-01 23:04:07      2020-10-01 23:26:48   Central
7   USR24    2020-10-01 23:13:44      2020-10-01 23:28:00   Central
8   OSA26    2020-10-02 15:02:00      2020-10-02 15:09:31   West
9   OSA26    2020-10-02 15:09:31      2020-10-02 15:25:47   north
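For anyone who wants to reproduce the problem, the sample can be rebuilt roughly as follows. This is only a sketch: it assumes the frame is called `df` (as in the code in this question) and that both time columns hold real datetimes, which the comparisons against slot boundaries rely on.

```python
import pandas as pd

# Sample rows copied from the table above.
rows = [
    ("EA213", "2020-10-01 22:24:00", "2020-10-01 22:50:55", "north"),
    ("NGF23", "2020-10-01 22:32:00", "2020-10-01 22:53:00", "West"),
    ("USR24", "2020-10-01 22:44:00", "2020-10-01 23:01:53", "Central"),
    ("EF36D", "2020-10-01 22:50:55", "2020-10-01 23:04:07", "north"),
    ("NGF23", "2020-10-01 22:53:00", "2020-10-01 23:03:54", "north"),
    ("USR24", "2020-10-01 23:01:53", "2020-10-01 23:13:44", "West"),
    ("EF36D", "2020-10-01 23:04:07", "2020-10-01 23:26:48", "Central"),
    ("USR24", "2020-10-01 23:13:44", "2020-10-01 23:28:00", "Central"),
    ("OSA26", "2020-10-02 15:02:00", "2020-10-02 15:09:31", "West"),
    ("OSA26", "2020-10-02 15:09:31", "2020-10-02 15:25:47", "north"),
]
df = pd.DataFrame(rows, columns=["Call Sign", "Entry_Time", "Exit_Time", "Sector"])

# Parse the strings into datetimes so they can be compared with Timestamps.
df["Entry_Time"] = pd.to_datetime(df["Entry_Time"])
df["Exit_Time"] = pd.to_datetime(df["Exit_Time"])
```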

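A row should be counted for a time slot if its entry-to-exit span overlaps that slot's start and end. To do this, I use the following code: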

from datetime import datetime, timedelta
import pandas as pd

startDay   = 1
startMonth = 10
startYear  = 2020
endDay     = 5
endMonth   = 10
endYear    = 2020
interval  = 30
startDate = str(datetime(startYear,startMonth,startDay).date())
endDate   = str(datetime(endYear,endMonth,endDay).date())
    
timeInterval=pd.DataFrame()
sectors = ['West','north','Central']
endDateMinus1 = str(datetime(endYear,endMonth,endDay)-timedelta(seconds=1))  # one second before the end date
# 'min' is the minute frequency alias; the older 'T' alias is deprecated in recent pandas
timeInterval['Start']=pd.date_range(start=startDate+' 00:00:00',end=endDateMinus1,freq=str(interval)+'min')
timeInterval['End']= pd.date_range(start=startDate+' 00:'+str(interval)+':00',end=endDate+' 00:00:00',freq=str(interval)+'min')

for index,row in timeInterval.iterrows():
    startMask  = (df['Entry_Time'] >= row.Start) | (df['Exit_Time'] >= row.Start)
    endMask    = (df['Entry_Time'] <  row.End) | (df['Exit_Time'] <  row.End)
    timeInterval.loc[index,'Total Count'] = df[startMask & endMask].count()['Call Sign']
    
    for sector in sectors:
        # .copy() so the column added below does not trigger SettingWithCopyWarning
        filteredDF = df[startMask & endMask & (df['Sector']==sector)].copy()
        filteredDF[sector+' Time']=0.0
        
        filter1 = (filteredDF['Entry_Time']<row.Start) & (filteredDF['Exit_Time']<=row.End)
        filter2 = (filteredDF['Entry_Time']<row.Start) & (filteredDF['Exit_Time']>row.End)
        filter3 = (filteredDF['Entry_Time']>=row.Start) & (filteredDF['Exit_Time']<=row.End)
        filter4 = (filteredDF['Entry_Time']>=row.Start) & (filteredDF['Exit_Time']>row.End)

        filteredDF.loc[filter1,sector+' Time'] = (filteredDF.loc[filter1,'Exit_Time']-row.Start).dt.seconds/60
        filteredDF.loc[filter2,sector+' Time'] = interval
        filteredDF.loc[filter3,sector+' Time'] = (filteredDF.loc[filter3,'Exit_Time']-filteredDF.loc[filter3,'Entry_Time']).dt.seconds/60
        filteredDF.loc[filter4,sector+' Time'] = (row.End-filteredDF.loc[filter4,'Entry_Time']).dt.seconds/60

        timeInterval.loc[index,sector+' Total Count'] = filteredDF.count()['Call Sign']
        timeInterval.loc[index,sector+' Total Time (min)'] = float("{:.2f}".format(filteredDF[sector+' Time'].sum()))
        
        timeInterval.loc[index,sector+' Average Time (min)'] = 0 if timeInterval.loc[index,sector+' Total Count']==0 else timeInterval.loc[index,sector+' Total Time (min)']/timeInterval.loc[index,sector+' Total Count']

The result looks like this:

[Image: Sector Analysis Result]

A closer look at some of the rows that are non-zero for the DataFrame shared above:

[Image: closer look]

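The problem is that as the number of time slots or the number of rows grows, the program takes a very long time to finish. I need to replace the for loop with something else, but I'm not sure how to go about it.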
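One possible direction, offered as a sketch rather than a drop-in replacement: instead of filtering the whole frame once per slot, work out which slots each row touches, expand every row into one record per touched slot, and aggregate with a single `groupby`. The function name `slot_stats`, its `interval_min` parameter, and the output column names `count`, `total_min`, and `avg_min` are hypothetical. It assumes timezone-naive datetime columns with no missing values. The boundary rule is meant to match the loop above (`Exit_Time >= Start` and `Entry_Time < End`), and the minutes in a slot are `min(Exit_Time, End) - max(Entry_Time, Start)`.

```python
import numpy as np
import pandas as pd


def slot_stats(df, start, end, interval_min=30):
    """Per-slot, per-sector count and minutes, without looping over slots."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    step = np.timedelta64(interval_min, "m")
    n_slots = int((end - start) / pd.Timedelta(minutes=interval_min))
    origin = start.to_datetime64()

    entry = df["Entry_Time"].to_numpy()
    exit_ = df["Exit_Time"].to_numpy()

    # Index of the first and last slot each stay touches (floor division).
    first = ((entry - origin) // step).astype(np.int64)
    last = ((exit_ - origin) // step).astype(np.int64)

    # Drop stays entirely outside the window, then clip to valid slot indices.
    keep = (last >= 0) & (first < n_slots)
    entry, exit_ = entry[keep], exit_[keep]
    sector = df["Sector"].to_numpy()[keep]
    first = np.clip(first[keep], 0, n_slots - 1)
    last = np.clip(last[keep], 0, n_slots - 1)

    # Expand to one record per (stay, slot) pair.
    reps = last - first + 1
    row = np.repeat(np.arange(len(reps)), reps)
    offset = np.arange(reps.sum()) - np.repeat(np.cumsum(reps) - reps, reps)
    slot_start = origin + (first[row] + offset) * step
    slot_end = slot_start + step

    # Time spent inside the slot: overlap of [entry, exit] with [start, end].
    overlap = np.minimum(exit_[row], slot_end) - np.maximum(entry[row], slot_start)
    long = pd.DataFrame({
        "Start": slot_start,
        "Sector": sector[row],
        "minutes": overlap / np.timedelta64(1, "m"),
    })

    stats = (
        long.groupby(["Start", "Sector"])["minutes"]
        .agg(count="count", total_min="sum", avg_min="mean")
        .unstack("Sector")
    )
    # Reindex so that empty slots appear as rows of zeros.
    all_starts = pd.DatetimeIndex(origin + np.arange(n_slots) * step, name="Start")
    return stats.reindex(all_starts).fillna(0)


# Usage with the first five rows of the sample data.
sample = pd.DataFrame({
    "Call Sign": ["EA213", "NGF23", "USR24", "EF36D", "NGF23"],
    "Entry_Time": pd.to_datetime([
        "2020-10-01 22:24:00", "2020-10-01 22:32:00", "2020-10-01 22:44:00",
        "2020-10-01 22:50:55", "2020-10-01 22:53:00",
    ]),
    "Exit_Time": pd.to_datetime([
        "2020-10-01 22:50:55", "2020-10-01 22:53:00", "2020-10-01 23:01:53",
        "2020-10-01 23:04:07", "2020-10-01 23:03:54",
    ]),
    "Sector": ["north", "West", "Central", "north", "north"],
})
result = slot_stats(sample, "2020-10-01", "2020-10-05", interval_min=30)
total_count = result["count"].sum(axis=1)  # all sectors combined, per slot
print(result.loc[pd.Timestamp("2020-10-01 22:30:00")])
```

The result has two-level columns (statistic, sector), so `result["total_min"]["north"]` gives the north minutes per slot. The expanded frame has one record per row and slot touched, so memory grows with how many slots a typical stay spans. For stays of a few minutes against 30-minute slots that is only slightly more than the original row count.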
