
How can I create a new column by matching one variable against another and repeating the value until the first variable changes in R?

receptor year month day hour hour.inc    lat      lon height pressure                date
       1 2018     1   3   19        0 31.768 -106.501  500.0    835.6 2018-01-03 19:00:00
       1 2018     1   3   18       -1 31.628 -106.350  508.8    840.5 2018-01-03 18:00:00
       1 2018     1   3   17       -2 31.489 -106.180  526.2    839.4 2018-01-03 17:00:00
       1 2018     1   3   16       -3 31.372 -105.974  547.6    836.8 2018-01-03 16:00:00
       1 2018     1   3   15       -4 31.289 -105.731  555.3    829.8 2018-01-03 15:00:00
       1 2018     1   3   14       -5 31.265 -105.462  577.8    812.8 2018-01-03 14:00:00
       1 2018     1   3   13       -6 31.337 -105.175  640.0    793.9 2018-01-03 13:00:00
       1 2018     1   3   12       -7 31.492 -104.897  645.6    809.2 2018-01-03 12:00:00
       1 2018     1   3   11       -8 31.671 -104.700  686.8    801.0 2018-01-03 11:00:00
       1 2018     1   3   10       -9 31.913 -104.552  794.2    795.8 2018-01-03 10:00:00
       2 2018     1   4   19        0 31.768 -106.501  500.0    830.9 2018-01-04 19:00:00
       2 2018     1   4   18       -1 31.904 -106.635  611.5    819.5 2018-01-04 18:00:00
       2 2018     1   4   17       -2 32.070 -106.749  709.7    808.0 2018-01-04 17:00:00
       2 2018     1   4   16       -3 32.223 -106.855  787.3    794.9 2018-01-04 16:00:00

Above is what my data frame looks like, but I am trying to create a new column called date1 so that it looks like the frame below.

 receptor year month day hour hour.inc    lat      lon height pressure                date       date1
1         1 2018     1   3   19        0 31.768 -106.501  500.0    835.6 2018-01-03 19:00:00 2018-01-03 19:00:00
2         1 2018     1   3   18       -1 31.628 -106.350  508.8    840.5 2018-01-03 18:00:00 2018-01-03 19:00:00
3         1 2018     1   3   17       -2 31.489 -106.180  526.2    839.4 2018-01-03 17:00:00 2018-01-03 19:00:00
4         1 2018     1   3   16       -3 31.372 -105.974  547.6    836.8 2018-01-03 16:00:00 2018-01-03 19:00:00
5         1 2018     1   3   15       -4 31.289 -105.731  555.3    829.8 2018-01-03 15:00:00 2018-01-03 19:00:00
6         1 2018     1   3   14       -5 31.265 -105.462  577.8    812.8 2018-01-03 14:00:00 2018-01-03 19:00:00
7         1 2018     1   3   13       -6 31.337 -105.175  640.0    793.9 2018-01-03 13:00:00 2018-01-03 19:00:00
8         1 2018     1   3   12       -7 31.492 -104.897  645.6    809.2 2018-01-03 12:00:00 2018-01-03 19:00:00
9         1 2018     1   3   11       -8 31.671 -104.700  686.8    801.0 2018-01-03 11:00:00 2018-01-03 19:00:00
10        1 2018     1   3   10       -9 31.913 -104.552  794.2    795.8 2018-01-03 10:00:00 2018-01-03 19:00:00
38        2 2018     1   4   19        0 31.768 -106.501  500.0    830.9 2018-01-04 19:00:00 2018-01-04 19:00:00
39        2 2018     1   4   18       -1 31.904 -106.635  611.5    819.5 2018-01-04 18:00:00 2018-01-04 19:00:00
40        2 2018     1   4   17       -2 32.070 -106.749  709.7    808.0 2018-01-04 17:00:00 2018-01-04 19:00:00
41        2 2018     1   4   16       -3 32.223 -106.855  787.3    794.9 2018-01-04 16:00:00 2018-01-04 19:00:00

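Ignore the leftmost index. I want to match each receptor (e.g. 1, 2) with the first date on which it appears (e.g. 2018-01-03 19:00:00, 2018-01-04 19:00:00), and repeat that date until the receptor changes.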

I am working in R, so I would prefer an R solution, but I could also use a Python solution through the reticulate package in R.

Solutions

Using data.table, you can try:

library(data.table)
setDT(df) # convert df to a data.table by reference
df[, date1 := date[1], by = receptor] # first date within each receptor group
df

#Output 

    receptor year month day hour hour.inc    lat      lon height pressure                date               date1
 1:        1 2018     1   3   19        0 31.768 -106.501  500.0    835.6 2018-01-03 19:00:00 2018-01-03 19:00:00
 2:        1 2018     1   3   18       -1 31.628 -106.350  508.8    840.5 2018-01-03 18:00:00 2018-01-03 19:00:00
 3:        1 2018     1   3   17       -2 31.489 -106.180  526.2    839.4 2018-01-03 17:00:00 2018-01-03 19:00:00
 4:        1 2018     1   3   16       -3 31.372 -105.974  547.6    836.8 2018-01-03 16:00:00 2018-01-03 19:00:00
 5:        1 2018     1   3   15       -4 31.289 -105.731  555.3    829.8 2018-01-03 15:00:00 2018-01-03 19:00:00
 6:        1 2018     1   3   14       -5 31.265 -105.462  577.8    812.8 2018-01-03 14:00:00 2018-01-03 19:00:00
 7:        1 2018     1   3   13       -6 31.337 -105.175  640.0    793.9 2018-01-03 13:00:00 2018-01-03 19:00:00
 8:        1 2018     1   3   12       -7 31.492 -104.897  645.6    809.2 2018-01-03 12:00:00 2018-01-03 19:00:00
 9:        1 2018     1   3   11       -8 31.671 -104.700  686.8    801.0 2018-01-03 11:00:00 2018-01-03 19:00:00
10:        1 2018     1   3   10       -9 31.913 -104.552  794.2    795.8 2018-01-03 10:00:00 2018-01-03 19:00:00
11:        2 2018     1   4   19        0 31.768 -106.501  500.0    830.9 2018-01-04 19:00:00 2018-01-04 19:00:00
12:        2 2018     1   4   18       -1 31.904 -106.635  611.5    819.5 2018-01-04 18:00:00 2018-01-04 19:00:00
13:        2 2018     1   4   17       -2 32.070 -106.749  709.7    808.0 2018-01-04 17:00:00 2018-01-04 19:00:00
14:        2 2018     1   4   16       -3 32.223 -106.855  787.3    794.9 2018-01-04 16:00:00 2018-01-04 19:00:00
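
For readers taking the reticulate route mentioned in the question, a rough pandas analogue of the grouped assignment above might look like the following. This is only a sketch: the five-row data frame is a made-up miniature of the question's data, and it assumes date is parsed as a datetime column.

```python
import pandas as pd

# Hypothetical miniature of the question's data: two receptors, a few hourly rows each.
df = pd.DataFrame({
    "receptor": [1, 1, 1, 2, 2],
    "date": pd.to_datetime([
        "2018-01-03 19:00:00",
        "2018-01-03 18:00:00",
        "2018-01-03 17:00:00",
        "2018-01-04 19:00:00",
        "2018-01-04 18:00:00",
    ]),
})

# For each receptor, broadcast the first date in that group to all of its rows.
df["date1"] = df.groupby("receptor")["date"].transform("first")

print(df)
```

As with date[1] grouped by receptor, the result keeps the original row order and only adds the new column.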

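Try placing date at the rows where receptor changes and a missing value (NaN) everywhere else, then forward-fill with .ffill().

df.receptor.shift().ne(df.receptor) marks the rows where the receptor value changes, by comparing each value with the previous one. Series.where keeps date at those rows and fills the rest with a missing value that matches the column's dtype, so it works whether date holds strings or datetimes.

changed = df.receptor.shift().ne(df.receptor)  # True on the first row of each run
df['date1'] = df.date.where(changed)           # date at change points, missing elsewhere
df.date1 = df.date1.ffill()                    # carry each date forward to the next change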

   receptor year month day hour hour.inc    lat      lon height pressure                date               date1
 0        1 2018     1   3   19        0 31.768 -106.501  500.0    835.6 2018-01-03 19:00:00 2018-01-03 19:00:00
 1        1 2018     1   3   18       -1 31.628 -106.350  508.8    840.5 2018-01-03 18:00:00 2018-01-03 19:00:00
 2        1 2018     1   3   17       -2 31.489 -106.180  526.2    839.4 2018-01-03 17:00:00 2018-01-03 19:00:00
 3        1 2018     1   3   16       -3 31.372 -105.974  547.6    836.8 2018-01-03 16:00:00 2018-01-03 19:00:00
 4        1 2018     1   3   15       -4 31.289 -105.731  555.3    829.8 2018-01-03 15:00:00 2018-01-03 19:00:00
 5        1 2018     1   3   14       -5 31.265 -105.462  577.8    812.8 2018-01-03 14:00:00 2018-01-03 19:00:00
 6        1 2018     1   3   13       -6 31.337 -105.175  640.0    793.9 2018-01-03 13:00:00 2018-01-03 19:00:00
 7        1 2018     1   3   12       -7 31.492 -104.897  645.6    809.2 2018-01-03 12:00:00 2018-01-03 19:00:00
 8        1 2018     1   3   11       -8 31.671 -104.700  686.8    801.0 2018-01-03 11:00:00 2018-01-03 19:00:00
 9        1 2018     1   3   10       -9 31.913 -104.552  794.2    795.8 2018-01-03 10:00:00 2018-01-03 19:00:00
10        2 2018     1   4   19        0 31.768 -106.501  500.0    830.9 2018-01-04 19:00:00 2018-01-04 19:00:00
11        2 2018     1   4   18       -1 31.904 -106.635  611.5    819.5 2018-01-04 18:00:00 2018-01-04 19:00:00
12        2 2018     1   4   17       -2 32.070 -106.749  709.7    808.0 2018-01-04 17:00:00 2018-01-04 19:00:00
13        2 2018     1   4   16       -3 32.223 -106.855  787.3    794.9 2018-01-04 16:00:00 2018-01-04 19:00:00
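
One point worth checking before choosing between a forward-fill and a grouped "first" is whether a receptor id can reappear later in the data. The sketch below uses a made-up five-row frame (not the question's data) in which receptor 1 comes back after receptor 2, and compares the two behaviours; the column names date1_run and date1_group are illustrative only.

```python
import pandas as pd

# Hypothetical data in which receptor 1 reappears after receptor 2.
df = pd.DataFrame({
    "receptor": [1, 1, 2, 2, 1],
    "date": [
        "2018-01-03 19:00:00",
        "2018-01-03 18:00:00",
        "2018-01-04 19:00:00",
        "2018-01-04 18:00:00",
        "2018-01-05 19:00:00",
    ],
})

# True on the first row of every consecutive run of equal receptor values.
starts = df["receptor"].ne(df["receptor"].shift())

# Run-based: restarts at every change, so the last row gets its own date.
df["date1_run"] = df["date"].where(starts).ffill()

# Group-based: one first date per receptor id, regardless of position.
df["date1_group"] = df.groupby("receptor")["date"].transform("first")

print(df[["receptor", "date1_run", "date1_group"]])
```

When every receptor occupies a single contiguous block of rows, as in the question's sample, both columns come out identical. They differ only on the final row here, where the run-based version matches the "repeat until the receptor changes" wording more literally.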

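Consider base R's head inside an ave call, after computing a helper Date column, to return the first date-time within each date group. Note that this groups by calendar day rather than by receptor; the two coincide in the sample shown, and to group by receptor directly you can pass receptor as the grouping variable to ave instead.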

df <- within(df,{
  date_short <- as.Date(substr(as.character(date),1,10),origin="1970-01-01")
  first_dt_hour <- ave(date,date_short,FUN=function(x) head(x,1))
  rm(date_short)   # DROP HELPER COLUMN
})

print(df)
#    receptor year month day hour hour.inc    lat      lon height pressure                date       first_dt_hour
# 1         1 2018     1   3   19        0 31.768 -106.501  500.0    835.6 2018-01-03 19:00:00 2018-01-03 19:00:00
# 2         1 2018     1   3   18       -1 31.628 -106.350  508.8    840.5 2018-01-03 18:00:00 2018-01-03 19:00:00
# 3         1 2018     1   3   17       -2 31.489 -106.180  526.2    839.4 2018-01-03 17:00:00 2018-01-03 19:00:00
# 4         1 2018     1   3   16       -3 31.372 -105.974  547.6    836.8 2018-01-03 16:00:00 2018-01-03 19:00:00
# 5         1 2018     1   3   15       -4 31.289 -105.731  555.3    829.8 2018-01-03 15:00:00 2018-01-03 19:00:00
# 6         1 2018     1   3   14       -5 31.265 -105.462  577.8    812.8 2018-01-03 14:00:00 2018-01-03 19:00:00
# 7         1 2018     1   3   13       -6 31.337 -105.175  640.0    793.9 2018-01-03 13:00:00 2018-01-03 19:00:00
# 8         1 2018     1   3   12       -7 31.492 -104.897  645.6    809.2 2018-01-03 12:00:00 2018-01-03 19:00:00
# 9         1 2018     1   3   11       -8 31.671 -104.700  686.8    801.0 2018-01-03 11:00:00 2018-01-03 19:00:00
# 10        1 2018     1   3   10       -9 31.913 -104.552  794.2    795.8 2018-01-03 10:00:00 2018-01-03 19:00:00
# 38        2 2018     1   4   19        0 31.768 -106.501  500.0    830.9 2018-01-04 19:00:00 2018-01-04 19:00:00
# 39        2 2018     1   4   18       -1 31.904 -106.635  611.5    819.5 2018-01-04 18:00:00 2018-01-04 19:00:00
# 40        2 2018     1   4   17       -2 32.070 -106.749  709.7    808.0 2018-01-04 17:00:00 2018-01-04 19:00:00
# 41        2 2018     1   4   16       -3 32.223 -106.855  787.3    794.9 2018-01-04 16:00:00 2018-01-04 19:00:00

Data

data <- ' receptor year month day hour hour.inc    lat      lon height pressure                date
1         1 2018     1   3   19        0 31.768 -106.501  500.0    835.6 "2018-01-03 19:00:00" 
2         1 2018     1   3   18       -1 31.628 -106.350  508.8    840.5 "2018-01-03 18:00:00" 
3         1 2018     1   3   17       -2 31.489 -106.180  526.2    839.4 "2018-01-03 17:00:00" 
4         1 2018     1   3   16       -3 31.372 -105.974  547.6    836.8 "2018-01-03 16:00:00"
5         1 2018     1   3   15       -4 31.289 -105.731  555.3    829.8 "2018-01-03 15:00:00"
6         1 2018     1   3   14       -5 31.265 -105.462  577.8    812.8 "2018-01-03 14:00:00"
7         1 2018     1   3   13       -6 31.337 -105.175  640.0    793.9 "2018-01-03 13:00:00"
8         1 2018     1   3   12       -7 31.492 -104.897  645.6    809.2 "2018-01-03 12:00:00"
9         1 2018     1   3   11       -8 31.671 -104.700  686.8    801.0 "2018-01-03 11:00:00"
10        1 2018     1   3   10       -9 31.913 -104.552  794.2    795.8 "2018-01-03 10:00:00"
38        2 2018     1   4   19        0 31.768 -106.501  500.0    830.9 "2018-01-04 19:00:00"
39        2 2018     1   4   18       -1 31.904 -106.635  611.5    819.5 "2018-01-04 18:00:00"
40        2 2018     1   4   17       -2 32.070 -106.749  709.7    808.0 "2018-01-04 17:00:00"
41        2 2018     1   4   16       -3 32.223 -106.855  787.3    794.9 "2018-01-04 16:00:00"'

df <- read.table(text=data,colClasses=c(rep("integer",7),rep("numeric",4),"POSIXct"),header=TRUE)
