
How do I move the bottom half of a column's values into newly created columns?


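I have a column that holds the means of three different measurements in the first 50% of rows and the associated standard errors in the last 50% of rows. The preceding column holds the name of each quantity (meanNativeSR, meanExoticSR, meanTotalSR, seN, seE, seT). I would like to create two new columns, the first containing the se_ names and the second containing their values, and then drop the bottom 50% of rows. Here is my dataset: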

df <- structure(list(Invasion = structure(c(1L,1L,2L,3L,3L
),.Label = c("Low","Medium","High"),class = "factor"),Growth = structure(c(1L,3L),.Label = c("cover","herb","woody"),mean_se = c("meanNativeSR","meanNativeSR","meanExoticSR","meanTotalSR","seN","seE","seT","seT"
    ),value = c(0.769230769230769,0.230769230769231,0.923076923076923,2.46153846153846,6.84615384615385,0.538461538461538,1.69230769230769,1.76923076923077,1.15384615384615,0.384615384615385,1.38461538461538,2.23076923076923,2.07692307692308,0.769230769230769,2.53846153846154,4.23076923076923,3.23076923076923,3.76923076923077,2.76923076923077,3.84615384615385,0.280883362823162,0.12162606385263,0.329364937914491,0.312463015562922,0.705710715103738,0.24325212770526,0.36487819155789,0.191021338791684,0.140441681411581,0.180400606147055,0.201081886427668,0.273771237231572,0.394738572265145,0.440772139427464,0.532938710021193,0.257050482766198,0.336767321450351)),row.names = c(NA,-54L),class = c("tbl_df","tbl","data.frame"))

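I was able to get what I want with the code below, but there must be a more elegant way, because this one forces me to create unnecessary intermediate objects.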

#create an intermediate data.frame that contains just the means and their values from the first half of original df
df.mean <- head(df,-27)
#rename columns 3 and 4
colnames(df.mean)[3] <- "mean"
colnames(df.mean)[4] <- "mean_value"


#create another intermediate data.frame with standard error values from the bottom half of original df
df.se <- df[28:54,]
#rename columns 3 and 4
colnames(df.se)[3] <- "se"
colnames(df.se)[4] <- "se_value"


#cbind those together to get desired result
df.final <- cbind(df.mean,df.se[,3:4])

#remove intermediates
rm(df.mean); rm(df.se)

Is there a simpler way to do this, perhaps with pipes or some tidyverse function, or even in base R?

Solution

Here is one approach using pivot_wider and unnest:

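library(tidyverse)
df %>%
    # class = N/E/T, fun = "mean" or "se", both parsed from the name column
    mutate(class = str_extract(mean_se,"(N|E|T)"),fun = str_extract(mean_se,"(mean|se)")) %>%
    # class is not an id column, so each cell becomes a list of three values
    pivot_wider(id_cols = c("Invasion","Growth"),names_from = "fun",values_from = c("mean_se","value")) %>%
    # expand the list-columns back into rows; naming cols avoids the unnest() warning
    unnest(cols = c(mean_se_mean, mean_se_se, value_mean, value_se))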
# A tibble: 27 x 6
   Invasion Growth mean_se_mean mean_se_se value_mean value_se
   <fct>    <fct>  <chr>        <chr>           <dbl>    <dbl>
 1 Low      cover  meanNativeSR seN             0.769    0.281
 2 Low      cover  meanExoticSR seE             0.385    0.140
 3 Low      cover  meanTotalSR  seT             1.15     0.274
 4 Low      herb   meanNativeSR seN             0.231    0.122
 5 Low      herb   meanExoticSR seE             0        0    
 6 Low      herb   meanTotalSR  seT             0.231    0.122
 7 Low      woody  meanNativeSR seN             0.923    0.329
 8 Low      woody  meanExoticSR seE             1.38     0.180
 9 Low      woody  meanTotalSR  seT             2.54     0.243
10 Medium   cover  meanNativeSR seN             2.46     0.312
# … with 17 more rows

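You will get some warnings (for example, pivot_wider() reports that the values are not uniquely identified and returns list-columns), but the result should be what you are after.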
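The list-columns appear because Invasion and Growth alone do not identify a row. A minimal sketch of a variant that also uses the extracted letter as an id column, so each cell holds a single value and no unnest step is needed. The `toy` tibble and the column names `stat` and `measure` are hypothetical, and Growth is left out to keep the example short:

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical toy data in the question's layout:
# means in the top half, standard errors in the bottom half.
toy <- tibble(
  Invasion = rep(c("Low", "High"), times = 6),
  mean_se  = rep(c("meanNativeSR", "meanExoticSR", "meanTotalSR",
                   "seN", "seE", "seT"), each = 2),
  value    = c(1, 2, 3, 4, 5, 6, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
)

wide <- toy %>%
  mutate(
    stat    = str_extract(mean_se, "^(mean|se)"),  # "mean" or "se"
    measure = str_extract(mean_se, "N|E|T")        # "N", "E" or "T"
  ) %>%
  pivot_wider(
    id_cols     = c(Invasion, measure),  # together these identify one row
    names_from  = stat,
    values_from = c(mean_se, value)
  )

wide
```

With `measure` among the id columns, the pairing of each mean with its standard error no longer depends on row order within a group.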


With the tidyverse, we can use group_split, change the column names, and then do an inner_join:

library(dplyr)
library(purrr)
df %>%
   group_split(grp = row_number() > 27,.keep = FALSE) %>% 
   map2(list(c('mean','mean_value'),c('se','se_value')),~ {nm1 <- .y
           .x  %>%
             rename_at(3:4,~ nm1) %>%
             mutate(rn = row_number())} ) %>% 
  reduce(inner_join) %>% 
  select(-rn)

Output:

# A tibble: 27 x 6
#   Invasion Growth mean         mean_value se    se_value
#   <fct>    <fct>  <chr>             <dbl> <chr>    <dbl>
# 1 Low      cover  meanNativeSR      0.769 seN      0.281
# 2 Low      herb   meanNativeSR      0.231 seN      0.122
# 3 Low      woody  meanNativeSR      0.923 seN      0.329
# 4 Medium   cover  meanNativeSR      2.46  seN      0.312
# 5 Medium   herb   meanNativeSR      6.85  seN      0.706
# 6 Medium   woody  meanNativeSR      0.538 seN      0.243
# 7 High     cover  meanNativeSR      1.69  seN      0.365
# 8 High     herb   meanNativeSR      1.77  seN      0.281
# 9 High     woody  meanNativeSR      1.15  seN      0.191
#10 Low      cover  meanExoticSR      0.385 seE      0.140
# … with 17 more rows
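The same split, rename, and join idea can be spelled out step by step, which may be easier to follow. This is a rough sketch on a hypothetical `toy` tibble (Growth omitted for brevity), not a drop-in replacement for the pipeline above. The helper column `rn` is what keeps the match one-to-one, since each Invasion value occurs several times in both halves:

```r
library(dplyr)

# Hypothetical toy data: means in the top half, standard errors in the bottom half.
toy <- tibble(
  Invasion = rep(c("Low", "High"), times = 4),
  mean_se  = rep(c("meanNativeSR", "meanExoticSR", "seN", "seE"), each = 2),
  value    = c(1, 2, 3, 4, 0.1, 0.2, 0.3, 0.4)
)

half <- nrow(toy) / 2

# Top half: the means, with a row index to join on.
means <- toy %>%
  slice_head(n = half) %>%
  rename(mean = mean_se, mean_value = value) %>%
  mutate(rn = row_number())

# Bottom half: the standard errors, indexed the same way.
ses <- toy %>%
  slice_tail(n = half) %>%
  rename(se = mean_se, se_value = value) %>%
  mutate(rn = row_number())

# Join on the shared id column plus the row index, then drop the index.
joined <- inner_join(means, ses, by = c("Invasion", "rn")) %>%
  select(-rn)

joined
```

Naming the join columns in `by` also silences the message that inner_join prints when it has to guess them.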

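I don't think there is a much shorter or simpler way to achieve your goal, other than condensing the steps. The longest part of your code is assigning the new column names, and that cannot really be shortened. The rest fits on one line. In practice, though, you always have to strike a balance between concision and readability.

The dplyr approaches shown above are very neat, but I believe they are designed for more complex and more general cases than yours.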

df_final_2 <- cbind(head(df,-27),df[28:54,3:4])
colnames(df_final_2)[3:6] <- c("mean","mean_value","se","se_value")
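If the row count may change, the hard-coded 27 and 28:54 can be derived from the data instead. A minimal base R sketch on a hypothetical `toy` data frame, assuming the means always occupy exactly the top half:

```r
# Hypothetical toy data: means in the top half, standard errors in the bottom half.
toy <- data.frame(
  Invasion = rep(c("Low", "High"), times = 4),
  mean_se  = rep(c("meanNativeSR", "meanExoticSR", "seN", "seE"), each = 2),
  value    = c(1, 2, 3, 4, 0.1, 0.2, 0.3, 0.4)
)

n <- nrow(toy)
stopifnot(n %% 2 == 0)   # the layout only makes sense for an even row count
top <- seq_len(n / 2)    # indices of the top half

# Put the bottom half's name and value columns next to the top half.
toy_final <- cbind(toy[top, ], toy[-top, c("mean_se", "value")])
colnames(toy_final)[2:5] <- c("mean", "mean_value", "se", "se_value")

toy_final
```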
