
Parsing a column in R (or another language, such as SQL)

This is the current data frame:
baking_time <- c("20 to 30 min","20 to 30 min","40 to 50 min","10 to 30 min","60 to 90 min","40 to 50 min")
cake_type <- c("Chocolate","Chocolate","Lemon","Tart","German","Lemon")


recipes <- data.frame(baking_time,cake_type)

Now I am trying to parse the baking time to get this:

baking_time <- c(25,25,45,20,75,45)

I tried using parse_number, but I am having more trouble extracting the two numbers than performing the operation on them:

mutate(avg_time = (parse_number(baking_time) + parse_number(baking_time))/2)

Solution

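We extract the numeric parts of the column and take their mean for each row. map_dbl returns a plain numeric vector, so avg_time is a numeric column rather than a list column.
library(tidyverse)
recipes %>% 
   mutate(avg_time = str_extract_all(baking_time, "\\d+") %>%
           map_dbl(~ mean(as.numeric(.x))))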
#   baking_time cake_type avg_time
#1 20 to 30 min Chocolate       25
#2 20 to 30 min Chocolate       25
#3 40 to 50 min     Lemon       45
#4 10 to 30 min      Tart       20
#5 60 to 90 min    German       75
#6 40 to 50 min     Lemon       45

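Note: readr::parse_number extracts only the first number in each string. If a string contains more than one number, split it first and apply parse_number to each piece: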

recipes %>% 
   separate(baking_time,into = c("first","second"),sep=" to ",remove = FALSE) %>% 
   transmute(baking_time,avg_time = (parse_number(first) + parse_number(second))/2)

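In base R, one option is to use gsub to replace each non-digit run with a comma, read the result with read.csv, and take rowMeans. Because every string ends in " min", the trailing comma produces an empty third column, which [-3] drops.

rowMeans(read.csv(text = gsub("\\D+", ",", recipes$baking_time), header = FALSE)[-3])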
#[1] 25 25 45 20 75 45
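
The read.csv approach above depends on every row containing exactly two numbers followed by trailing text. As an alternative base R sketch (not part of the original answer), regmatches with gregexpr pulls out however many numbers each string contains before averaging them. The names nums and avg_time are chosen here for illustration only.

```r
baking_time <- c("20 to 30 min", "20 to 30 min", "40 to 50 min",
                 "10 to 30 min", "60 to 90 min", "40 to 50 min")

# gregexpr() locates every run of digits in each string;
# regmatches() returns them as a list with one character vector per string
nums <- regmatches(baking_time, gregexpr("[0-9]+", baking_time))

# Average the numbers found in each string; vapply() guarantees a numeric vector
avg_time <- vapply(nums, function(x) mean(as.numeric(x)), numeric(1))
avg_time
#[1] 25 25 45 20 75 45
```

A string with no digits at all would yield NaN here, so such rows may need separate handling.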

Original article: https://www.jb51.cc/mssql/79111.html
