How to handle "n/a" entries using tidyverse mutate
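My data frame contains "n/a" entries that na.omit() does not detect. I understand that the mutate function from the tidyverse can convert the "n/a" entries into proper missing values so that they can be dropped. This is what I tried, but it fails with: Error in replace(value, value == "n/a", NA) : object 'value' not found. Thanks in advance!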
> head(data)
customer_id gender past_3_years_bike_related_purchases dob
1 1 F 93 19644
2 2 Male 81 29571
3 5 Female 56 28258
4 8 Male 31 22735
5 9 Female 97 26733
6 12 Male 58 34536
job_industry_category wealth_segment owns_car tenure state
1 Health Mass Customer Yes 11 New South Wales
2 Financial Services Mass Customer Yes 16 New South Wales
3 n/a Affluent Customer Yes 8 New South Wales
4 n/a Mass Customer No 7 New South Wales
5 Argiculture Affluent Customer Yes 8 New South Wales
6 Manufacturing Mass Customer No 8 QLD
data %>%
mutate(value = replace(value,value == "n/a",NA)) %>%
drop_na()
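Solution
Your data frame has no column named value, which is why replace() cannot find it. Use the name of the column in which you want to detect the "n/a" values.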
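library(dplyr)
library(tidyr)
data %>%
  # overwrite the column in place so no extra column is created
  mutate(job_industry_category = replace(job_industry_category, job_industry_category == "n/a", NA)) %>%
  # drop every row that contains an NA
  drop_na()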
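If "n/a" can appear in more than one column, the same idea can be applied to every character column at once. The following is a minimal sketch, assuming dplyr 1.0 or later (for across()); the toy data frame and its values are hypothetical stand-ins for the data in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for the data frame in the question
toy <- data.frame(
  customer_id = c(1L, 2L, 5L, 8L),
  job_industry_category = c("Health", "n/a", "Manufacturing", "n/a"),
  state = c("New South Wales", "n/a", "QLD", "QLD"),
  stringsAsFactors = FALSE
)

cleaned <- toy %>%
  # turn "n/a" into NA in every character column
  mutate(across(where(is.character), ~ na_if(.x, "n/a"))) %>%
  # then drop rows that contain any NA
  drop_na()

cleaned
```

Only the rows without any "n/a" remain; numeric columns are left untouched because where(is.character) skips them.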
You can also do this without converting the values to actual NA.
data %>% filter(job_industry_category != "n/a")
#Base R :
subset(data,job_industry_category != "n/a")
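One caveat with the comparison approach: `!=` returns NA for entries that are already missing, and subset() treats those as FALSE (filter() likewise keeps only rows where the condition is TRUE), so rows with a real NA are dropped as well. A minimal base R sketch on hypothetical toy data:

```r
# Hypothetical toy data with both an "n/a" string and a real NA
toy <- data.frame(
  id = 1:4,
  job = c("Health", "n/a", NA, "Manufacturing"),
  stringsAsFactors = FALSE
)

# The row with the real NA (id 3) is dropped along with the "n/a" row
kept <- subset(toy, job != "n/a")

# Keep real NAs explicitly if only the "n/a" strings should go
kept_with_na <- subset(toy, is.na(job) | job != "n/a")

kept$id
kept_with_na$id
```

Whether dropping the real NA rows is desirable depends on the analysis; the second form makes the choice explicit.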
The n/a values can also be converted to values that work with na.omit() when the data is read into R, by using the na.strings argument.
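For example, if we take the data from the original post and convert it to a pipe-separated values file, we can use na.strings in read.csv() to treat n/a as a missing value, and then use na.omit() to subset the data.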
textData <- "customer_id|gender|past_3_years_bike_related_purchases|DOB|job_industry_category|wealth_segment|owns_car|tenure|state
1|Female| 93|19644|Health|Mass Customer|Yes|11|New South Wales
2|Male| 81|29571|Financial Services|Mass Customer|Yes|16|New South Wales
5|Female| 56|28258|n/a|Affluent Customer|Yes|8|New South Wales
8|Male| 31|22735|n/a|Mass Customer| No|7|New South Wales
9|Female| 97|26733|Argiculture|Affluent Customer|Yes| 8|New South Wales
12|Male| 58|34536|Manufacturing|Mass Customer| No| 8|QLD"
data <- read.csv(text = textData,header = TRUE,na.strings = c("n/a","na"),sep="|")
data
> data
customer_id gender past_3_years_bike_related_purchases DOB job_industry_category
1 1 Female 93 19644 Health
2 2 Male 81 29571 Financial Services
3 5 Female 56 28258 <NA>
4 8 Male 31 22735 <NA>
5 9 Female 97 26733 Argiculture
6 12 Male 58 34536 Manufacturing
wealth_segment owns_car tenure state
1 Mass Customer Yes 11 New South Wales
2 Mass Customer Yes 16 New South Wales
3 Affluent Customer Yes 8 New South Wales
4 Mass Customer No 7 New South Wales
5 Affluent Customer Yes 8 New South Wales
6 Mass Customer No 8 QLD
As we can see from the output, rows 3 and 4 now have <NA> for job_industry_category.
# now omit missing values
na.omit(data)
...and now the rows with <NA> values are removed from the data frame.
> na.omit(data)
customer_id gender past_3_years_bike_related_purchases DOB job_industry_category
1 1 Female 93 19644 Health
2 2 Male 81 29571 Financial Services
5 9 Female 97 26733 Argiculture
6 12 Male 58 34536 Manufacturing
wealth_segment owns_car tenure state
1 Mass Customer Yes 11 New South Wales
2 Mass Customer Yes 16 New South Wales
5 Affluent Customer Yes 8 New South Wales
6 Mass Customer No 8 QLD
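If the data frame is already in memory and re-reading the file is not convenient, a similar result can be sketched in base R by replacing the "n/a" strings after the fact. This is an illustrative alternative rather than part of the approach above, the toy data is hypothetical, and it assumes the data frame contains no existing NA values.

```r
# Hypothetical toy data standing in for an already-loaded data frame
toy <- data.frame(
  customer_id = c(1L, 5L, 8L, 12L),
  job_industry_category = c("Health", "n/a", "n/a", "Manufacturing"),
  tenure = c(11L, 8L, 7L, 8L),
  stringsAsFactors = FALSE
)

# Replace every cell equal to "n/a" with a real NA
toy[toy == "n/a"] <- NA

# na.omit() now detects and removes the affected rows
complete <- na.omit(toy)
complete
```

As in the output above, the surviving rows keep their original row names.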
We can use na_if to convert the elements to NA and then use drop_na to remove those rows.
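library(dplyr)
library(tidyr)
data %>%
  # overwrite the column in place: "n/a" becomes NA
  mutate(job_industry_category = na_if(job_industry_category, "n/a")) %>%
  # drop every row that contains an NA
  drop_na()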
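Note that drop_na() with no arguments removes rows with an NA in any column. If only the converted column should decide which rows are dropped, drop_na() also accepts column names. A minimal sketch on hypothetical toy data:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one "n/a" string and one unrelated NA in tenure
toy <- data.frame(
  id = 1:4,
  job_industry_category = c("Health", "n/a", "Manufacturing", "Health"),
  tenure = c(11L, 8L, NA, 7L),
  stringsAsFactors = FALSE
)

converted <- toy %>%
  mutate(job_industry_category = na_if(job_industry_category, "n/a"))

# Drops rows with an NA in any column (ids 2 and 3)
all_cols <- converted %>% drop_na()

# Drops only rows where job_industry_category is NA (id 2)
one_col <- converted %>% drop_na(job_industry_category)
```

Naming the column avoids silently losing rows because of missing values elsewhere in the data.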
A data.table option
> setDT(df)[!"n/a",on = .(job_industry_category)]
customer_id gender past_3_years_bike_related_purchases DOB
1: 1 Female 93 19644
2: 2 Male 81 29571
3: 9 Female 97 26733
4: 12 Male 58 34536
job_industry_category wealth_segment owns_car tenure state
1: Health Mass Customer Yes 11 New South Wales
2: Financial Services Mass Customer Yes 16 New South Wales
3: Argiculture Affluent Customer Yes 8 New South Wales
4: Manufacturing Mass Customer No 8 QLD
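For reference, here is a self-contained sketch of the same not-join, assuming the data.table package is installed. The toy data is hypothetical, and the plain logical filter is shown only for comparison.

```r
library(data.table)

# Hypothetical toy data standing in for df
toy <- data.frame(
  customer_id = c(1L, 5L, 8L, 12L),
  job_industry_category = c("Health", "n/a", "n/a", "Manufacturing"),
  stringsAsFactors = FALSE
)

# setDT() converts the data frame to a data.table by reference
dt <- setDT(toy)

# Not-join: keep rows whose job_industry_category does not match "n/a"
anti <- dt[!"n/a", on = .(job_industry_category)]

# An ordinary logical filter that gives the same rows on this data
plain <- dt[job_industry_category != "n/a"]
```

On data without missing values the two forms select the same rows; the not-join form is convenient when several values should be excluded at once.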
Data
> dput(df)
structure(list(customer_id = c(1L,2L,5L,8L,9L,12L),gender = c("Female","Male","Female","Male","Female","Male"),past_3_years_bike_related_purchases = c(93L,81L,56L,31L,97L,58L),DOB = c(19644L,29571L,28258L,22735L,26733L,34536L),job_industry_category = c("Health","Financial Services","n/a","n/a","Argiculture","Manufacturing"),wealth_segment = c("Mass Customer","Mass Customer","Affluent Customer","Mass Customer","Affluent Customer","Mass Customer"),owns_car = c("Yes","Yes","Yes"," No","Yes"," No"),tenure = c(11L,16L,8L,7L,8L,8L),state = c("New South Wales","New South Wales","New South Wales","New South Wales","New South Wales","QLD")),class = "data.frame",row.names = c(NA,-6L))