How to count NA values across specific columns in R
I have data like this:
data_in <- read_table2("Id Q62_1 Q62_2 Q3_1 Q3_2 Q3_3 Q3_4 Q3_5
1 Yes Sometimes
2 Always
3
4 No Always Yes
5
6 Always No Likely Yes Always Always
7 Yes Sometimes Maybe Unlikely Sometimes Sometimes
8 Always Yes Likely No Always Always
9 Sometimes Unlikely Sometimes Sometimes
10 No No Likely Maybe
11 Sometimes Maybe Unlikely Sometimes Sometimes
12 Always Yes Likely Always Always
")
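I want to count the number of missing responses in each row, first across the columns that start with Q62, and then separately across columns Q3_1 to Q3_5.
I know rowSums is handy for summing numeric variables, but is there a dplyr / piped equivalent for counting NAs?
For example, if this were numeric data and I wanted to sum the Q62 series, I could use the following: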
data_in %>%
  mutate(Q62_NA = rowSums(select(., "Q62_1", "Q62_2")))
But how do I count the NAs instead?
My output should look like this:
data_out <- read_table2("Id Q62_1 Q62_2 Q3_1 Q3_2 Q3_3 Q3_4 Q3_5 Q62_NA Q3_NA
1 Yes Sometimes 0 5
2 Always 1 5
3 2 5
4 No Always Yes 0 5
5 2 5
6 Always No Likely Yes Always Always 1
7 Yes Sometimes Maybe Unlikely Sometimes Sometimes 0 1
8 Always Yes Likely No Always Always 1 0
9 Sometimes Unlikely Sometimes Sometimes 1 1
10 No No Likely Maybe 1 2
11 Sometimes Maybe Unlikely Sometimes Sometimes 1 1
12 Always Yes Likely Always Always 1 1
")
Thanks!
Solution
Here is a base R option:
transform(
  data_in,
  Q62_NA = rowSums(is.na(data_in[grepl("Q62", names(data_in))])),
  Q3_NA = rowSums(is.na(data_in[grepl("Q3", names(data_in))]))
)
which gives
Id Q62_1 Q62_2 Q3_1 Q3_2 Q3_3 Q3_4 Q3_5 Q62_NA
1 1 Yes Sometimes <NA> <NA> <NA> <NA> <NA> 0
2 2 Always <NA> <NA> <NA> <NA> <NA> <NA> 1
3 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2
4 4 No Always Yes <NA> <NA> <NA> <NA> 0
5 5 <NA> <NA> <NA> <NA> <NA> <NA> <NA> 2
6 6 Always No Likely Yes Always Always <NA> 0
7 7 Yes Sometimes Maybe Unlikely Sometimes Sometimes <NA> 0
8 8 Always Yes Likely No Always Always <NA> 0
9 9 Sometimes Unlikely Sometimes Sometimes <NA> <NA> <NA> 0
10 10 No No Likely Maybe <NA> <NA> <NA> 0
11 11 Sometimes Maybe Unlikely Sometimes Sometimes <NA> <NA> 0
12 12 Always Yes Likely Always Always <NA> <NA> 0
Q3_NA
1 5
2 5
3 5
4 4
5 5
6 1
7 1
8 1
9 3
10 3
11 2
12 2
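As a small, self-contained illustration of the same base R idea, the sketch below wraps the column selection in a helper. The toy data frame and the helper name `count_na` are hypothetical and not part of the original answer. It uses `startsWith()` rather than `grepl()`, so a prefix such as "Q3_" only matches at the beginning of a column name; `grepl("Q3", ...)` would also match a name that merely contains "Q3".

```r
# Toy data with explicit NAs (hypothetical, not the asker's full table)
toy <- data.frame(
  Id    = 1:3,
  Q62_1 = c("Yes", "Always", NA),
  Q62_2 = c("Sometimes", NA, NA),
  Q3_1  = c(NA, "No", NA),
  Q3_2  = c(NA, "Likely", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count NAs per row over columns whose names start with `prefix`
count_na <- function(df, prefix) {
  cols <- startsWith(names(df), prefix)
  # drop = FALSE keeps a data frame even if only one column matches
  rowSums(is.na(df[, cols, drop = FALSE]))
}

# Compute both counts before adding any new columns to the data frame
q62_na <- count_na(toy, "Q62_")
q3_na  <- count_na(toy, "Q3_")
toy$Q62_NA <- q62_na
toy$Q3_NA  <- q3_na
```

Here `is.na()` turns the selected columns into a logical matrix, and `rowSums()` counts the TRUE entries in each row.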
We can wrap select in is.na to convert the selected columns into a logical matrix, then apply rowSums to that matrix to count the number of TRUE elements in each row:
library(dplyr)
data_in %>%
mutate(Q62_NA = rowSums(is.na(select(.,"Q62_1","Q62_2"))))
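The call above only covers the Q62 columns. A minimal sketch of how the same pattern might be extended to both column groups is shown below, assuming the magrittr pipe (so that `.` refers to the piped data frame) and using `starts_with()` instead of listing each column. The toy tibble is hypothetical and stands in for data_in.

```r
library(dplyr)

# Toy data with explicit NAs (hypothetical, not the asker's full table)
toy <- tibble(
  Id    = 1:3,
  Q62_1 = c("Yes", "Always", NA),
  Q62_2 = c("Sometimes", NA, NA),
  Q3_1  = c(NA, "No", NA),
  Q3_2  = c(NA, "Likely", NA)
)

toy_counts <- toy %>%
  mutate(
    # is.na() yields a logical matrix; rowSums() counts the TRUEs per row
    Q62_NA = rowSums(is.na(select(., starts_with("Q62")))),
    Q3_NA  = rowSums(is.na(select(., starts_with("Q3"))))
  )
```

Because both counts are taken from `.`, the original piped data, the newly created Q62_NA column is not picked up by the second selection.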
Or an option with c_across and rowwise:
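data_in %>%
  rowwise() %>%  # evaluate the following mutate() one row at a time
  mutate(Q62_NA = sum(is.na(c_across(starts_with('Q6'))))) %>%
  ungroup()  # drop the rowwise grouping afterwards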