How to fix the fastest way to extract distinct values in R
I want to recreate the example from this post, which shows the fastest way to extract sorted unique values: What is the fastest way to get a vector of sorted unique values from a data.table?
df <- data.frame(
  company = c(1, 1, 2, 3)
)
unique_values <- df[, logical(1), keyby = company]$company
But I keep getting this error:
Error in `[.data.frame`(df, , logical(1), keyby = company) :
  unused argument (keyby = company)
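Edit: Note that the point of my question is to get this particular method working. For suggestions of other ways to achieve the goal, please see the post I referenced.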
Solution
If you're looking for a fast unique(), have a look at kit::funique:
setDTthreads(1)  # restrict data.table to a single thread
microbenchmark::microbenchmark(
  y[, logical(1), keyby = company]$company,
  unique(x$company),
  funique(x$company)
)
# Unit: milliseconds
#                                      expr       min        lq      mean   median       uq       max neval cld
#  y[, logical(1), keyby = company]$company 12.151625 12.436920 13.506817 12.58519 12.76036 97.318758   100   b
#                         unique(x$company) 12.932633 13.145706 13.717273 13.33529 14.54441 15.511965   100   b
#                        funique(x$company)  2.403889  2.659345  2.748425  2.72396  2.78017  3.507635   100   a
setDTthreads(4)  # allow data.table to use 4 threads
microbenchmark::microbenchmark(
  y[, logical(1), keyby = company]$company,
  unique(x$company),
  funique(x$company)
)
# Unit: milliseconds
#                                      expr       min        lq      mean    median        uq       max neval cld
#  y[, logical(1), keyby = company]$company  5.038178  5.144970  5.907699  5.210202  6.804902 12.671440   100   b
#                         unique(x$company) 12.961273 13.136794 13.700900 13.315550 14.256065 21.449808   100   c
#                        funique(x$company)  2.604594  2.667491  2.738920  2.717532  2.786240  3.115353   100   a
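The three benchmarked expressions do not return their values in the same order. keyby yields the unique values sorted (which is what the linked question asks for), while unique() keeps the order of first appearance, and the sort() needed to make those results sorted is not part of the timings above. A small sketch on a hypothetical toy input, sorting the unique() and funique() results before comparing them with the keyby output:

```r
library(data.table)
library(kit)

# Hypothetical toy input: repeated, unsorted values
x <- data.frame(company = c("W", "A", "S", "A", "L", "W"),
                stringsAsFactors = FALSE)
y <- as.data.table(x)

by_key  <- y[, logical(1), keyby = company]$company  # sorted by keyby
by_base <- unique(x$company)                          # first-appearance order
by_kit  <- funique(x$company)

# unique() keeps the order in which values first appear
print(by_base)
# [1] "W" "A" "S" "L"

# After sorting, all three approaches agree on the set of values
stopifnot(identical(sort(by_base), by_key),
          identical(sort(by_kit), by_key))
```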
Data and libraries:
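set.seed(42)
n <- 1e6  # one million rows

# Values to sample from (with replacement)
company <- c("A", "S", "W", "L", "T", "A", "W")
item <- c("Thingy", "Thingy", "Widget", "Grommit", "Thingy")
sales <- c(120, 140, 160, 180, 200, 120, 200)

x <- data.frame(
  company = sample(company, n, TRUE),
  item = sample(item, n, TRUE),
  sales = sample(sales, n, TRUE)
)

library(data.table)
y <- as.data.table(x)  # data.table copy, needed for the keyby syntax
library(kit)           # provides funique()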
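Regarding the Edit: the keyby trick relies on data.table's own [ method. A plain data.frame dispatches to [.data.frame, whose only arguments are x, i, j and drop, hence "unused argument (keyby = company)". The benchmark above works because y is created with as.data.table(x). A minimal sketch applying the same conversion to the question's data (setDT(df) would instead convert it in place):

```r
library(data.table)

# The question's data, as a plain data.frame
df <- data.frame(company = c(1, 1, 2, 3))

# Convert to a data.table so that [ accepts keyby
dt <- as.data.table(df)

# Group by company, returning a dummy logical(1) per group;
# keyby sorts the groups, so $company holds the sorted unique values
unique_values <- dt[, logical(1), keyby = company]$company
print(unique_values)
# [1] 1 2 3
```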