How to spread a data frame that has non-unique values in some columns
This is the data I am working with:
> data
Segment Product Value Key
1 non-domestic S1 517.50760307564053 Actuals Sales
2 non-domestic S2 1235.3088913918129 Actuals Sales
3 non-domestic S3 2141.6841816176966 Actuals Sales
4 domestic S1 -958.38836859580044 Actuals Sales
5 domestic S2 -1129.5593769492507 Actuals Sales
6 domestic S3 -137.68477107274975 Actuals Sales
7 non-domestic S1 -296.07559218703756 Quarter Sales
8 non-domestic S2 1092.0390648120747 Quarter Sales
9 non-domestic S3 1156.2866848179935 Quarter Sales
10 domestic S1 -1975.0222255105061 Quarter Sales
11 domestic S2 -2549.8125184965966 Quarter Sales
12 domestic S3 -2608.2434152116011 Quarter Sales
I am trying to spread it into a table with 6 rows and 4 columns (Segment, Product, Actuals Sales, Quarter Sales) and no missing values:
spread(data=data,key=Key,value=Value)
Unfortunately, this is what I get. I understand this happens because of non-unique values in the Segment and Product columns.
Segment Product Actuals Sales Quarter Sales
1 domestic S1 -958.38836859580044 <NA>
2 domestic S2 -1129.5593769492507 <NA>
3 domestic S3 -137.68477107274975 <NA>
4 domestic S1 <NA> -1975.0222255105061
5 domestic S2 <NA> -2549.8125184965966
6 domestic S3 <NA> -2608.2434152116011
7 non-domestic S1 517.50760307564053 <NA>
8 non-domestic S2 1235.3088913918129 <NA>
9 non-domestic S3 2141.6841816176966 <NA>
10 non-domestic S1 <NA> -296.07559218703756
11 non-domestic S2 <NA> 1092.0390648120747
12 non-domestic S3 <NA> 1156.2866848179935
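Can you help me? How can I remove the missing values and create a table in which the values in the first two columns are not repeated?
Here is a reproducible example: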
> dput(data)
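structure(list(Segment = c("non-domestic", "non-domestic", "non-domestic", "domestic", "domestic", "domestic", "non-domestic ", "non-domestic ", "non-domestic ", "domestic ", "domestic ", "domestic "), Product = c("S1", "S2", "S3", "S1", "S2", "S3", "S1", "S2", "S3", "S1", "S2", "S3"
), Value = c("517.50760307564053", "1235.3088913918129", "2141.6841816176966", "-958.38836859580044", "-1129.5593769492507", "-137.68477107274975", "-296.07559218703756", "1092.0390648120747", "1156.2866848179935", "-1975.0222255105061", "-2549.8125184965966", "-2608.2434152116011"
), Key = c("Actuals Sales", "Actuals Sales", "Actuals Sales", "Actuals Sales", "Actuals Sales", "Actuals Sales", "Quarter Sales", "Quarter Sales", "Quarter Sales", "Quarter Sales", "Quarter Sales", "Quarter Sales")), .Names = c("Segment", "Product", "Value", "Key"), row.names = c(NA, -12L), class = "data.frame")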
Solution
Remove the unwanted whitespace (trimws()) and cast to wide format:
library(data.table)
dcast(setDT(data), trimws(Segment) + Product ~ Key, value.var = "Value", fill = NA)
# Segment Product Actuals Sales Quarter Sales
# 1: domestic S1 -958.38836859580044 -1975.0222255105061
# 2: domestic S2 -1129.5593769492507 -2549.8125184965966
# 3: domestic S3 -137.68477107274975 -2608.2434152116011
# 4: non-domestic S1 517.50760307564053 -296.07559218703756
# 5: non-domestic S2 1235.3088913918129 1092.0390648120747
# 6: non-domestic S3 2141.6841816176966 1156.2866848179935
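The trailing spaces are invisible in printed output, so it can help to confirm them before reshaping. The following is a minimal sketch using base R only; the four-row data frame is a hypothetical subset of the question's data (same values, fewer rows), and the names has_padding and n_dupes are made up for illustration.

```r
# Hypothetical four-row subset of the question's data:
# rows 3 and 4 carry a trailing space in Segment.
data <- data.frame(
  Segment = c("domestic", "non-domestic", "domestic ", "non-domestic "),
  Product = "S1",
  Value   = c("-958.38836859580044", "517.50760307564053",
              "-1975.0222255105061", "-296.07559218703756"),
  Key     = rep(c("Actuals Sales", "Quarter Sales"), each = 2),
  stringsAsFactors = FALSE
)

# print() hides trailing spaces; dput() shows the quoted strings as stored.
dput(unique(data$Segment))

# Flag the rows whose Segment changes when whitespace is trimmed.
has_padding <- data$Segment != trimws(data$Segment)
which(has_padding)

# After trimming, every Segment/Product/Key combination should occur once,
# which is what a clean long-to-wide reshape needs.
data$Segment <- trimws(data$Segment)
n_dupes <- sum(duplicated(data[c("Segment", "Product", "Key")]))
n_dupes
```

If n_dupes is larger than zero after trimming, the id columns genuinely repeat and a wide reshape would need an aggregation step as well.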
A base R option using reshape gives:
reshape(
transform(data,Segment = trimws(Segment)),direction = "wide",idvar = c("Segment","Product"),timevar = "Key"
)
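Note that reshape() prefixes the new columns with the name of the value column, so the result has "Value.Actuals Sales" rather than "Actuals Sales". A possible cleanup step is sketched below; the small data frame is a hypothetical subset of the question's data and the variable name wide is an assumption.

```r
# Hypothetical four-row subset of the question's data.
data <- data.frame(
  Segment = c("domestic", "non-domestic", "domestic ", "non-domestic "),
  Product = "S1",
  Value   = c("-958.38836859580044", "517.50760307564053",
              "-1975.0222255105061", "-296.07559218703756"),
  Key     = rep(c("Actuals Sales", "Quarter Sales"), each = 2),
  stringsAsFactors = FALSE
)

wide <- reshape(
  transform(data, Segment = trimws(Segment)),
  direction = "wide",
  idvar = c("Segment", "Product"),
  timevar = "Key"
)

# Drop the "Value." prefix that reshape() adds to the widened columns,
# and reset the row names it carries over from the long data.
names(wide) <- sub("^Value\\.", "", names(wide))
rownames(wide) <- NULL
wide
```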
Your sample data actually contains some trailing whitespace. Once that is removed, pivot_wider with its id_cols argument works like a charm:
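data <- structure(list(Segment = c("non-domestic", "non-domestic", "non-domestic", "domestic", "domestic", "domestic", "non-domestic", "non-domestic", "non-domestic", "domestic", "domestic", "domestic"), Product = c("S1", "S2", "S3", "S1", "S2", "S3", "S1", "S2", "S3", "S1", "S2", "S3"
), Value = c("517.50760307564053", "1235.3088913918129", "2141.6841816176966", "-958.38836859580044", "-1129.5593769492507", "-137.68477107274975", "-296.07559218703756", "1092.0390648120747", "1156.2866848179935", "-1975.0222255105061", "-2549.8125184965966", "-2608.2434152116011"
), Key = c("Actuals Sales", "Actuals Sales", "Actuals Sales", "Actuals Sales", "Actuals Sales", "Actuals Sales", "Quarter Sales", "Quarter Sales", "Quarter Sales", "Quarter Sales", "Quarter Sales", "Quarter Sales")), .Names = c("Segment", "Product", "Value", "Key"), row.names = c(NA, -12L), class = "data.frame")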
library(tidyr)
data %>% pivot_wider(names_from = Key,values_from = Value,id_cols = c(Segment,Product))
#> # A tibble: 6 x 4
#> Segment Product `Actuals Sales` `Quarter Sales`
#> <chr> <chr> <chr> <chr>
#> 1 non-domestic S1 517.50760307564053 -296.07559218703756
#> 2 non-domestic S2 1235.3088913918129 1092.0390648120747
#> 3 non-domestic S3 2141.6841816176966 1156.2866848179935
#> 4 domestic S1 -958.38836859580044 -1975.0222255105061
#> 5 domestic S2 -1129.5593769492507 -2549.8125184965966
#> 6 domestic S3 -137.68477107274975 -2608.2434152116011
However, if your actual data also contains whitespace, you can use stringr::str_trim() before pivoting.
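data <- structure(list(Segment = c("non-domestic", "non-domestic", "non-domestic", "domestic", "domestic", "domestic", "non-domestic ", "non-domestic ", "non-domestic ", "domestic ", "domestic ", "domestic "), Product = c("S1", "S2", "S3", "S1", "S2", "S3", "S1", "S2", "S3", "S1", "S2", "S3"
), Value = c("517.50760307564053", "1235.3088913918129", "2141.6841816176966", "-958.38836859580044", "-1129.5593769492507", "-137.68477107274975", "-296.07559218703756", "1092.0390648120747", "1156.2866848179935", "-1975.0222255105061", "-2549.8125184965966", "-2608.2434152116011"
), Key = c("Actuals Sales", "Actuals Sales", "Actuals Sales", "Actuals Sales", "Actuals Sales", "Actuals Sales", "Quarter Sales", "Quarter Sales", "Quarter Sales", "Quarter Sales", "Quarter Sales", "Quarter Sales")), .Names = c("Segment", "Product", "Value", "Key"), row.names = c(NA, -12L), class = "data.frame")
library(tidyverse)
data %>% mutate(Segment = str_trim(Segment)) %>%
  pivot_wider(names_from = Key, values_from = Value, id_cols = c(Segment, Product))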
#> # A tibble: 6 x 4
#> Segment Product `Actuals Sales` `Quarter Sales`
#> <chr> <chr> <chr> <chr>
#> 1 non-domestic S1 517.50760307564053 -296.07559218703756
#> 2 non-domestic S2 1235.3088913918129 1092.0390648120747
#> 3 non-domestic S3 2141.6841816176966 1156.2866848179935
#> 4 domestic S1 -958.38836859580044 -1975.0222255105061
#> 5 domestic S2 -1129.5593769492507 -2549.8125184965966
#> 6 domestic S3 -137.68477107274975 -2608.2434152116011
Created on 2021-06-11 by the reprex package (v2.0.0)
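The `<chr>` column types in the output above show that Value is stored as text, so the widened columns are text as well. If numbers are needed for further calculations, one option is to convert before pivoting. This is a sketch under the assumption that dplyr and tidyr are installed; the four-row data frame is a hypothetical subset of the question's data and the Difference column is purely illustrative.

```r
library(dplyr)
library(tidyr)

# Hypothetical four-row subset of the question's data.
data <- data.frame(
  Segment = c("domestic", "non-domestic", "domestic ", "non-domestic "),
  Product = "S1",
  Value   = c("-958.38836859580044", "517.50760307564053",
              "-1975.0222255105061", "-296.07559218703756"),
  Key     = rep(c("Actuals Sales", "Quarter Sales"), each = 2),
  stringsAsFactors = FALSE
)

wide <- data %>%
  # Trim the id column and turn the text values into numbers first.
  mutate(Segment = trimws(Segment), Value = as.numeric(Value)) %>%
  pivot_wider(names_from = Key, values_from = Value,
              id_cols = c(Segment, Product))

# Numeric columns can now be used in arithmetic directly.
wide$Difference <- wide$`Actuals Sales` - wide$`Quarter Sales`
wide
```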
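I would do this with the data.table package: build two tables, one per Key, and then merge them.
I hope this code helps.
library(data.table)
# "test" is your data frame input
test <- data.table(test)
# Strip trailing whitespace so that the merge keys match in both tables
test[, Segment := trimws(Segment)]
a <- test[Key == "Actuals Sales", .(Segment = Segment, Product = Product, ActualsSales = Value)]
b <- test[Key == "Quarter Sales", .(Segment = Segment, Product = Product, QuarterSales = Value)]
output <- merge(a, b, by = c("Segment", "Product"))
print(output)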
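# Base R without extra packages. This relies on the row order of the question's
# data: all "Actuals Sales" rows come first, and the "Quarter Sales" rows follow
# in the same Segment/Product order.
qs <- df$Value[df$Key == 'Quarter Sales']
as <- df$Value[df$Key == 'Actuals Sales']
# Pad with NA so that each vector has as many entries as df has rows
df$QS <- c(qs, rep(NA, length(qs)))
df$AS <- c(as, rep(NA, length(as)))
df$Key <- NULL
# Keep only the rows that received both values
df <- df[complete.cases(df), ]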