Best way to add the most likely event and its probability from a crosstab in R

Using the mtcars dataset, I created a crosstab as follows:

tab = with(mtcars,ftable(gear,cyl))
tab

This is what it looks like:

     cyl  4  6  8
gear             
3         1  2 12
4         8  4  0
5         2  1  2

For this crosstab, I have calculated the row probabilities (as percentages):

tab_prob = tab %>% prop.table(1) %>% round(4) * 100
tab_prob
     cyl     4     6     8
gear                      
3         6.67 13.33 80.00
4        66.67 33.33  0.00
5        40.00 20.00 40.00

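I want to add two columns to the original mtcars dataset:

Column 1, cyl_exp: the expected outcome according to the crosstab. For example, if a row of mtcars has gear equal to 3, this new column should hold the value 8 (see the tab crosstab), because there is an 80% probability that cyl is 8 when gear is 3.
Column 2, cyl_prob: the probability of the value in cyl_exp, taken from tab_prob.

This is the expected result: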

head(mtcars)
    mpg cyl disp  hp drat    wt  qsec vs am gear carb cyl_prob cyl_exp
1: 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4    66.67       4
2: 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4    66.67       4
3: 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1    66.67       4
4: 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1    80.00       8
5: 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2    80.00       8
6: 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1    80.00       8

Is there a simple way to do this?

Thanks!

Solution

Here is one way to do it in dplyr:

library(dplyr)

mtcars %>%
  count(cyl_exp = cyl,gear,name = 'cyl_prob') %>%
  group_by(gear) %>%
  mutate(cyl_prob = prop.table(cyl_prob) * 100) %>%
  slice(which.max(cyl_prob)) %>%
  inner_join(mtcars,by = 'gear')

#  cyl_exp  gear cyl_prob   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  carb
#     <dbl> <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1       8     3       80  21.4     6  258    110  3.08  3.22  19.4     1     0     1
# 2       8     3       80  18.7     8  360    175  3.15  3.44  17.0     0     0     2
# 3       8     3       80  18.1     6  225    105  2.76  3.46  20.2     1     0     1
# 4       8     3       80  14.3     8  360    245  3.21  3.57  15.8     0     0     4
# 5       8     3       80  16.4     8  276.   180  3.07  4.07  17.4     0     0     3
# 6       8     3       80  17.3     8  276.   180  3.07  3.73  17.6     0     0     3
# 7       8     3       80  15.2     8  276.   180  3.07  3.78  18       0     0     3
# 8       8     3       80  10.4     8  472    205  2.93  5.25  18.0     0     0     4
# 9       8     3       80  10.4     8  460    215  3     5.42  17.8     0     0     4
#10       8     3       80  14.7     8  440    230  3.23  5.34  17.4     0     0     4
# … with 22 more rows

I keep the data in long format so that it is easier to join. The first part of the answer builds the crosstab:

mtcars %>%
  count(cyl_exp = cyl,gear,name = 'cyl_prob') %>%
  group_by(gear) %>%
  mutate(cyl_prob = prop.table(cyl_prob) * 100)

#  cyl_exp  gear cyl_prob
#    <dbl> <dbl>    <dbl>
#1       4     3     6.67
#2       4     4    66.7 
#3       4     5    40   
#4       6     3    13.3 
#5       6     4    33.3 
#6       6     5    20   
#7       8     3    80   
#8       8     5    40

From here we keep only the row with the highest probability for each gear and join it back to the data.
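Two details of the join above are worth knowing. Gear 5 has a tie (cyl 4 and cyl 8 are both at 40%), and which.max keeps the first maximum, so cyl_exp is 4 there. Also, starting the join from the summary table returns the rows grouped by gear rather than in the original mtcars order. A minimal sketch that keeps the original row order and appends the new columns at the end; the names best, freq and result are illustrative only:

```r
library(dplyr)

# One row per gear: the most frequent cyl and its row percentage.
best <- mtcars %>%
  count(gear, cyl_exp = cyl, name = "freq") %>%
  group_by(gear) %>%
  mutate(cyl_prob = round(freq / sum(freq) * 100, 2)) %>%
  # which.max returns the first maximum, so a tie goes to the smaller cyl.
  slice(which.max(cyl_prob)) %>%
  ungroup() %>%
  select(gear, cyl_exp, cyl_prob)

# left_join keeps every mtcars row, in its original order.
result <- left_join(mtcars, best, by = "gear")
head(result)
```

If a tie should be resolved differently (for example, in favor of the larger cyl), sorting the counted table before the slice step is one way to control it.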

I used plain table and prop.table instead of ftable. I propose the following solution:

library(dplyr)

# Counts and row percentages of cyl within each gear
tab <- table(mtcars$gear, mtcars$cyl)
tab_prob <- round(prop.table(tab, margin = 1) * 100, 2)

# Most frequent cyl for a given gear value
exp_cyl <- function(x) {
  as.numeric(names(which.max(tab[as.character(x), ])))
}

# Row percentage of that most frequent cyl
prob_cyl <- function(x) {
  max(tab_prob[as.character(x), ])
}

df <- mtcars
df %>% mutate(cyl_prob = sapply(gear, prob_cyl), cyl_exp = sapply(gear, exp_cyl))
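The two helper functions are called once per row through sapply. Since gear takes only three distinct values, the same columns can be obtained by building one small lookup per gear and indexing it by name. A sketch in base R only; the names best_col, cyl_by_gear, prob_by_gear and key are illustrative and not part of the answer above:

```r
tab <- table(mtcars$gear, mtcars$cyl)
tab_prob <- round(prop.table(tab, margin = 1) * 100, 2)

# Column index of the most frequent cyl in each gear row
# (which.max returns the first maximum, so ties go to the smaller cyl).
best_col <- apply(tab, 1, which.max)

# Named lookup vectors, keyed by the gear value as a string.
cyl_by_gear <- setNames(as.numeric(colnames(tab)[best_col]), rownames(tab))
prob_by_gear <- setNames(tab_prob[cbind(seq_len(nrow(tab)), best_col)],
                         rownames(tab))

df <- mtcars
key <- as.character(df$gear)
df$cyl_prob <- unname(prob_by_gear[key])
df$cyl_exp <- unname(cyl_by_gear[key])
head(df)
```

This version needs no extra packages and keeps the row names of mtcars.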

Using data.table, I would do it like this:

library(data.table)

mtcars <- as.data.table(mtcars,keep.rownames = TRUE)

tab <- mtcars[,.N,by = .(gear,cyl)]
tab[,prob := N/sum(N),by = .(gear)]
tab <- tab[order(-prob,cyl)][!duplicated(gear)]
mtcars[tab,`:=`(cyl_exp = i.cyl,cyl_prob = i.prob),on = .(gear)]

# > head(mtcars)
#                   rn  mpg cyl disp  hp drat    wt  qsec vs am gear carb cyl_exp  cyl_prob
# 1:         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4       4 0.6666667
# 2:     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4       4 0.6666667
# 3:        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1       4 0.6666667
# 4:    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1       8 0.8000000
# 5: Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2       8 0.8000000
# 6:           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1       8 0.8000000
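Note that cyl_prob in the output above is a fraction (0.8), while the question asked for a percentage (80.00). A possible variant, offered as a sketch: it scales and rounds the probability, and it works on a copy named dt (an illustrative name) so that the built-in mtcars data frame is not overwritten.

```r
library(data.table)

# Work on a copy; datasets::mtcars is the untouched built-in data frame.
dt <- as.data.table(datasets::mtcars, keep.rownames = TRUE)

# Counts per (gear, cyl), then the percentage within each gear.
tab <- dt[, .N, by = .(gear, cyl)]
tab[, prob := round(100 * N / sum(N), 2), by = gear]

# Highest percentage first; ties go to the smaller cyl.
tab <- tab[order(-prob, cyl)][!duplicated(gear)]

# Update join: add both columns to dt by reference.
dt[tab, `:=`(cyl_exp = i.cyl, cyl_prob = i.prob), on = .(gear)]
head(dt)
```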