
Determining customer preferences with choice-based conjoint analysis


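I have been given a dataset on the ingredients of different brands of candy, together with the price as a percentage, the sugar content as a percentage and the win percentage. The ingredient information is coded as dummy variables, where 0 means the characteristic is absent and 1 means it is present. The goal is to choose a statistical method that identifies consumer preferences and can predict new products, and I want to implement the solution in R. My idea is to run a choice-based conjoint (CBC) analysis. As a first step I computed the column sums of the variables and a summary. Because the dataset contains both dummy and numeric variables, one question is whether all variables must have the same data type. Another problem is that there are no respondents who stated their choices: the only characteristics of the potential choices I have are the candy ingredients, the sugar percentage, the price percentage and the win percentage. What are the next steps in a choice-based conjoint analysis?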
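On the data-type question: the variables do not need to share one type. In R, a model formula turns 0/1 integer dummies and numeric percentage columns into a single numeric design matrix. The sketch below illustrates this, assuming `cbc.df` uses the column names shown in the `dput()` output and that `winpercent` has already been converted to numeric. It fits an ordinary aggregate linear regression of `winpercent` on the attributes; this is a swapped-in technique, not a choice-based conjoint model, which would need respondent-level choice data. `fit_win_model()` and the `new_candy` profile are hypothetical.

```r
# Hypothetical helper: aggregate linear regression of win percentage on the
# candy attributes (NOT a choice-based conjoint model).
# 0/1 integer dummies and numeric percentages can be mixed in one formula;
# lm() builds a single numeric model matrix from both types.
fit_win_model <- function(df) {
  lm(winpercent ~ chocolate + fruity + caramel + peanutyalmondy + nougat +
       crispedricewafer + hard + bar + pluribus + sugarpercent + pricepercent,
     data = df)
}

# Usage sketch (assumes cbc.df is loaded and winpercent is already numeric):
# fit <- fit_win_model(cbc.df)
# summary(fit)   # one coefficient per attribute
#
# Predict a new, purely hypothetical product profile:
# new_candy <- data.frame(chocolate = 1L, fruity = 0L, caramel = 1L,
#                         peanutyalmondy = 0L, nougat = 0L,
#                         crispedricewafer = 0L, hard = 0L, bar = 1L,
#                         pluribus = 0L, sugarpercent = 0.5, pricepercent = 0.5)
# predict(fit, newdata = new_candy)
```

Each coefficient is then an estimated shift in win percentage for that attribute, holding the other attributes fixed.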

dput(rbind(head(cbc.df,10),tail(cbc.df,10)))
structure(list(competitorname = c("100 Grand","3 Musketeers","One dime","One quarter","Air Heads","Almond Joy","Baby Ruth","Boston Baked Beans","Candy Corn","Caramel Apple Pops","Tootsie Roll Juniors","Tootsie Roll Midgies","Tootsie Roll Snack Bars","Trolli Sour Bites","Twix","Twizzlers","Warheads","WelchÕs Fruit Snacks","WertherÕs Original Caramel","Whoppers"),chocolate = c(1L,1L,0L,1L),fruity = c(0L,0L),caramel = c(1L,peanutyalmondy = c(0L,nougat = c(0L,crispedricewafer = c(1L,hard = c(0L,bar = c(1L,pluribus = c(0L,sugarpercent = c(0.73199999,0.60399997,0.011,0.90600002,0.465,0.31299999,0.17399999,0.546,0.22,0.093000002,0.186,0.87199998),pricepercent = c(0.86000001,0.51099998,0.116,0.76700002,0.32499999,0.255,0.26699999,0.84799999
),winpercent = c("66.971.725","67.602.936","32.261.086","46.116.505","52.341.465","50.347.546","56.914.547","23.417.824","38.010.963","34.517.681","43.068.897","45.736.748","49.653.503","47.173.229","81.642.914","45.466.282","39.011.898","44.375.519","41.904.308","49.524.113")),row.names = c(1L,2L,3L,4L,5L,6L,7L,8L,9L,10L,77L,78L,79L,80L,81L,82L,83L,84L,85L,86L),class = "data.frame")
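In the `dput()` output above, `winpercent` is stored as character strings such as "66.971.725", so it cannot be used as a numeric response yet. A minimal repair sketch, assuming (an assumption, not something the data confirms) that the first dot is the real decimal point and the later dots are spurious grouping marks introduced on import, so "66.971.725" becomes 66.971725. `fix_winpercent()` is a hypothetical helper; check a few converted values against the original source file before relying on it.

```r
# Hypothetical helper. ASSUMPTION: the first "." is the true decimal point and
# any later dots are spurious grouping marks, e.g. "66.971.725" -> 66.971725.
fix_winpercent <- function(x) {
  x <- as.character(x)
  int_part  <- sub("\\..*$", "", x)                    # text before the first dot
  frac_part <- sub("^[^.]*\\.?", "", x)                # text after the first dot
  frac_part <- gsub(".", "", frac_part, fixed = TRUE)  # drop the remaining dots
  as.numeric(ifelse(nzchar(frac_part),
                    paste0(int_part, ".", frac_part),
                    int_part))
}

# Usage sketch: cbc.df$winpercent <- fix_winpercent(cbc.df$winpercent)
```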

summary(cbc.df)

competitorname        chocolate          fruity          caramel       peanutyalmondy       nougat      crispedricewafer
Length:86          Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000
Class :character   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000
Mode  :character   Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000
                   Mean   :0.4302   Mean   :0.4535   Mean   :0.1628   Mean   :0.1628   Mean   :0.0814   Mean   :0.0814
                   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000
                   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000
     hard              bar            pluribus       sugarpercent     pricepercent     winpercent
Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0110   Min.   :0.0110   Length:86
1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.2200   1st Qu.:0.2580   Class :character
Median :0.0000   Median :0.0000   Median :1.0000   Median :0.4650   Median :0.4650   Mode  :character
Mean   :0.1744   Mean   :0.2442   Mean   :0.5233   Mean   :0.4736   Mean   :0.4672
3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:0.7320   3rd Qu.:0.6510
Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :0.9880   Max.   :0.9760
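For a 0/1 dummy, the `Mean` row of `summary()` is the share of candies that have the attribute: for example, 37 chocolate candies out of 86 rows gives 37/86 ≈ 0.4302. A short sketch that lists these shares for several dummy columns at once, assuming the column names above; `dummy_shares()` is a hypothetical helper.

```r
# Hypothetical helper: for 0/1 columns the mean is the share of candies that
# have the attribute, i.e. the "Mean" row of summary(), rounded to 4 digits.
dummy_shares <- function(df, cols) {
  round(colMeans(df[, cols, drop = FALSE]), 4)
}

# Usage sketch:
# dummy_shares(cbc.df, c("chocolate", "fruity", "caramel", "peanutyalmondy",
#                        "nougat", "crispedricewafer", "hard", "bar", "pluribus"))
```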

Column sums

Variable chocolate: 0 = no chocolate, 1 = chocolate

dplyr::count(cbc.df, chocolate)

     0    49
     1    37
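If the 0/1 codes are hard to read in tables, a dummy can be turned into a labelled factor. This changes only the presentation: in `lm()`, a two-level factor whose first level is the "absent" label produces the same fit as the 0/1 integer coding. `label_dummy()` is a hypothetical helper and the labels it builds are an illustrative choice.

```r
# Hypothetical helper: turn a 0/1 dummy into a labelled factor so tables
# read "no chocolate" / "chocolate" instead of 0 / 1.
label_dummy <- function(x, name) {
  factor(x, levels = c(0, 1), labels = c(paste("no", name), name))
}

# Usage sketch: table(label_dummy(cbc.df$chocolate, "chocolate"))
```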

Variable fruity: 0 = not fruity, 1 = fruity

dplyr::count(cbc.df, fruity)

  0    47
  1    39

Variable caramel: 0 = no caramel, 1 = caramel

dplyr::count(cbc.df, caramel)

   0    72
   1    14

Variable peanutyalmondy: 0 = no peanuts or almonds, 1 = peanuts or almonds

dplyr::count(cbc.df, peanutyalmondy)

          0    72
          1    14

Variable nougat: 0 = no nougat, 1 = nougat

dplyr::count(cbc.df, nougat)

  0    79
  1     7

Variable crispedricewafer: 0 = no crisped rice wafer, 1 = crisped rice wafer

dplyr::count(cbc.df, crispedricewafer)

          0    79
          1     7

Variable hard: 0 = not hard, 1 = hard

dplyr::count(cbc.df, hard)

 0    71
 1    15

Variable bar: 0 = not a bar, 1 = bar

dplyr::count(cbc.df, bar)

0    65
1    21

Variable pluribus: 0 = not pluribus, 1 = pluribus (one of many candies in a bag or box)

dplyr::count(cbc.df, pluribus)

   0    41
   1    45
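The nine `dplyr::count()` calls above can be collapsed into one table. Because each dummy is coded 0/1, `colSums()` gives the number of candies with the attribute, and `nrow()` minus that sum gives the number without it; on the full `cbc.df` this should reproduce the counts listed above (e.g. 49 and 37 for `chocolate`). `dummy_counts()` is a hypothetical helper.

```r
# Hypothetical helper: number of candies without (0) and with (1) each
# attribute, in one table instead of one dplyr::count() call per column.
dummy_counts <- function(df, cols) {
  ones <- unname(colSums(df[, cols, drop = FALSE]))  # number of 1s per column
  data.frame(variable = cols, n_0 = nrow(df) - ones, n_1 = ones)
}

# Usage sketch:
# dummy_counts(cbc.df, c("chocolate", "fruity", "caramel", "peanutyalmondy",
#                        "nougat", "crispedricewafer", "hard", "bar", "pluribus"))
```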
   

Column sum of sugarpercent

sugar_sum <- sum(cbc.df$sugarpercent)

40.731

Column sum of pricepercent

price_sum <- sum(cbc.df$pricepercent)

40.18
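Both column sums carry the same information as the `Mean` row of `summary()`, because a column sum equals the number of rows times the column mean. With n = 86 rows and the rounded means printed by the summary:

```latex
\sum_{i=1}^{n} x_i = n\,\bar{x}
\qquad\Rightarrow\qquad
86 \times 0.4736 = 40.7296 \approx 40.73, \qquad
86 \times 0.4672 = 40.1792 \approx 40.18
```

The small gap to the printed sum of 40.731 for sugarpercent comes from the means being rounded to four decimals in `summary()`.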
