Finding the means of the three classes that data were assigned to in LDA

I am looking at example code that computes Fisher's LDA based on Q = W^{-1} B.
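For reference, here is a sketch of what that formula expresses. It assumes the usual reading of W as the within-class scatter matrix and B as the between-class scatter matrix, which the post does not spell out; the symbols \eta and \lambda are introduced here for illustration. Fisher's criterion looks for the direction that maximizes between-class spread relative to within-class spread:

```latex
\eta^{*} = \arg\max_{\eta} \frac{\eta^{\top} B \,\eta}{\eta^{\top} W \,\eta},
\qquad\text{which leads to}\qquad
W^{-1} B \,\eta = \lambda \,\eta .
```

Under that reading, the maximizing direction is the eigenvector of Q = W^{-1} B with the largest eigenvalue.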

The data is imported as follows:

aircraft = read_csv(file = "aircraft.csv") %>%
  mutate( Period = factor( Period ))

For Fisher's LDA, I have the following example. It computes Q with solve(W, B), takes the first eigenvector of Q, and then finds the assigned classes:

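# Within-class covariance matrices, one per period.
# Year and Period are dropped so that only the six features remain.
W1 = cov( dplyr::select( dplyr::filter( aircraft, Period == 1 ), -Year, -Period ) )
W2 = cov( dplyr::select( dplyr::filter( aircraft, Period == 2 ), -Year, -Period ) )
W3 = cov( dplyr::select( dplyr::filter( aircraft, Period == 3 ), -Year, -Period ) )

W = W1 + W2 + W3

# Class mean vectors over the same six features
mu1 = colMeans( dplyr::select( dplyr::filter( aircraft, Period == 1 ), -Year, -Period ) )
mu2 = colMeans( dplyr::select( dplyr::filter( aircraft, Period == 2 ), -Year, -Period ) )
mu3 = colMeans( dplyr::select( dplyr::filter( aircraft, Period == 3 ), -Year, -Period ) )
mu = rbind( mu1, mu2, mu3 )

# Between-class matrix
B = ( 3 - 1 ) * cov( mu )

Q = solve( W, B )

# The first eigenvector of Q is the projection direction
eta = eigen( Q )$vectors[,1]

XX = dplyr::select( aircraft, -Year, -Period )

# Project the observations and the class means onto eta
XXproj = as.matrix( XX ) %*% as.matrix( eta )
muP1 = t( as.matrix( mu1 ) ) %*% as.matrix( eta )
muP2 = t( as.matrix( mu2 ) ) %*% as.matrix( eta )
muP3 = t( as.matrix( mu3 ) ) %*% as.matrix( eta )

# Distance of each projected observation to each projected class mean
tXXproj = t( XXproj )
m1 = as.data.frame( tXXproj ) - muP1
m2 = as.data.frame( tXXproj ) - muP2
m3 = as.data.frame( tXXproj ) - muP3
mm = rbind( abs( m1 ), abs( m2 ), abs( m3 ) )

# Assign each observation to the class with the nearest projected mean
classes = sapply( mm, which.min )

classified = data.frame( assigned = classes, aircraft )

# Confusion table: assigned class versus true period
xtabs( ~ assigned + Period, data = classified )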

The command str(classified) produces the following output:

'data.frame':   709 obs. of  9 variables:
 $ assigned: int  1 1 1 1 1 1 1 1 1 1 ...
 $ Year    : int  14 14 14 15 15 15 15 16 16 16 ...
 $ Period  : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ Power   : num  82 82 224 164 119 ...
 $ Span    : num  12.8 11 17.9 14.5 12.9 ...
 $ Length  : num  7.6 9 10.3 9.8 7.9 ...
 $ Weight  : num  1070 830 2200 1946 1190 ...
 $ Speed   : int  105 145 135 138 140 177 113 230 175 106 ...
 $ Range   : int  400 402 500 500 400 350 402 700 525 300 ...

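I want to find the means of the three classes that the data were assigned to. It sounds like it should be simple, but I am inexperienced with R and not sure how to do it. I think the apply and select functions are relevant here, but I am not certain.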

I was able to run my own LDA using the relevant R function:

lda.0 = lda( Period ~ Power + Span + Length + Weight + Speed + Range,data = aircraft )
preds.0 = predict( lda.0 )$class
xtabs( ~ preds.0 + aircraft$Period )

I want to do the same thing for my implementation as for the example above (find the means of the three classes).

The command str(predict( lda.0 )) produces the following output:

List of 3
 $ class    : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ posterior: num [1:709,1:3] 0.712 0.659 0.67 0.665 0.69 ...
  ..- attr(*,"dimnames")=List of 2
  .. ..$ : chr [1:709] "1" "2" "3" "4" ...
  .. ..$ : chr [1:3] "1" "2" "3"
 $ x        : num [1:709,1:2] -1.469 -0.988 -1.504 -1.22 -1.385 ...
  ..- attr(*,"dimnames")=List of 2
  .. ..$ : chr [1:709] "1" "2" "3" "4" ...
  .. ..$ : chr [1:2] "LD1" "LD2"

So what is a good way to find, in both cases, the means of the three classes that the data were assigned to?

The full dataset is too large to include in this post, so I have included a smaller version of it:

structure(list(Year = c(14L,14L,15L,16L,17L,18L,19L,20L,21L,22L,23L,24L,25L,26L,27L,28L,28L),Period = c(1L,3L,1L,2L,2L),Power = c(82,82,223.6,164,119,74.5,279.5,67,112,149,238.5,205,194,336,558.9,287,388,186.3,89.4,126.7,536.6,402,298,342.8,536,521.6,335.3,357.7,313,782.6,670.6,223.5,391,436,171.4,350,634,864.4,760,503.5,63.3,812,317,231,432,918,745.2,424.8,372.6,782,626,544,373,391.2,864,894,179,391.2),Span = c(12.8,11,17.9,14.5,12.9,7.5,11.13,14.3,7.8,11.7,12.8,8.5,13.3,14.9,12,9.4,15.95,16.74,22.2,23.4,23.72,11.9,14.4,9.7,8,14.55,9.1,8.11,9.5,20.73,22.8,38.4,14,26.5,30.48,15.5,14.17,10.1,14.8,15.62,14.05,15.24,12.24,27.2,8.84,22.86,7.7,9.8,15.93,13.08,15.21,8.94,9.6,10.8,13.72,8.9,26.72,25,11.58,17.3,12.5,12.1,12.09,15.3,9.08,17.75,15.15,27.4,22,13.7,10.3,22.76,22.25,17.25,14.15,20.4,11.35),Length = c(7.6,9,10.35,7.9,6.3,8.28,6.7,8.3,8.7,7.4,6.2,10.25,10.77,10.9,12.6,11.86,9.2,6.5,6.95,9.83,7.3,6.38,13.27,13.5,20.85,14.33,19.16,8.1,9.68,11.89,10.97,11.28,11.42,18.2,7.01,18.08,6.8,7.1,11.5,9.27,9.78,6.17,6.4,7.32,10.74,6.9,18.97,15.1,7.06,7.17,10.55,8.38,8.81,9.42,5.99,10.27,10.22,19.8,14.63,11.2,6.56,14.88,13.81,7,7.2,9.91,15,8.94),Weight = c(1070,830,2200,1946,1190,653,930,1575,676,920,1353,1550,888,1275,1537,1292,611,1350,1700,3312,4920,1510,3625,900,1665,1640,1081,625,932,1378,886,902,1070,5670,3636,12925,2107,4770,6060,1192,1900,1050,2155,1379,2858,3380,2290,2347,3308,2630,1333,10000,1351,6250,885,1531,1438,3820,1905,2646,1151,1266,2383,860,7983,6200,1484,567,1867,4350,1935,1823,2253,1487,2220,1244,2700,2280,3652,8165,5500,3568,1414,5875,5460,4310,1500,1795,1628,2449,6900,2102),Speed = c(105L,145L,135L,138L,140L,177L,113L,230L,175L,106L,170L,157L,183L,201L,209L,120L,152L,176L,190L,205L,196L,165L,146L,222L,159L,166L,158L,185L,226L,161L,251L,171L,206L,235L,245L,214L,180L,220L,237L,254L,169L,153L,261L,200L,246L,174L,319L,290L,233L,250L,255L,298L,198L,195L,300L,270L,297L,225L,212L,197L,296L),Range = 
c(400L,402L,500L,400L,350L,700L,525L,560L,550L,450L,600L,800L,547L,1770L,2365L,925L,1205L,580L,684L,563L,644L,885L,440L,557L,750L,3600L,805L,330L,628L,1640L,604L,1046L,1585L,650L,917L,515L,1110L,772L,1127L,850L,523L,900L,668L,1706L,1385L,1000L,902L,579L,1125L,1300L,660L,756L)),row.names = c(NA,100L
),class = "data.frame")

Solution

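Since I do not have access to your aircraft.csv, and you did not mention which lda function you are using, I worked through it with MASS::lda on the iris dataset: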

library(tidyverse)
library(MASS)

lda.0 = lda( Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,data = iris )
preds.0 = predict( lda.0 )$class
xtabs( ~ preds.0 + iris$Species )
str(predict( lda.0 ))
lda.0$means

The last call gives you the class means.
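Note that `lda.0$means` holds the feature means grouped by the true class labels used for fitting, not by the predicted labels. A minimal sketch of how the two could be compared, using base R's `aggregate` on `iris` as a stand-in for the aircraft data (the names `true_means` and `assigned_means` are illustrative, not from the original post):

```r
library(MASS)

# Fit LDA on iris (stand-in for the aircraft data, which is not available here)
lda.0 <- lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
             data = iris)
preds.0 <- predict(lda.0)$class

# Feature means grouped by the true class; these match lda.0$means
true_means <- aggregate(. ~ Species, data = iris, FUN = mean)

# Feature means grouped by the assigned (predicted) class
assigned_means <- aggregate(iris[, 1:4],
                            by = list(assigned = preds.0),
                            FUN = mean)

print(true_means)
print(assigned_means)
```

The two tables differ only to the extent that some observations are misclassified.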

You can compute the means of the assigned classes by working with the "classified" data.frame from your example:

classified %>% group_by(assigned) %>% summarize(meanSL=mean(Sepal.Length),meanSW=mean(Sepal.Width),meanPL=mean(Petal.Length),meanPW=mean(Petal.Width))
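To make the snippet above reproducible end to end, here is a rough sketch that rebuilds a `classified` data frame on `iris` by following the question's projection approach, then averages the features per assigned class. It is an illustration under assumptions rather than the original author's code: `iris` replaces the unavailable aircraft data, base R is used throughout instead of dplyr, and `Re()` is added in case `eigen()` returns complex values with zero imaginary parts.

```r
# Stand-in data: iris replaces the aircraft data, which is not available here
X <- as.matrix(iris[, 1:4])
y <- iris$Species
groups <- levels(y)

# Sum of per-class covariance matrices, mirroring W = W1 + W2 + W3
W <- Reduce(`+`, lapply(groups, function(g) cov(X[y == g, , drop = FALSE])))

# Class mean vectors, one row per class
mu <- t(sapply(groups, function(g) colMeans(X[y == g, , drop = FALSE])))

# Between-class matrix, mirroring B = (3 - 1) * cov(mu)
B <- (length(groups) - 1) * cov(mu)

# Leading eigenvector of Q = W^{-1} B (Re() is an added safeguard, see above)
eta <- Re(eigen(solve(W, B))$vectors[, 1])

# Project the observations and the class means onto eta
proj <- as.vector(X %*% eta)
mu_proj <- as.vector(mu %*% eta)

# Assign each observation to the class whose projected mean is nearest
dist_to_means <- abs(outer(proj, mu_proj, "-"))
assigned <- apply(dist_to_means, 1, which.min)

classified <- data.frame(assigned = assigned, iris)

# Mean of each feature within each assigned class
assigned_means <- aggregate(classified[, names(iris)[1:4]],
                            by = list(assigned = classified$assigned),
                            FUN = mean)
print(assigned_means)
```

For the aircraft data, the same final step would presumably use the columns Power, Span, Length, Weight, Speed and Range in place of the four iris measurements.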
