
How to look up a variable's value by column name when I know the row number


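I have been running a cluster analysis on some census data. I made a plot that shows the row names and the cluster each row was assigned to. I would like to be able to find out, for example: for row 5, what is the value of the variable Region on that row? My plot and my code are attached.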

library(dplyr)
library(arsenal)
library(cluster)
library(factoextra)
library(rattle)

#Read in the "Annual Population Estimates" data set
popest<- read.csv("nst-est2019-alldata.csv",header=TRUE)
attach(popest)
names(popest)
head(popest,2)
summary(popest)


#DATA PRE-PROCESSING
#Change popest variables SUMLEV,REGION,DIVISION,STATE to categorical variables
popest$SUMLEV <- as.factor(popest$SUMLEV)
popest$REGION <- as.factor(popest$REGION)
popest$DIVISION <- as.factor(popest$DIVISION)
popest$STATE <- as.factor(popest$STATE)


#change X to NA
popest2 <- na_if(popest,"X")

#Make sure numbers are part of set of real numbers
sapply(popest2,is.finite)

#Create a subset of data from the popest2 file that only contains 
#the population change values and the first five columns
#first figure out which column numbers I need
grep("^NPOPCHG",colnames(popest2))
cols.num <- c(18:27)
#then assign a vector of the variables that I want in the subset. Then build subset
myvars <- c(1:5,18:27)
newpopest2 <- popest2[myvars]

#Make NA equal to 0 so that I can scale the data
newpopest2[is.na(newpopest2)] <- 0

#first 5 factor variables removed and rows removed for records that are not technically States
newpopest2data <- newpopest2[-c(1,2,3,4,5,57),]
df <- newpopest2data[-c(1:4)]

#Scale the data
df <- scale(df[-1])
#head (newpopest2data)

#Determine how many clusters there should be
#create two different plots that can help us decide:
# 1. Number of Clusters vs. the Total Within Sum of Squares
#Use the fviz_nbclust() function to create a plot of the number of clusters vs. 
#the total within sum of squares:
fviz_nbclust(df,kmeans,method = "wss")

#make this example reproducible
set.seed(1)

#perform k-means clustering with k = 3 clusters
km <- kmeans(df,centers = 3,nstart = 25)
#view results
km
#plot results of final k-means model
fviz_cluster(km,data = df)

Solution

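As @Elle suggested in the comments, you can simply use df[5,]$Region. There are also other base R ways to do it. All of them assume that df is a data frame that contains a column named Region: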

df[5, "Region"]
# N.B. which() is used here because I don't know the column number of "Region" in your df
df[5, which(colnames(df) == "Region")]

# if you know the column number of "Region" you could simply do:
# (assuming "Region" is the 2nd column)
df[5, 2]
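
One caveat when applying this to the code in the question: there, df is overwritten with the result of scale(), which is a numeric matrix that holds only the population-change columns, and the region column appears to be named REGION (upper case) and to live in newpopest2data. The sketch below uses a small made-up data frame (toy, kept, scaled, CHG_A and CHG_B are all hypothetical names and values, not the census file) to show the difference between looking a row up by position and by its printed row name, which is what the cluster plot displays.

```r
# Made-up stand-in for the census table (hypothetical values)
toy <- data.frame(
  NAME   = paste0("Area", 1:8),
  REGION = factor(c(1, 1, 2, 2, 3, 3, 4, 4)),
  CHG_A  = c(5, 3, 8, 1, 9, 2, 7, 4),
  CHG_B  = c(2, 6, 1, 7, 3, 8, 4, 9)
)

# Drop the first two rows, as the question drops its non-state rows.
# The remaining rows keep their original row names "3" to "8".
kept <- toy[-c(1, 2), ]

# scale() returns a numeric matrix; REGION is not in it,
# so scaled[5, ]$REGION would fail. Row names are carried over.
scaled <- scale(kept[c("CHG_A", "CHG_B")])

# Look up REGION in the data frame that still has it.
region_by_position <- kept[5, "REGION"]    # 5th remaining row (original row 7)
region_by_label    <- kept["5", "REGION"]  # row whose name is "5"

# Optionally keep the cluster next to the region for every row.
set.seed(1)
km <- kmeans(scaled, centers = 2)
result <- data.frame(kept[c("NAME", "REGION")], cluster = km$cluster)
```

Because the plot labels points with row names, indexing with a quoted label such as kept["5", "REGION"] is usually the lookup that matches what you see in the figure, while kept[5, "REGION"] counts rows from the top of the filtered table.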
