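High-Dimensional Data | R Plotting Basics: Principal Component Analysis
Visualizing High-Dimensional Data with Principal Component Analysis
Visually, humans can generally perceive only two- and three-dimensional space. Visualizing high-dimensional data therefore means projecting it onto a two- or three-dimensional space, discarding redundant attributes while keeping as much of the original structure and features as possible. Put plainly, high-dimensional visualization relies on dimensionality reduction, and the methods fall into two broad families: linear and nonlinear. For the nonlinear method of non-metric multidimensional scaling (NMDS), see an earlier post; for the linear PCA method, also see an earlier post. This post records the data-processing problems that a follower of 迷弥 ran into while drawing linear PCA plots.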
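To make "projecting onto two dimensions" concrete, here is a minimal base-R sketch. It uses simulated data rather than the Enzyme table (the matrix `X`, its column names, and the helper variable `latent` are invented for illustration), and it only shows that the PCA scores are the standardized data multiplied by the rotation matrix.

```r
# Simulated data: 30 observations of 5 correlated variables (illustrative only).
set.seed(1)
latent <- rnorm(30)
X <- sapply(1:5, function(j) latent * j + rnorm(30))
colnames(X) <- paste0("V", 1:5)

# PCA on standardized variables.
pca <- prcomp(X, scale. = TRUE)

# Projection: standardized data times the first two loading vectors.
Z <- scale(X)                           # center and scale each column
scores_2d <- Z %*% pca$rotation[, 1:2]  # 30 x 2 matrix of coordinates

# These coordinates match the scores that prcomp() stores in pca$x.
all.equal(unname(scores_2d), unname(pca$x[, 1:2]))
```

The two columns of `scores_2d` are the coordinates that a PCA scatter plot puts on its horizontal and vertical axes.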
01
Raw data
02
Plotting with ggbiplot
library(readr)
Enzyme <- read_csv("~/Desktop/主成分分析/Enzyme.csv")  # import the data
View(Enzyme)
df1 <- Enzyme[, 2:10]  # preprocessing: take columns 2 to 10 of the Enzyme data frame as a new table, df1
df1
# A tibble: 36 x 9
    Time   MDA   LOX   APX   PAL   CAT   POD   PPO   TPC
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1    40    41    19    58    80  186.    40    20   965
 2    40    42    20    55   155  172.    34    37   955
 3    40    43    17    51    92  166.    30    36   937
 4    40    30    23    57   208  163.    85    23  1261
 5    40    34    24    58   180  146.    89    26  1249
 6    40    38    24    52    99  167.    81    21  1269
 7    40    40    27    55   150  70.7    81    16  1322
 8    40    30    30    54   104  56.7    81    16  1312
 9    40    43    29    55   167  73.7    82    22  1357
10    40    46    27    58    96  66.7    64    22  1080
# … with 26 more rows
Enzyme1.pca <- prcomp(df1, scale. = TRUE)  # run PCA on the data matrix; the result is returned as an object of class prcomp
Enzyme1.pca
Standard deviations (1, .., p=9):
[1] 2.0909293 1.1638511 1.0689045 0.9458037 0.7227202 0.6075112 0.5181985 0.2353166 0.1451410
Rotation (n x k) = (9 x 9):
             PC1         PC2         PC3          PC4         PC5         PC6         PC7
Time  0.46315503 -0.04916147  0.03637764  0.162115952  0.09831722 -0.05605495  0.15384446
MDA   0.43051775 -0.04773755 -0.14795093 -0.069491134 -0.36795168 -0.34716366 -0.30304858
LOX  -0.09369929  0.32940027 -0.72371726 -0.441381496 -0.08318472 -0.00732021  0.37465711
APX  -0.08716532 -0.75858481 -0.10162202 -0.059575764 -0.45541633  0.34852207  0.26596939
PAL   0.18847050 -0.30570836  0.24096297 -0.784312315  0.43453792 -0.08163434 -0.03770386
CAT  -0.37539362 -0.03778714  0.33391251 -0.026114784 -0.18066354 -0.69437542  0.46568301
POD   0.41870137  0.00242730  0.02591931  0.254736824  0.31195985  0.10630817  0.65436944
PPO   0.44279023 -0.08209132 -0.12802074 -0.008043973 -0.27235413 -0.34781106 -0.01258019
TPC  -0.18960223 -0.45795031 -0.50603522  0.299564950  0.49765782 -0.36567563 -0.16354171
              PC8         PC9
Time -0.423321246  0.73470281
MDA  -0.488029855 -0.44691662
LOX  -0.086060923  0.09687049
APX  -0.054174793  0.02299334
PAL   0.013325895 -0.02694037
CAT  -0.113141408  0.03184374
POD   0.038176838 -0.46993978
PPO   0.746837400  0.16634344
TPC  -0.009080942 -0.01764606
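The standard deviations printed by prcomp() say how much variance each component carries, which is what decides whether a two-dimensional plot is a fair summary. The sketch below recomputes the shares by hand from the values printed above; calling `summary(Enzyme1.pca)` on the real object should report the same proportions. The variable names (`sdev`, `eigenvalues`, `prop_var`, `cum_var`) are chosen here for illustration.

```r
# Standard deviations copied from the prcomp() printout above.
sdev <- c(2.0909293, 1.1638511, 1.0689045, 0.9458037, 0.7227202,
          0.6075112, 0.5181985, 0.2353166, 0.1451410)

eigenvalues <- sdev^2                       # variance carried by each component
prop_var <- eigenvalues / sum(eigenvalues)  # share of the total variance
cum_var <- cumsum(prop_var)                 # running total

# With scale. = TRUE the eigenvalues sum to the number of variables (9 here).
sum(eigenvalues)

# One row per component, rounded for reading.
round(data.frame(PC = 1:9, eigenvalue = eigenvalues,
                 proportion = prop_var, cumulative = cum_var), 3)
```

For these values PC1 carries roughly 49% of the variance and PC1 plus PC2 roughly 64%, so the two-dimensional plots in the following sections show about two thirds of the variation in the nine variables.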
library(ggbiplot)
ggbiplot(Enzyme1.pca, obs.scale = 1, var.scale = 1, groups = Enzyme$Treament, ellipse = TRUE, circle = TRUE) +
  scale_color_discrete(name = "") +
  theme(legend.direction = 'horizontal', legend.position = 'top')
03
Plotting with fviz_pca_ind (1)
df <- Enzyme[c(2, 3, 4, 5, 6, 7, 8, 9, 10)]  # columns 2 to 10, the same selection as Enzyme[, 2:10]
View(df)
> library(FactoMineR)  # provides PCA()
> Enzyme.pca <- PCA(df, graph = FALSE)
> Enzyme.pca
**Results for the Principal Component Analysis (PCA)**
The analysis was performed on 36 individuals, described by 9 variables
*The results are available in the following objects:

   name               description
1  "$eig"             "eigenvalues"
2  "$var"             "results for the variables"
3  "$var$coord"       "coord. for the variables"
4  "$var$cor"         "correlations variables - dimensions"
5  "$var$cos2"        "cos2 for the variables"
6  "$var$contrib"     "contributions of the variables"
7  "$ind"             "results for the individuals"
8  "$ind$coord"       "coord. for the individuals"
9  "$ind$cos2"        "cos2 for the individuals"
10 "$ind$contrib"     "contributions of the individuals"
11 "$call"            "summary statistics"
12 "$call$centre"     "mean of the variables"
13 "$call$ecart.type" "standard error of the variables"
14 "$call$row.w"      "weights for the individuals"
15 "$call$col.w"      "weights for the variables"
library(factoextra)
library(FactoMineR)
fviz_pca_ind(Enzyme.pca, geom.ind = "point", pointsize = 3, pointshape = 21,
             fill.ind = Enzyme$Treament,
             palette = c("#00AFBB", "#E7B800", "#FC4E07", "blue"),
             addEllipses = TRUE, legend.title = "Groups", title = "") +
  theme_grey()
04
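Plotting with fviz_pca_ind (2)
# Enzyme2.pca is the PCA of columns 3 to 10; it is created in the full session at the end of this section.
fviz_pca_ind(Enzyme2.pca, geom.ind = "point", pointsize = 4, pointshape = 21,
             fill.ind = Enzyme$Time,
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             addEllipses = TRUE, legend.title = "Groups", title = "") +
  theme_grey()
Error: Continuous value supplied to discrete scale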
> str(Enzyme$Time)
 num [1:36] 40 40 40 40 40 40 40 40 40 40 ...
# Check the type of the column: it is num (numeric), whereas fill.ind = Enzyme$Time needs a discrete grouping vector (a factor or a character vector).
> as.character(Enzyme$Time)
 [1] "40"  "40"  "40"  "40"  "40"  "40"  "40"  "40"  "40"  "40"  "40"  "40"  "80"  "80"  "80"
[16] "80"  "80"  "80"  "80"  "80"  "80"  "80"  "80"  "80"  "120" "120" "120" "120" "120" "120"
[31] "120" "120" "120" "120" "120" "120"
# Preview the Time column converted to character strings.
> Enzyme$Time <- as.character(Enzyme$Time)
# Overwrite the column with its character version; the variable name is still Time.
> str(Enzyme$Time)
 chr [1:36] "40" "40" "40" "40" "40" "40" "40" "40" "40" "40" "40" "40" "80" "80" "80" "80" ...
# Check again: the type of Time is now chr (character), so the conversion worked.
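Converting with as.character() removes the error, but character groups are ordered as text, so the legend would most likely list 120 before 40 and 80. A possible alternative, sketched below in base R, is a factor with explicit levels. The vector `time_raw` is a stand-in for Enzyme$Time built from the values shown above, and `time_fac` is a name chosen for illustration.

```r
# Stand-in for Enzyme$Time: 12 observations at each of 40, 80 and 120.
time_raw <- rep(c(40, 80, 120), each = 12)

# Character conversion: groups are ordered as text, so "120" comes first.
levels(factor(as.character(time_raw)))

# Factor with explicit levels: groups keep their numeric order.
time_fac <- factor(time_raw, levels = c(40, 80, 120))
levels(time_fac)

# The raw column is continuous; the factor is discrete.
is.numeric(time_raw)  # TRUE: this is what triggers the error
is.factor(time_fac)   # TRUE: safe to use as a grouping vector
```

Under that assumption, `Enzyme$Time <- factor(Enzyme$Time, levels = c(40, 80, 120))` would replace the as.character() step and keep the three groups in numeric order.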
> library(factoextra)
> library(FactoMineR)
> df2 <- Enzyme[, 3:10]
> df2
# A tibble: 36 x 8
     MDA   LOX   APX   PAL   CAT   POD   PPO   TPC
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1    41    19    58    80  186.    40    20   965
 2    42    20    55   155  172.    34    37   955
 3    43    17    51    92  166.    30    36   937
 4    30    23    57   208  163.    85    23  1261
 5    34    24    58   180  146.    89    26  1249
 6    38    24    52    99  167.    81    21  1269
 7    40    27    55   150  70.7    81    16  1322
 8    30    30    54   104  56.7    81    16  1312
 9    43    29    55   167  73.7    82    22  1357
10    46    27    58    96  66.7    64    22  1080
# … with 26 more rows
> Enzyme2.pca <- PCA(df2, graph = FALSE)
> Enzyme2.pca
**Results for the Principal Component Analysis (PCA)**
The analysis was performed on 36 individuals, described by 8 variables
*The results are available in the following objects:

   name               description
1  "$eig"             "eigenvalues"
2  "$var"             "results for the variables"
3  "$var$coord"       "coord. for the variables"
4  "$var$cor"         "correlations variables - dimensions"
5  "$var$cos2"        "cos2 for the variables"
6  "$var$contrib"     "contributions of the variables"
7  "$ind"             "results for the individuals"
8  "$ind$coord"       "coord. for the individuals"
9  "$ind$cos2"        "cos2 for the individuals"
10 "$ind$contrib"     "contributions of the individuals"
11 "$call"            "summary statistics"
12 "$call$centre"     "mean of the variables"
13 "$call$ecart.type" "standard error of the variables"
14 "$call$row.w"      "weights for the individuals"
15 "$call$col.w"      "weights for the variables"
> fviz_pca_ind(Enzyme2.pca, geom.ind = "point", pointsize = 4, pointshape = 21,
+              fill.ind = Enzyme$Time,
+              palette = c("#00AFBB", "#E7B800", "#FC4E07"),
+              addEllipses = TRUE, legend.title = "Groups", title = "") +
+   theme_grey()
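Summary
ggbiplot: a package built on ggplot2 that draws biplots of principal component analyses.
groups: an optional factor giving the group that each observation belongs to.
prcomp(): runs a principal component analysis on the given data matrix and returns the result as an object of class prcomp.
PCA(): the principal component analysis function in the FactoMineR package; it reduces the dimensionality of the data.
fviz_pca_ind(): a function in the factoextra package that displays the individuals (observations) of a PCA as a scatter plot.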
