
High-dimensional data | Data visualization in R with t-SNE

Visualizing high-dimensional data with the t-SNE algorithm



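    t-SNE (t-distributed stochastic neighbor embedding) is a nonlinear dimensionality-reduction algorithm, introduced by van der Maaten and Hinton in 2008, and also a machine learning technique. Like PCA, it is well suited to reducing high-dimensional data to two or three dimensions. The difference is that PCA is a linear method and cannot capture complex nonlinear (for example polynomial) relationships between variables, whereas t-SNE uncovers structure in the data by building an embedding from t-distributed stochastic neighbor probabilities.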
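For contrast with the nonlinear method used in the rest of this post, the linear PCA baseline can be sketched in base R. This is only an illustrative sketch: the variable names (X, pca, pc_scores, var_explained) are assumptions of this example, and prcomp() is used here as one common way to run PCA, not as part of the t-SNE workflow.

```r
# Linear baseline: PCA on the four numeric iris columns (base R only).
X <- as.matrix(unique(iris)[, 1:4])   # same de-duplicated input used for t-SNE

pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Scores on the first two principal components (one row per flower).
pc_scores <- pca$x[, 1:2]

# Share of total variance carried by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))
```

Each principal component is a fixed linear combination of the original columns, which is exactly the restriction that t-SNE drops.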


01

Raw data

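    The raw data is the iris data frame: sepal length and width and petal length and width, measured on 50 flowers from each of three iris species (Iris setosa, Iris versicolor and Iris virginica). It contains 150 rows and 5 variables; a screenshot of part of the data follows: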
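The same facts can be checked directly in the console. The snippet below is a small sketch using only base R; the names species_counts and n_dup are introduced here for illustration.

```r
# Inspect the built-in iris data frame.
dim(iris)                       # 150 rows, 5 columns
head(iris)                      # first six rows

# Number of flowers per species.
species_counts <- table(iris$Species)
print(species_counts)

# Count fully duplicated rows; Rtsne() rejects duplicates by default,
# which is why they are removed before the embedding is computed.
n_dup <- sum(duplicated(iris))
print(n_dup)
```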


02

Dimensionality reduction

> library(Rtsne)
> iris_unique <- unique(iris)               # remove duplicate rows
> set.seed(42)
> iris1 <- as.matrix(iris_unique[, 1:4])    # columns 1 to 4 as a numeric matrix
> # Rtsne() wraps the C++ implementation of Barnes-Hut t-SNE and does all of the
> # dimensionality reduction; setting theta = 0.0 computes exact t-SNE instead.
> tsne_out <- Rtsne(iris1)
> tsne_out
$N
[1] 149

$Y
             [,1]      [,2]
  [1,] -15.794362 -6.776711
  [2,] -18.120432 -6.231470
  [3,] -18.261085 -7.311696
  [4,] -18.520943 -7.087130
  [5,] -15.778549 -7.221474
  [6,] -14.008673 -7.269801
  [rows 7 to 149 omitted]

$costs
  [149 per-point cost values omitted]

$itercosts
 [1] 43.7514985 44.7873147 44.8116650 44.3887944 45.7282669  0.3704256  0.1252816  0.1237133
 [9]  0.1217102  0.1200852  0.1187576  0.1161445  0.1173155  0.1144428  0.1127897  0.1122483
[17]  0.1129056  0.1116092  0.1111795  0.1105687

$origD
[1] 4

$perplexity
[1] 30

$theta
[1] 0.5

$max_iter
[1] 1000

$stop_lying_iter
[1] 250

$mom_switch_iter
[1] 250

$momentum
[1] 0.5

$final_momentum
[1] 0.8

$eta
[1] 200

$exaggeration_factor
[1] 12

> data <- data.frame(tsne_out$Y, iris_unique$Species)
> colnames(data) <- c("Y1", "Y2", "Species")
> data
            Y1        Y2    Species
1   -15.794362 -6.776711     setosa
2   -18.120432 -6.231470     setosa
3   -18.261085 -7.311696     setosa
4   -18.520943 -7.087130     setosa
5   -15.778549 -7.221474     setosa
6   -14.008673 -7.269801     setosa
[rows 7 to 149 omitted; rows 51 to 100 are versicolor, rows 101 to 149 are virginica]
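The code comment above mentions that theta = 0.0 switches from the Barnes-Hut approximation to exact t-SNE. A minimal sketch of that variant follows, assuming the Rtsne package is installed; the name fit_exact and the explicitly spelled-out arguments are choices of this example (the values shown match the defaults reported in the output above, apart from theta).

```r
library(Rtsne)

# Same de-duplicated input as in the main example.
X <- as.matrix(unique(iris)[, 1:4])

set.seed(42)
# theta = 0.0 disables the Barnes-Hut approximation (exact t-SNE).
# This is slower, but unproblematic for a data set of 149 rows.
fit_exact <- Rtsne(X, dims = 2, perplexity = 30, theta = 0.0, max_iter = 1000)

dim(fit_exact$Y)               # one 2-D coordinate pair per input row
tail(fit_exact$itercosts, 1)   # cost reported at the last checkpoint
```

For larger data sets the default theta = 0.5 is usually the practical choice, since the exact computation scales quadratically with the number of rows.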



03

Plotting with ggplot2

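> library(ggplot2)
> # Filled circles (shape 21) with a black outline, one fill colour per species
> ggplot(data, aes(Y1, Y2, fill = Species)) +
+   geom_point(size = 5.5, colour = "black", alpha = 0.6, shape = 21) +
+   scale_fill_manual(values = c("#00AFBB", "#E7B800", "blue"))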

Summary


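Rtsne(): given a distance matrix D between the input objects (by default, the Euclidean distance between each pair of objects, computed from the data matrix; a precomputed D can be supplied with is_distance = TRUE), the function calculates the similarity scores p_ij in the original space. Pass the data as a numeric matrix, as done here with as.matrix().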
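To make the similarity scores p_ij more concrete, here is a simplified base-R sketch of how they are built from Euclidean distances. It is not what Rtsne() runs internally: real t-SNE tunes a separate Gaussian bandwidth for every point so that each row matches the chosen perplexity, whereas this sketch assumes one fixed bandwidth (sigma = 1) for all points. All variable names are illustrative.

```r
# Simplified p_ij computation (assumption: a single fixed bandwidth sigma).
X  <- as.matrix(unique(iris)[, 1:4])
D2 <- as.matrix(dist(X))^2            # squared Euclidean distances

sigma <- 1                            # assumption: not tuned by perplexity
W <- exp(-D2 / (2 * sigma^2))         # Gaussian affinities
diag(W) <- 0                          # a point is not its own neighbour

# Conditional similarities p_{j|i}: each row is normalised to sum to 1.
P_cond <- W / rowSums(W)

# Symmetrised joint similarities p_ij, which sum to 1 over all pairs.
n <- nrow(X)
P <- (P_cond + t(P_cond)) / (2 * n)

sum(P)
```

t-SNE then looks for low-dimensional coordinates whose own pairwise similarities (computed with a Student t kernel) are as close as possible to P.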

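Limitations of t-SNE: if the raw data is intrinsically very high-dimensional, it cannot be mapped faithfully into two or three dimensions. In addition, distances in a t-SNE plot have no direct meaning, because the embedding is fitted by matching neighbor probability distributions rather than by preserving distances.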
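One way to see that distances in the plot should not be read literally is to run the embedding twice with different random seeds and compare the distance between the same two flowers. This is only a sketch, assuming the Rtsne package is installed; run_tsne is a hypothetical helper defined here, and the seeds and the pair of rows (1 and 51) are arbitrary choices.

```r
library(Rtsne)

X <- as.matrix(unique(iris)[, 1:4])

# Hypothetical helper: one t-SNE run with a given seed, returning coordinates.
run_tsne <- function(seed) {
  set.seed(seed)
  Rtsne(X, perplexity = 30)$Y
}

Y_a <- run_tsne(1)
Y_b <- run_tsne(2)

# Distance between the same two flowers (rows 1 and 51) in each embedding.
d_a <- as.matrix(dist(Y_a))
d_b <- as.matrix(dist(Y_b))
c(run_a = d_a[1, 51], run_b = d_b[1, 51])
```

The input data is identical in both runs, so any difference between the two numbers comes from the embedding itself, not from the flowers.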
