vlambda Blog

R Notes: Converting Multiple Numeric Variables to Factors




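This series introduces basic skills and practical tips for R. In R, you can use the lapply function to convert several numeric variables to factors in one step. lapply belongs to the apply family of functions, which apply a function repeatedly to each element of an object and so take the place of an explicit loop. R expects categorical variables to be stored as factors. Some numeric variables are categorical in nature (for example, numeric group codes), and they need to be converted to factors so that R treats them as grouping variables rather than as quantities.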
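To make the two ideas above concrete, here is a minimal sketch of what lapply does and why the factor type matters. The data frame `toy` and its columns are hypothetical and exist only for illustration.

```r
# Hypothetical toy data: 'group' is stored as numbers but is really a category.
toy <- data.frame(group = c(1, 2, 1, 3), score = c(10.5, 8.2, 9.9, 7.4))

# lapply() applies a function to each element of a list (for a data frame,
# each column) and returns a list of the results.
col_classes <- lapply(toy, class)
print(col_classes$group)

# Once 'group' is a factor, R treats its values as labels, not quantities.
toy$group <- factor(toy$group)
group_levels <- levels(toy$group)   # the distinct categories, as strings
group_counts <- table(toy$group)    # number of rows per category
print(group_counts)
```

After the conversion, functions such as table(), summary(), and model-fitting functions work with the levels of `group` instead of treating it as a continuous number.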






Converting numeric variables to factors

Using column index numbers


In this case, we convert the first, second, third, and fifth variables to factor variables. mydata is a data frame.


> names <- c(1:3,5)
> mydata[,names] <- lapply(mydata[,names] , factor)
> str(mydata)


'data.frame': 51 obs. of 16 variables:
 $ Index: Factor w/ 19 levels "A","C","D","F",..: 1 1 1 1 2 2 2 3 3 4 ...
 $ State: Factor w/ 51 levels "Alabama","Alaska",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Y2002: Factor w/ 51 levels "1111437","1134317",..: 9 5 36 18 34 12 28 11 1 49 ...
 $ Y2003: int 1317711 1960378 1968140 1994927 1675807 1878473 1232844 1268673 1993741 1468852 ...
 $ Y2004: Factor w/ 51 levels "1118631","1119299",..: 1 40 18 2 45 44 6 34 17 19 ...
 $ Y2005: int 1492583 1447852 1782199 1947979 1480280 1236697 1518933 1403759 1827949 1362787 ...
 $ Y2006: int 1107408 1861639 1102568 1669191 1735069 1871471 1841266 1441351 1803852 1339608 ...
 $ Y2007: int 1440134 1465841 1109382 1801213 1812546 1814218 1976976 1300836 1595981 1278550 ...
 $ Y2008: int 1945229 1551826 1752886 1188104 1487315 1875146 1764457 1762096 1193245 1756185 ...
 $ Y2009: int 1944173 1436541 1554330 1628980 1663809 1752387 1972730 1553585 1739748 1818438 ...
 $ Y2010: int 1237582 1629616 1300521 1669295 1624509 1913275 1968730 1370984 1707823 1198403 ...
 $ Y2011: int 1440756 1230866 1130709 1928238 1639670 1665877 1945524 1318669 1353449 1497051 ...
 $ Y2012: int 1186741 1512804 1907284 1216675 1921845 1491604 1228529 1984027 1979708 1131928 ...
 $ Y2013: int 1852841 1985302 1363279 1591896 1156536 1178355 1582249 1671279 1912654 1107448 ...
 $ Y2014: int 1558906 1580394 1525866 1360959 1388461 1383978 1503156 1803169 1782169 1407784 ...
 $ Y2015: int 1916661 1979143 1647724 1329341 1644607 1330736 1718072 1627508 1410183 1170389 ...
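The index-based conversion can be tried on a small self-contained example. The data frame `sales_df` below is hypothetical and only stands in for mydata. This sketch uses `sales_df[idx]` (list-style indexing) rather than `sales_df[, idx]`, on the assumption that you may sometimes select a single column; the list-style form always returns a data frame, so lapply still iterates over columns.

```r
# Hypothetical data frame standing in for mydata.
sales_df <- data.frame(
  id     = c(1L, 2L, 3L, 4L),
  region = c(1L, 1L, 2L, 2L),
  year   = c(2002L, 2002L, 2003L, 2003L),
  sales  = c(10.5, 20.1, 15.3, 30.8),
  tier   = c(3L, 1L, 2L, 1L)
)

idx <- c(1:3, 5)                                  # columns to convert, by position
sales_df[idx] <- lapply(sales_df[idx], factor)    # convert each selected column

# Check which columns are now factors.
col_is_factor <- sapply(sales_df, is.factor)
print(col_is_factor)
```

Only the fourth column (`sales`) remains numeric; the other four are factors.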



Using column names

In this case, we convert the two variables "Credit" and "Balance" to factor variables.



> names <- c('Credit' ,'Balance')
> mydata[,names] <- lapply(mydata[,names] , factor)
> str(mydata)


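A minimal sketch of the name-based conversion follows, using a hypothetical data frame `accounts` that actually contains Credit and Balance columns (the values are made up for illustration).

```r
# Hypothetical data frame with the two columns named in the text.
accounts <- data.frame(
  Credit  = c(1L, 2L, 2L, 3L),
  Balance = c(0L, 1L, 1L, 0L),
  Income  = c(52.3, 61.0, 48.7, 75.2)
)

vars <- c("Credit", "Balance")                    # columns to convert, by name
accounts[vars] <- lapply(accounts[vars], factor)  # works for one name or many

str(accounts)
```

Selecting by name is usually more robust than selecting by position, because the code keeps working if columns are reordered.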




Converting all variables



> col_names <- names(mydata)
> mydata[,col_names] <- lapply(mydata[,col_names] , factor)
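When every column is converted, the list of names is not strictly needed. The sketch below shows an equivalent shorter form, `all_df[] <- lapply(all_df, factor)`, where the empty brackets keep the object a data frame while its columns are replaced. The data frame `all_df` is hypothetical.

```r
# Hypothetical data frame; every column will be converted.
all_df <- data.frame(
  a = c(1, 2, 2),
  b = c(10L, 10L, 30L),
  c = c("x", "y", "x"),
  stringsAsFactors = FALSE
)

# Assigning into all_df[] replaces the columns but keeps the data frame
# structure (column names and row names) intact.
all_df[] <- lapply(all_df, factor)

all_factors <- all(sapply(all_df, is.factor))
print(all_factors)
```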



 

Converting all numeric variables



> mydata[sapply(mydata, is.numeric)] <- lapply(mydata[sapply(mydata, is.numeric)], as.factor)
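The one-liner above can be unpacked by computing the logical mask once and reusing it. The sketch below does this on a hypothetical data frame `mixed`. It also illustrates a common pitfall when going back from factor to number, which is an addition here and not something the original one-liner covers.

```r
# Hypothetical data frame with one character and two numeric columns.
mixed <- data.frame(
  name  = c("a", "b", "c"),
  count = c(10L, 20L, 10L),
  rate  = c(0.5, 0.25, 0.5),
  stringsAsFactors = FALSE
)

num_cols <- sapply(mixed, is.numeric)                 # TRUE for integer and double columns
mixed[num_cols] <- lapply(mixed[num_cols], as.factor) # convert only those columns

# Pitfall: as.numeric() on a factor returns the internal level codes.
codes  <- as.numeric(mixed$count)
# To recover the original numbers, convert to character first.
values <- as.numeric(as.character(mixed$count))
```

Here `name` stays a character column, while `count` and `rate` become factors.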




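Checking the number of unique values in each variable and converting only the variables with fewer than 4 unique values to factors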



> # col_names is a logical vector: TRUE for columns with fewer than 4 unique values
> col_names <- sapply(mydata, function(col) length(unique(col)) < 4)
> mydata[col_names] <- lapply(mydata[col_names], factor)
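The unique-count rule can be checked on a small example. The data frame `survey` below is hypothetical; the threshold of 4 comes from the text, and list-style indexing (`survey[few_levels]`) is used so that the code also works when only one column qualifies.

```r
# Hypothetical survey data: 'gender' and 'grade' are coded categories.
survey <- data.frame(
  id     = 1:6,
  gender = c(1, 2, 1, 2, 2, 1),
  grade  = c(1, 2, 3, 1, 2, 3),
  income = c(31.5, 42.0, 28.3, 55.1, 47.9, 39.4)
)

# TRUE for each column with fewer than 4 distinct values.
few_levels <- sapply(survey, function(col) length(unique(col)) < 4)

# Convert only those columns; the others are left untouched.
survey[few_levels] <- lapply(survey[few_levels], factor)

str(survey)
```

The threshold is a heuristic: a genuinely numeric column that happens to have few distinct values would also be converted, so it is worth inspecting `few_levels` before assigning.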