R Notes: Converting Multiple Numeric Variables to Factors
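This series covers some basic skills and practical tips for the R language. In R, you can use the lapply function to convert multiple numeric variables to factors. lapply belongs to the apply family of functions, which perform repeated iterations (loops) over the elements of a list or data frame. In R, categorical variables need to be stored as factors. Numeric variables that are categorical in nature must therefore be converted to factors so that R treats them as grouping variables.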
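To make the idea concrete, here is a minimal sketch on a small made-up data frame. The names `toy`, `group`, and `score` are hypothetical and do not come from the data used in this post. It shows that a numerically coded category is not a factor until it is converted, and that lapply returns a list with one element per selected column.

```r
# Hypothetical toy data: 'group' holds numeric codes that are really categories
toy <- data.frame(group = c(1, 2, 2, 3), score = c(10, 20, 30, 40))

before <- is.factor(toy$group)      # the column starts out numeric, not a factor

# lapply() applies factor() to each selected column and returns a list
converted <- lapply(toy["group"], factor)
toy["group"] <- converted

after <- is.factor(toy$group)       # the column is now a grouping variable
group_levels <- levels(toy$group)   # one level per distinct code
```

After the conversion, functions such as table() or summary() treat `group` as a set of categories rather than as a quantity.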
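Converting numeric variables to factors
Using column index numbers
In this case, we convert the first, second, third, and fifth columns to factors. mydata is a data frame.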
names <- c(1:3,5)
mydata[,names] <- lapply(mydata[,names] , factor)
str(mydata)
'data.frame': 51 obs. of 16 variables:
$ Index: Factor w/ 19 levels "A","C","D","F",..: 1 1 1 1 2 2 2 3 3 4 ...
$ State: Factor w/ 51 levels "Alabama","Alaska",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Y2002: Factor w/ 51 levels "1111437","1134317",..: 9 5 36 18 34 12 28 11 1 49 ...
$ Y2003: int 1317711 1960378 1968140 1994927 1675807 1878473 1232844 1268673 1993741 1468852 ...
$ Y2004: Factor w/ 51 levels "1118631","1119299",..: 1 40 18 2 45 44 6 34 17 19 ...
$ Y2005: int 1492583 1447852 1782199 1947979 1480280 1236697 1518933 1403759 1827949 1362787 ...
$ Y2006: int 1107408 1861639 1102568 1669191 1735069 1871471 1841266 1441351 1803852 1339608 ...
$ Y2007: int 1440134 1465841 1109382 1801213 1812546 1814218 1976976 1300836 1595981 1278550 ...
$ Y2008: int 1945229 1551826 1752886 1188104 1487315 1875146 1764457 1762096 1193245 1756185 ...
$ Y2009: int 1944173 1436541 1554330 1628980 1663809 1752387 1972730 1553585 1739748 1818438 ...
$ Y2010: int 1237582 1629616 1300521 1669295 1624509 1913275 1968730 1370984 1707823 1198403 ...
$ Y2011: int 1440756 1230866 1130709 1928238 1639670 1665877 1945524 1318669 1353449 1497051 ...
$ Y2012: int 1186741 1512804 1907284 1216675 1921845 1491604 1228529 1984027 1979708 1131928 ...
$ Y2013: int 1852841 1985302 1363279 1591896 1156536 1178355 1582249 1671279 1912654 1107448 ...
$ Y2014: int 1558906 1580394 1525866 1360959 1388461 1383978 1503156 1803169 1782169 1407784 ...
$ Y2015: int 1916661 1979143 1647724 1329341 1644607 1330736 1718072 1627508 1410183 1170389 ...
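The same pattern can be tried on a small made-up data frame. This is only a sketch: `toy` and its column names are hypothetical, and the index vector is called `idx` rather than `names` to avoid confusion with the base function names(). List-style indexing (`toy[idx]`) is used because it always returns a data frame, even when a single column is selected.

```r
# Hypothetical toy data frame standing in for mydata
toy <- data.frame(
  id    = c(1, 2, 3, 4),
  area  = c(1, 1, 2, 2),
  grade = c(3, 1, 2, 3),
  sales = c(10.5, 20.1, 30.7, 40.2),
  flag  = c(0, 1, 0, 1)
)

idx <- c(1:3, 5)                       # columns to convert, by position
toy[idx] <- lapply(toy[idx], factor)   # convert only the selected columns

col_classes <- sapply(toy, class)      # class of every column after conversion
```

Inspecting `col_classes` is a quick alternative to str() when you only want to confirm which columns were converted.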
Using column names
In this case, we convert two variables, "Credit" and "Balance", to factors.
names <- c('Credit', 'Balance')
mydata[,names] <- lapply(mydata[,names] , factor)
str(mydata)
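Selecting by name can be sketched on a made-up data frame that actually contains the two columns. The data frame `toy`, its values, and the extra `Income` column are assumptions for illustration only. The early check guards against a misspelled column name.

```r
# Hypothetical toy data with the two columns named in the text
toy <- data.frame(
  Credit  = c(1, 2, 1, 3),
  Balance = c(0, 1, 1, 0),
  Income  = c(52.3, 61.0, 48.9, 75.4)
)

cols <- c("Credit", "Balance")
stopifnot(all(cols %in% names(toy)))     # fail early if a name is missing

toy[cols] <- lapply(toy[cols], factor)   # convert only the named columns

credit_levels <- levels(toy$Credit)      # distinct credit codes as levels
```

Selecting by name is less fragile than selecting by position, because it keeps working if columns are reordered.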
Converting all variables
col_names <- names(mydata)
mydata[,col_names] <- lapply(mydata[,col_names] , factor)
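A shorter way to express the same idea, shown here as a sketch on a hypothetical data frame `toy`, is to assign into empty brackets. This replaces every column while keeping the object a data frame, so no vector of column names is needed.

```r
# Hypothetical toy data with one numeric and one character column
toy <- data.frame(a = c(1, 2, 2), b = c("x", "y", "x"), stringsAsFactors = FALSE)

# Empty brackets replace the contents of every column but keep the data frame
toy[] <- lapply(toy, factor)

all_factors <- all(sapply(toy, is.factor))   # TRUE when every column is a factor
```

Note that this converts character columns as well as numeric ones, which is what "all variables" implies.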
Converting all numeric variables
mydata[sapply(mydata, is.numeric)] <- lapply(mydata[sapply(mydata, is.numeric)], as.factor)
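The one-liner above evaluates the is.numeric test twice. A sketch of the same step on a made-up data frame, with the logical mask stored once in a hypothetical variable `num_cols`, may be easier to read. The data frame `toy` and its columns are assumptions for illustration.

```r
# Hypothetical toy data: one character column, one integer, one double
toy <- data.frame(
  state = c("Alabama", "Alaska", "Arizona"),
  y2002 = c(3L, 1L, 3L),
  rate  = c(0.5, 0.7, 0.5),
  stringsAsFactors = FALSE
)

num_cols <- sapply(toy, is.numeric)                 # logical mask, computed once
toy[num_cols] <- lapply(toy[num_cols], as.factor)   # convert numeric columns only
```

Both integer and double columns count as numeric here, while the character column `state` is left untouched.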
Check the number of unique values in each variable and convert only the variables with fewer than 4 unique values to factors
col_names <- sapply(mydata, function(col) length(unique(col)) < 4)
mydata[col_names] <- lapply(mydata[col_names], factor)
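If this rule is needed in more than one place, it can be wrapped in a small function. The sketch below is one possible way to do that. The function name `factorize_low_cardinality`, its `max_unique` argument, and the data frame `toy` are all hypothetical. It uses list-style indexing so that the code still behaves correctly when only one column qualifies.

```r
# Hypothetical helper: convert columns with fewer than `max_unique` distinct values
factorize_low_cardinality <- function(df, max_unique = 4) {
  few_unique <- vapply(
    df,
    function(col) length(unique(col)) < max_unique,
    logical(1)
  )
  df[few_unique] <- lapply(df[few_unique], factor)
  df
}

# Hypothetical toy data: two low-cardinality columns and one continuous column
toy <- data.frame(
  gender = c(1, 2, 1, 2, 1),
  region = c(1, 2, 3, 1, 2),
  income = c(41.2, 38.5, 52.9, 47.0, 60.3)
)

result <- factorize_low_cardinality(toy)
result_classes <- sapply(result, class)   # class of each column afterwards
```

The threshold of 4 follows the text above. A different cutoff can be passed through `max_unique` when a data set has categories with more levels.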