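How to get the first or last value of each group in R
SQL's FIRST_VALUE and LAST_VALUE functions return the first or last value within each group. In R, the same result can be obtained with aggregate(), or with group_by() followed by summarise(). Both approaches below use first() and last() from the dplyr package, so dplyr must be loaded. Here is how it works:
0. Demo data: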
library(dplyr)

# Demo data: one row per sale, with the seller's name, state, and amount
df_test <- data.frame(
  Name  = c('James','Paul','Richards','Marico','Samantha','Ravi','Raghu','Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas','Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)
df_test
# Output of df_test:
        Name          State Sales
1      James         Alaska    14
2       Paul     California    24
3   Richards          Texas    31
4     Marico North Carolina    12
5   Samantha     California    13
6       Ravi          Texas     7
7      Raghu         Alaska     9
8   Richards          Texas    31
9     George North Carolina    18
10       Ema         Alaska    16
11  Samantha     California    18
12 Catherine          Texas    14
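1. First/last value when grouping by a single column
a. First value per group
# aggregate() approach (first here is dplyr::first, so dplyr must be loaded)
aggregate(
  df_test$Sales,
  by = list(df_test$State),
  FUN = first
)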
         Group.1  x
1         Alaska 14
2     California 24
3 North Carolina 12
4          Texas 31
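The first passed to FUN above is dplyr's first(), so that aggregate() call fails if dplyr is not loaded. As a minimal sketch, assuming you prefer to stay in base R, the formula interface of aggregate() combined with head(x, 1) or tail(x, 1) should return the same per-state values; the result column is then named Sales rather than x. The names first_by_state and last_by_state are only illustrative.

```r
# Base R only: this variant does not need dplyr.
df_test <- data.frame(
  Name  = c('James','Paul','Richards','Marico','Samantha','Ravi',
            'Raghu','Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# head(x, 1) keeps the first element of each state's Sales vector,
# tail(x, 1) keeps the last one.
first_by_state <- aggregate(Sales ~ State, data = df_test,
                            FUN = function(x) head(x, 1))
last_by_state  <- aggregate(Sales ~ State, data = df_test,
                            FUN = function(x) tail(x, 1))

first_by_state
last_by_state
```

Both results have one row per state, sorted by state name, like the aggregate() output above.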
# group_by() approach
df_test %>%
  group_by(State) %>%
  summarise(First_value_sales = first(Sales))
  State          First_value_sales
  <chr>                      <dbl>
1 Alaska                        14
2 California                    24
3 North Carolina                12
4 Texas                         31
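In SQL, FIRST_VALUE and LAST_VALUE are window functions: they keep every input row and attach the group's value to each one, whereas summarise() collapses each group to a single row. A rough sketch of the row-preserving version, assuming that is the behaviour you want, replaces summarise() with mutate(). The name df_window is hypothetical.

```r
library(dplyr)

df_test <- data.frame(
  Name  = c('James','Paul','Richards','Marico','Samantha','Ravi',
            'Raghu','Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# mutate() on a grouped data frame evaluates first()/last() once per group
# and recycles the value onto every row of that group.
df_window <- df_test %>%
  group_by(State) %>%
  mutate(
    First_value_sales = first(Sales),
    Last_value_sales  = last(Sales)
  ) %>%
  ungroup()

df_window
```

The result still has all 12 rows, with two extra columns holding each state's first and last Sales value.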
b. Last value per group
# aggregate() approach (last here is dplyr::last)
aggregate(
  df_test$Sales,
  by = list(df_test$State),
  FUN = last
)
         Group.1  x
1         Alaska 16
2     California 18
3 North Carolina 18
4          Texas 14
# group_by() approach
df_test %>%
  group_by(State) %>%
  summarise(Last_value_sales = last(Sales))
  State          Last_value_sales
  <chr>                     <dbl>
1 Alaska                       16
2 California                   18
3 North Carolina               18
4 Texas                        14
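summarise() with first() or last() returns only the summarised column, so the result does not say which Name the value belongs to. If the whole row is needed, one option is slice_head() / slice_tail(). This is a sketch that assumes dplyr 1.0.0 or later, where these functions are available; first_rows and last_rows are illustrative names.

```r
library(dplyr)

df_test <- data.frame(
  Name  = c('James','Paul','Richards','Marico','Samantha','Ravi',
            'Raghu','Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# Keep the complete first row of every state.
first_rows <- df_test %>%
  group_by(State) %>%
  slice_head(n = 1) %>%
  ungroup()

# Keep the complete last row of every state.
last_rows <- df_test %>%
  group_by(State) %>%
  slice_tail(n = 1) %>%
  ungroup()

first_rows
last_rows
```

Each result keeps all three columns (Name, State, Sales) with one row per state.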
2. First/last value when grouping by multiple columns
a. First value per group
# aggregate() approach
aggregate(
  df_test$Sales,
  by = list(df_test$State, df_test$Name),
  FUN = first
)
          Group.1   Group.2  x
1           Texas Catherine 14
2          Alaska       Ema 16
3  North Carolina    George 18
4          Alaska     James 14
5  North Carolina    Marico 12
6      California      Paul 24
7          Alaska     Raghu  9
8           Texas      Ravi  7
9           Texas  Richards 31
10     California  Samantha 13
# group_by() approach
df_test %>%
  group_by(State, Name) %>%
  summarise(First_value_sales = first(Sales))
   State          Name      First_value_sales
   <chr>          <chr>                 <dbl>
 1 Alaska         Ema                      16
 2 Alaska         James                    14
 3 Alaska         Raghu                     9
 4 California     Paul                     24
 5 California     Samantha                 13
 6 North Carolina George                   18
 7 North Carolina Marico                   12
 8 Texas          Catherine                14
 9 Texas          Ravi                      7
10 Texas          Richards                 31
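When summarise() follows a group_by() on several columns, recent dplyr versions keep the result grouped by all but the last grouping column (here State) and print a message about it. A minimal sketch, assuming dplyr 1.0.0 or later, passes .groups = "drop" to return an ungrouped result; first_by_state_name is an illustrative name.

```r
library(dplyr)

df_test <- data.frame(
  Name  = c('James','Paul','Richards','Marico','Samantha','Ravi',
            'Raghu','Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# .groups = "drop" removes all grouping from the summarised result,
# so later verbs operate on the whole table rather than per State.
first_by_state_name <- df_test %>%
  group_by(State, Name) %>%
  summarise(First_value_sales = first(Sales), .groups = "drop")

first_by_state_name
```

The values are the same as in the output above; only the grouping metadata of the result differs.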
b. Last value per group
# aggregate() approach
aggregate(
  df_test$Sales,
  by = list(df_test$State, df_test$Name),
  FUN = last
)
          Group.1   Group.2  x
1           Texas Catherine 14
2          Alaska       Ema 16
3  North Carolina    George 18
4          Alaska     James 14
5  North Carolina    Marico 12
6      California      Paul 24
7          Alaska     Raghu  9
8           Texas      Ravi  7
9           Texas  Richards 31
10     California  Samantha 18
# group_by() approach
df_test %>%
  group_by(State, Name) %>%
  summarise(Last_value_sales = last(Sales))
   State          Name      Last_value_sales
   <chr>          <chr>                <dbl>
 1 Alaska         Ema                     16
 2 Alaska         James                   14
 3 Alaska         Raghu                    9
 4 California     Paul                    24
 5 California     Samantha                18
 6 North Carolina George                  18
 7 North Carolina Marico                  12
 8 Texas          Catherine               14
 9 Texas          Ravi                     7
10 Texas          Richards                31
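All the examples above take "first" and "last" from the current row order of df_test, while SQL's FIRST_VALUE/LAST_VALUE are normally used with an ORDER BY clause. As a hedged sketch of the ordered case, dplyr's first() and last() accept an order_by argument; here it is used to pick, per state, the name with the lowest and highest Sales. The output column names and the variable by_sales are illustrative.

```r
library(dplyr)

df_test <- data.frame(
  Name  = c('James','Paul','Richards','Marico','Samantha','Ravi',
            'Raghu','Richards','George','Ema','Samantha','Catherine'),
  State = c('Alaska','California','Texas','North Carolina','California','Texas',
            'Alaska','Texas','North Carolina','Alaska','California','Texas'),
  Sales = c(14,24,31,12,13,7,9,31,18,16,18,14)
)

# order_by = Sales sorts each group's Name values by Sales before
# taking the first (lowest Sales) or last (highest Sales) element.
by_sales <- df_test %>%
  group_by(State) %>%
  summarise(
    Lowest_sales_name  = first(Name, order_by = Sales),
    Highest_sales_name = last(Name, order_by = Sales)
  )

by_sales
```

For Texas, for example, this should give Ravi (Sales 7) as the lowest and Richards (Sales 31) as the highest. An alternative with the same intent is to call arrange() before group_by().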