MySQL分组统计你会吗?
分组统计在日常工作中经常用到,今天分享使用sum,case when进行分组统计。
表结构如下:
CREATE TABLE `staff` (`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'ID',`name` varchar(50) NOT NULL COMMENT '姓名',`dept_id` varchar(10) DEFAULT NULL COMMENT '部门id',`salary` double DEFAULT NULL COMMENT '工资',`sex` int(11) DEFAULT NULL COMMENT '性别,1是男生,0是女生',`created_time` datetime DEFAULT NULL COMMENT '创建时间',`updated_time` datetime DEFAULT NULL COMMENT '更新时间',PRIMARY KEY (`id`)) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8 COMMENT='职员表'
现有记录:
mysql> select * from staff ;+----+-------+---------+--------+------+---------------------+---------------------+| id | name | dept_id | salary | sex | created_time | updated_time |+----+-------+---------+--------+------+---------------------+---------------------+| 1 | Tom | 101 | 2500 | 1 | 2020-03-08 18:37:55 | 2020-03-08 18:37:55 || 2 | Tonny | 101 | 3500 | 1 | 2020-03-08 18:37:55 | 2020-03-08 18:37:55 || 3 | Bob | 101 | 3500 | 0 | 2020-03-08 18:37:55 | 2020-03-08 18:37:55 || 4 | Rob | 101 | 4500 | NULL | 2020-03-08 18:37:55 | 2020-03-08 18:37:55 || 5 | Sudey | 101 | 5500 | 0 | 2020-03-08 18:37:55 | 2020-03-08 18:37:55 || 6 | Sunny | 102 | 5500 | 1 | 2020-03-08 18:37:55 | 2020-03-08 18:37:55 || 7 | Sedey | 102 | 3500 | 0 | 2020-03-08 18:37:55 | 2020-03-08 18:37:55 || 8 | Bobby | 102 | 4500 | NULL | 2020-03-08 18:37:55 | 2020-03-08 18:37:55 || 9 | Weedy | 103 | 4500 | 1 | 2020-03-08 18:37:55 | 2020-03-08 18:37:55 || 10 | Jimme | 104 | 5500 | 0 | 2020-03-08 18:37:55 | 2020-03-08 18:37:55 |+----+-------+---------+--------+------+---------------------+---------------------+10 rows in set (0.00 sec)
1.查询每个部门男生、女生分别多少人?
使用sum能正确查出,如下语句1符合期望:
selectdept_id,count(*) as total_num,sum(casewhen sex = 1 then 1 else 0 end) as female_num,sum(casewhen sex = 0 then 1 else 0 end) as male_num,sum(casewhen (sex<>1 and sex<>0) or sex is null then 1 else 0 end) as unknown_numfrom staffgroup by dept_id;+---------+-----------+------------+----------+-------------+| dept_id | total_num | female_num | male_num | unknown_num |+---------+-----------+------------+----------+-------------+| 101 | 5 | 2 | 2 | 1 || 102 | 3 | 1 | 1 | 1 || 103 | 1 | 1 | 0 | 0 || 104 | 1 | 0 | 1 | 0 |+---------+-----------+------------+----------+-------------+4 rows in set (0.00 sec)
有些人会问使用count(case when......)可以吗?如下语句2不符合期望:
selectdept_id,count(*) as total_num,count(casewhen sex = 1 then 1 else 0 end) as female_num,count(casewhen sex = 0 then 1 else 0 end) as male_num,count(casewhen (sex<>1 and sex<>0) or sex is null then 1 else 0 end) as unknown_numfrom staffgroup by dept_id;+---------+-----------+------------+----------+-------------+| dept_id | total_num | female_num | male_num | unknown_num |+---------+-----------+------------+----------+-------------+| 101 | 5 | 5 | 5 | 5 || 102 | 3 | 3 | 3 | 3 || 103 | 1 | 1 | 1 | 1 || 104 | 1 | 1 | 1 | 1 |+---------+-----------+------------+----------+-------------+4 rows in set (0.00 sec)
可见如上语句执行不报错,但是结果不对,这就是sum和count的区别。sum(c1)意思是对c1字段的所有值求和,可能是小数、整数、负数等等。count(c1)的意思是返回该表的总记录数,不是字段值求和,肯定是0或正整数。
2.查询 101部门salary>3000,102部门salary>4000, 其他部门salary>5000的男生、女生分别多少人?
可见,部门不同,条件也不同,怎么写呢?where条件中使用case when ,如下语句3符合期望:
selectdept_id,count(*) as total_num,sum(casewhen sex = 1 then 1 else 0 end) as female_num,sum(casewhen sex = 0 then 1 else 0 end) as male_num,sum(casewhen (sex<>1 and sex<>0) or sex is null then 1 else 0 end) as unknown_numfrom staffwhere salary >(casewhen dept_id = '101' then 3000when dept_id = '102' then 4000else 5000end)group by dept_id;+---------+-----------+------------+----------+-------------+| dept_id | total_num | female_num | male_num | unknown_num |+---------+-----------+------------+----------+-------------+| 101 | 4 | 1 | 2 | 1 || 102 | 2 | 1 | 0 | 1 || 104 | 1 | 0 | 1 | 0 |+---------+-----------+------------+----------+-------------+3 rows in set (0.00 sec)
where条件中使用case when ,使用另一种语法,如下语句4符合期望:
selectdept_id,count(*) as total_num,sum(casewhen sex = 1 then 1 else 0 end) as female_num,sum(casewhen sex = 0 then 1 else 0 end) as male_num,sum(casewhen (sex<>1 and sex<>0) or sex is null then 1 else 0 end) as unknown_numfrom staffwhere -- 换一种case when的语法(casewhen dept_id = '101' then salary > 3000when dept_id = '102' then salary > 4000else salary > 5000end)group by dept_id;
使用if,where条件中使用case when ,如下语句5符合期望:
selectdept_id,count(*) as total_num,sum(if(sex=1,1,0)) as female_num,sum(if(sex=0,1,0)) as male_num,sum(if((sex<>1 and sex<>0) or sex is null,1,0)) as unknown_numfrom staffwhere ......
细心的朋友会发现,条件中为啥还要带有"or sex is null"呢?
MySQL在执行where c1<>v1时会把c1为NUL的记录过滤掉!
mysql> select id,name,sex from staff where sex <> 1 ;-- 没有sex为NULL的记录+----+-------+------+| id | name | sex |+----+-------+------+| 3 | Bob | 0 || 5 | Sudey | 0 || 7 | Sedey | 0 || 10 | Jimme | 0 |+----+-------+------+4 rows in set (0.00 sec)mysql> select id,name,sex from staff where sex <> 1 or sex is null ;-- 带上 or sex is null 后才正确+----+-------+------+| id | name | sex |+----+-------+------+| 3 | Bob | 0 || 4 | Rob | NULL || 5 | Sudey | 0 || 7 | Sedey | 0 || 8 | Bobby | NULL || 10 | Jimme | 0 |+----+-------+------+6 rows in set (0.00 sec)
如果需要使用多重case when 怎么办呢 ?如下:
selectsalary,casewhen salary >5000 then -- 这里的then别忘了写casewhen salary <7000 and salary >=6000 then 'T6'when salary <6000 and salary >=5000 then 'T5'endwhen salary >3000 thencasewhen salary <5000 and salary >=4000 then 'T4'when salary <4000 and salary >=3000 then 'T3'endwhen salary >1000 thencasewhen salary <3000 and salary >=2000 then 'T2'when salary <2000 and salary >=1000 then 'T1'endelse 'T0'end as salary_levelfrom staff ;+--------+--------------+| salary | salary_level |+--------+--------------+| 2500 | T2 || 3500 | T3 || 3500 | T3 || 4500 | T4 || 5500 | T5 || 5500 | T5 || 3500 | T3 || 4500 | T4 || 4500 | T4 || 5500 | T5 |+--------+--------------+10 rows in set (0.00 sec
历史文章:
「欢迎关注,一起学习,一起进步」
