上周面试中被问到一个看似基础却暗藏玄机的问题:"WHERE和HAVING有什么区别?"当时虽然答出了表面区别,但追问到索引失效场景时却卡壳了。回来做了系统性复盘,发现这实际是SQL执行机制与索引优化的核心考点。
WHERE子句在数据检索阶段生效,作用于原始表数据。当执行SELECT * FROM users WHERE age > 18时,存储引擎会先过滤掉不符合条件的行,再将结果交给后续处理。这意味着:
而HAVING在结果集处理阶段生效,作用于GROUP BY后的分组数据。例如SELECT department, AVG(salary) FROM employees GROUP BY department HAVING AVG(salary) > 10000中:
关键记忆点:WHERE是"过滤器",HAVING是"筛选器"——前者过滤原材料,后者筛选成品
错误示例1:在WHERE中使用聚合函数
sql复制-- 错误写法(执行报错)
SELECT department, AVG(salary)
FROM employees
WHERE AVG(salary) > 10000
GROUP BY department
-- 正确写法
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING AVG(salary) > 10000
错误示例2:对非分组字段使用HAVING
sql复制-- 低效写法(虽然语法正确)
SELECT product_id, SUM(quantity)
FROM order_items
GROUP BY product_id
HAVING product_id IN (1001, 1002)
-- 高效写法
SELECT product_id, SUM(quantity)
FROM order_items
WHERE product_id IN (1001, 1002)
GROUP BY product_id
面试官追问"WHERE和HAVING对索引的影响"时,我意识到自己缺乏系统性认知。实测发现以下场景会导致索引失效:
隐式类型转换:
sql复制-- user_id是varchar类型但用了数字查询(索引失效)
SELECT * FROM users WHERE user_id = 10086;
前导模糊查询:
sql复制-- 使用%开头的LIKE(无法走索引)
SELECT * FROM products WHERE name LIKE '%手机%';
函数操作字段:
sql复制-- 对索引字段使用函数(索引失效)
SELECT * FROM orders WHERE DATE_FORMAT(create_time,'%Y-%m')='2023-01';
OR条件不当:
sql复制-- 当OR两边字段不同且未全部索引时
SELECT * FROM logs WHERE id = 100 OR content LIKE '%error%';
不符合最左前缀:
sql复制-- 联合索引是(a,b,c),但查询跳过a
SELECT * FROM table WHERE b = 2 AND c = 3;
索引列运算:
sql复制-- 对索引字段进行数学运算
SELECT * FROM accounts WHERE balance + 100 > 500;
索引选择性 = 不重复索引值数量 / 表记录总数。高选择性字段更适合建索引:
通过EXPLAIN验证索引效果:
sql复制EXPLAIN SELECT * FROM users WHERE phone='13800138000';
-- 查看type列:const > ref > range > index > ALL
对于联合索引INDEX(a,b,c),有效查询组合包括:
失效场景:
实测案例:
sql复制-- 表结构
CREATE TABLE `order_details` (
`id` int NOT NULL,
`order_id` varchar(20) NOT NULL,
`product_id` int NOT NULL,
`quantity` int DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_order_product` (`order_id`,`product_id`)
) ENGINE=InnoDB;
-- 有效使用索引
SELECT * FROM order_details
WHERE order_id = 'ORD1001' AND product_id = 200;
-- 部分使用索引(仅用order_id)
SELECT * FROM order_details
WHERE order_id = 'ORD1001' ORDER BY quantity;
-- 索引完全失效
SELECT * FROM order_details WHERE product_id = 200;
MySQL 8.0+支持Index Skip Scan优化,特定场景下可以突破最左前缀限制:
sql复制-- 即使没有a条件也可能使用索引
SELECT * FROM table WHERE b = 2;
但需要满足:
sql复制EXPLAIN SELECT d.* FROM orders o
JOIN order_details d ON o.id = d.order_id
WHERE o.user_id = 100 AND d.quantity > 1;
重点关注:
当查询条件包含多个独立索引时,MySQL可能使用Index Merge:
sql复制-- 假设name和age都有独立索引
SELECT * FROM users WHERE name LIKE '张%' OR age > 30;
三种合并方式:
低效写法:
sql复制SELECT * FROM large_table
ORDER BY create_time DESC
LIMIT 1000000, 10;
-- 需要先排序1000010条记录
**优化方案1**:延迟关联
```sql
SELECT t.* FROM large_table t
JOIN (
SELECT id FROM large_table
ORDER BY create_time DESC
LIMIT 1000000, 10
) AS tmp ON t.id = tmp.id;
优化方案2:记录位点(要求有序且连续)
sql复制-- 记住上一页最后一条记录的create_time
SELECT * FROM large_table
WHERE create_time < '2023-06-01 12:00:00'
ORDER BY create_time DESC
LIMIT 10;
场景:统计每日订单金额(百万级数据)
方案1:实时计算(高延迟)
sql复制SELECT DATE(create_time), SUM(amount)
FROM orders
GROUP BY DATE(create_time);
方案2:预聚合+增量更新
daily_stats表结构:
code复制stat_date DATE PRIMARY KEY,
total_amount DECIMAL(12,2),
order_count INT
通过事件调度定期更新:
sql复制-- 每天凌晨更新
INSERT INTO daily_stats
SELECT
DATE(create_time),
SUM(amount),
COUNT(*)
FROM orders
WHERE create_time BETWEEN @yesterday AND @today
ON DUPLICATE KEY UPDATE
total_amount = VALUES(total_amount),
order_count = VALUES(order_count);
三星索引原则:
避免过度索引:
热点数据索引:
定期索引维护:
sql复制-- 重建索引(InnoDB)
ALTER TABLE orders ENGINE=InnoDB;
-- 分析索引使用情况
SELECT * FROM sys.schema_unused_indexes;
这次面试复盘让我深刻认识到,SQL优化不仅是记忆语法规则,更需要理解执行引擎的工作原理。建议每个开发者都要定期用EXPLAIN分析自己的查询,培养对执行计划的敏感度。