1.1. Background
For developers, pagination comes up all the time — practically every module has a paginated list somewhere. It tends to get implemented quickly, because tweaking the code from the last module is usually enough. The implementation is typically a statement like this:
SELECT * FROM goods WHERE user_id = 4 LIMIT 1000, 20;
... omit ...
20 rows in set (0.11 sec)
A statement like this is perfectly fast while the table is small or the offset is small. With a large table and a large offset, though, it becomes a problem:
SELECT * FROM goods WHERE user_id = 4 LIMIT 500000, 20;
... omit ...
20 rows in set (7.84 sec)
The reason is that LIMIT 500000, 20 forces MySQL to read and throw away the first 500,000 matching rows before returning 20, so the cost grows with the offset. The optimization process follows below.
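To see this for yourself (an aside not in the original article), EXPLAIN the statement; whichever access path the optimizer picks, nothing in the plan lets it skip past the offset:

-- MySQL has no way to jump straight to row 500,000: it reads and
-- discards every earlier matching row, then returns the final 20.
EXPLAIN SELECT * FROM goods WHERE user_id = 4 LIMIT 500000, 20;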
1.2. Building the Test Data
-- Create the goods table
DROP TABLE IF EXISTS goods;
CREATE TABLE goods(
    id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
    good_name VARCHAR(50) NOT NULL,
    user_id TINYINT unsigned NOT NULL,
    PRIMARY KEY (`id`)
);

-- Create a stored procedure that bulk-inserts test data
-- (generating the data below can take a while)
DROP PROCEDURE IF EXISTS insert_batch;
DELIMITER //
CREATE PROCEDURE insert_batch()
BEGIN
    DECLARE num INT;
    DECLARE user_id TINYINT;
    SET num = 1;
    -- Insert 100,000 rows with a random user_id between 1 and 10
    WHILE num <= 100000 DO
        SELECT FLOOR(RAND() * 10 + 1) INTO user_id;
        INSERT INTO goods VALUES(NULL, REPEAT('X', 50), user_id);
        SET num = num + 1;
    END WHILE;
    SET num = 1;
    -- Double the table 7 times: 100,000 * 2^7 = 12,800,000 rows
    WHILE num <= 7 DO
        INSERT INTO goods SELECT NULL, good_name, user_id FROM goods;
        SET num = num + 1;
    END WHILE;
END //
DELIMITER ;

-- Call the stored procedure
CALL insert_batch();

-- Add an index
ALTER TABLE goods ADD INDEX idx$goods$user_id(user_id);

SELECT user_id, COUNT(*) FROM goods GROUP BY user_id;
+---------+----------+
| user_id | COUNT(*) |
+---------+----------+
|       1 |    10089 |
|       2 |    10077 |
|       3 |     9944 |
|       4 | 12710074 |
|       5 |    10011 |
|       6 |     9925 |
|       7 |     9950 |
|       8 |    10149 |
|       9 |     9949 |
|      10 |     9832 |
+---------+----------+
We will use user_id = 4 — the value with by far the most rows — for the simulation. The skew is a side effect of the procedure: in the second loop, the local variable user_id shadows the goods.user_id column (MySQL resolves the name to the variable), so every duplicated row receives the variable's final random value, which in this run was 4. That gives 10,074 original rows plus 100,000 × (2^7 − 1) = 12,700,000 copies.
1.3. Optimization Principle
Keep every intermediate result set as small as possible — in rows and in columns alike. In addition, let intermediate result sets come from a secondary index rather than the primary-key (clustered) index: a scan over idx$goods$user_id touches only the narrow index entries (user_id plus the primary key id), instead of dragging full rows out of the clustered index.
1.4. Simulation
Now we want the 20 rows for user 4 that come after offset 10,000,000.
1. Find the primary-key IDs through user_id (minimizing the columns):
SELECT id FROM goods WHERE user_id = 4 LIMIT 10000000, 20;
10343427
... omit ...
10343446
20 rows in set (1.83 sec)
Even though the offset here is twenty times larger than in the 7.84 s example, the query takes only 1.83 s, because the scan never has to leave the index. 2. Fetch the needed rows by the primary-key IDs just obtained. I won't demonstrate this with Python here — in application code you would assemble the IN list from the IDs returned by step 1:
SELECT * FROM goods WHERE id IN(
    10343427, 10343428, 10343429, 10343430, 10343431,
    10343432, 10343433, 10343434, 10343435, 10343436,
    10343437, 10343438, 10343439, 10343440, 10343441,
    10343442, 10343443, 10343444, 10343445, 10343446
);
+----------+----------------------------------------------------+---------+
| id       | good_name                                          | user_id |
+----------+----------------------------------------------------+---------+
| 10343427 | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX |       4 |
... omit ...
20 rows in set (0.01 sec)
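The two steps can also be folded into a single statement with a derived table — the widely used "deferred join" pattern. This variant is not from the original article (whose author prefers two dead-simple statements; see the summary), just the one-statement equivalent:

-- Paginate on the covering index first, then join back to the
-- clustered index to fetch full rows for only the 20 surviving IDs.
SELECT g.*
FROM goods g
JOIN (
    SELECT id FROM goods WHERE user_id = 4 LIMIT 10000000, 20
) AS t ON g.id = t.id;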
1.5. Further Optimization
We can actually shrink the result set even further. Picture a typical paginated list, as in the sketch below:

[Figure: sketch of a paginated list with page-navigation buttons]
The pagination buttons on a web page are usually plain links, or paging is triggered through a jQuery event. We can attach two attributes to each button, max_id and min_id, recording the largest and smallest IDs on the current page (e.g. min_id=10343427, max_id=10343446).
The lookup then goes as follows:
1. Find the primary-key IDs through user_id (minimizing the columns).
If the user clicks Next:
SELECT id FROM goods WHERE id > 10343446 AND user_id = 4 LIMIT 0, 20;
+----------+
| id       |
+----------+
| 10343447 |
... omit ...
20 rows in set (0.02 sec)
If the user clicks Previous (Previous performs a bit worse than Next because it involves ordering):
SELECT id FROM goods WHERE id < 10343427 AND user_id = 4 ORDER BY id DESC LIMIT 0, 20;
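Note that this returns the IDs in descending order; to render the page top-down, the application should reverse them before building the IN list, or wrap the query in a derived table. A minimal sketch of the wrapped form, using the same IDs as above:

SELECT id FROM (
    SELECT id FROM goods
    WHERE id < 10343427 AND user_id = 4
    ORDER BY id DESC
    LIMIT 20
) AS t
ORDER BY id ASC;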
2. Fetch the needed rows by the primary-key IDs just obtained:
SELECT * FROM goods WHERE id IN(
    10343447, 10343448, 10343449, 10343450, 10343451,
    10343452, 10343453, 10343454, 10343455, 10343456,
    10343457, 10343458, 10343459, 10343460, 10343461,
    10343462, 10343463, 10343464, 10343465, 10343466
);
+----------+----------------------------------------------------+---------+
| id       | good_name                                          | user_id |
+----------+----------------------------------------------------+---------+
| 10343447 | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX |       4 |
... omit ...
20 rows in set (0.01 sec)
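For comparison (again not from the original article), the keyset condition can also be pushed into a single statement: on InnoDB, secondary-index entries implicitly carry the primary key, so for a fixed user_id they are already ordered by id and MySQL can range-scan straight to the start of the page:

-- One-statement keyset pagination; whether this beats the author's
-- two-step form depends on row width, but both avoid the huge offset.
SELECT * FROM goods
WHERE user_id = 4 AND id > 10343446
ORDER BY id ASC
LIMIT 20;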
1.6. Summary
This optimization may not be usable in some cases where the ordering involves aggregate functions.
MySQL users are encouraged to keep their statements simple and avoid JOINs where they can: the optimizer parses simple statements quickly, and from a maintenance standpoint, the dumber a statement looks, the easier it is to understand.
That said, fetching from the database inside a for loop in application code is strongly discouraged.