如何解决加快从庞大的日志表中收集查询的最佳方法是什么?
我有 MysqL 数据库日志表,每天增加 5m 的数据 我在从该表中收集数据以进行一些分析计数时遇到问题。
我已经列出了问题的详细信息如下:
这是我的日志表:
CREATE TABLE `details` (
`id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,`provider` VARCHAR(25) NULL COLLATE 'utf8mb4_unicode_ci',`DLR_Status` VARCHAR(30) NULL COLLATE 'utf8mb4_unicode_ci',`source` VARCHAR(30) NULL COLLATE 'utf8mb4_bin',`Destination` VARCHAR(30) NULL COLLATE 'utf8mb4_unicode_ci',`msg` VARCHAR(1000) NULL COLLATE 'utf8mb4_unicode_ci',`timestamp` TIMESTAMP NULL,`msg_timestamp` INT NOT NULL,`created_at` TIMESTAMP NULL,`updated_at` TIMESTAMP NULL,PRIMARY KEY (`id`,`msg_timestamp`) USING BTREE
)
COLLATE='utf8mb4_unicode_ci'
AUTO_INCREMENT=24169513
/*!50100 PARTITION BY RANGE (`msg_timestamp`)
(PARTITION p2016_02 VALUES LESS THAN (1456779600) ENGINE = InnoDB,PARTITION p2016_03 VALUES LESS THAN (1459458000) ENGINE = InnoDB,PARTITION p2016_04 VALUES LESS THAN (1462050000) ENGINE = InnoDB,PARTITION p2016_05 VALUES LESS THAN (1464728400) ENGINE = InnoDB,PARTITION p2016_06 VALUES LESS THAN (1467320400) ENGINE = InnoDB,PARTITION p2016_07 VALUES LESS THAN (1469998800) ENGINE = InnoDB,PARTITION p2016_08 VALUES LESS THAN (1472677200) ENGINE = InnoDB,PARTITION p2016_09 VALUES LESS THAN (1475269200) ENGINE = InnoDB,PARTITION p2016_10 VALUES LESS THAN (1477947600) ENGINE = InnoDB,PARTITION p2016_11 VALUES LESS THAN (1480539600) ENGINE = InnoDB,PARTITION p2016_12 VALUES LESS THAN (1483218000) ENGINE = InnoDB,PARTITION p2017_01 VALUES LESS THAN (1485896400) ENGINE = InnoDB,PARTITION p2017_02 VALUES LESS THAN (1488315600) ENGINE = InnoDB,PARTITION p2017_03 VALUES LESS THAN (1490994000) ENGINE = InnoDB,PARTITION p2017_04 VALUES LESS THAN (1493586000) ENGINE = InnoDB,PARTITION p2017_05 VALUES LESS THAN (1496264400) ENGINE = InnoDB,PARTITION p2017_06 VALUES LESS THAN (1498856400) ENGINE = InnoDB,PARTITION p2017_07 VALUES LESS THAN (1501534800) ENGINE = InnoDB,PARTITION p2017_08 VALUES LESS THAN (1504213200) ENGINE = InnoDB,PARTITION p2017_09 VALUES LESS THAN (1506805200) ENGINE = InnoDB,PARTITION p2017_10 VALUES LESS THAN (1509483600) ENGINE = InnoDB,PARTITION p2017_11 VALUES LESS THAN (1512075600) ENGINE = InnoDB,PARTITION p2017_12 VALUES LESS THAN (1514754000) ENGINE = InnoDB,PARTITION p2018_01 VALUES LESS THAN (1517432400) ENGINE = InnoDB,PARTITION p2018_02 VALUES LESS THAN (1519851600) ENGINE = InnoDB,PARTITION p2018_03 VALUES LESS THAN (1522530000) ENGINE = InnoDB,PARTITION p2018_04 VALUES LESS THAN (1525122000) ENGINE = InnoDB,PARTITION p2018_05 VALUES LESS THAN (1527800400) ENGINE = InnoDB,PARTITION p2018_06 VALUES LESS THAN (1530392400) ENGINE = InnoDB,PARTITION p2018_07 VALUES LESS THAN (1533070800) ENGINE = InnoDB,PARTITION p2018_08 VALUES LESS THAN (1535749200) ENGINE = InnoDB,PARTITION p2018_09 VALUES LESS THAN (1538341200) ENGINE = InnoDB,PARTITION p2018_10 VALUES LESS THAN (1541019600) ENGINE = InnoDB,PARTITION p2018_11 VALUES LESS THAN (1543611600) ENGINE = InnoDB,PARTITION p2018_12 VALUES LESS THAN (1546290000) ENGINE = InnoDB,PARTITION p2019_01 VALUES LESS THAN (1548968400) ENGINE = InnoDB,PARTITION p2019_02 VALUES LESS THAN (1551387600) ENGINE = InnoDB,PARTITION p2019_03 VALUES LESS THAN (1554066000) ENGINE = InnoDB,PARTITION p2019_04 VALUES LESS THAN (1556658000) ENGINE = InnoDB,PARTITION p2019_05 VALUES LESS THAN (1559336400) ENGINE = InnoDB,PARTITION p2019_06 VALUES LESS THAN (1561928400) ENGINE = InnoDB,PARTITION p2019_07 VALUES LESS THAN (1564606800) ENGINE = InnoDB,PARTITION p2019_08 VALUES LESS THAN (1567285200) ENGINE = InnoDB,PARTITION p2019_09 VALUES LESS THAN (1569877200) ENGINE = InnoDB,PARTITION p2019_10 VALUES LESS THAN (1572555600) ENGINE = InnoDB,PARTITION p2019_11 VALUES LESS THAN (1575147600) ENGINE = InnoDB,PARTITION p2019_12 VALUES LESS THAN (1577826000) ENGINE = InnoDB,PARTITION p2020_01 VALUES LESS THAN (1580504400) ENGINE = InnoDB,PARTITION p2020_02 VALUES LESS THAN (1583010000) ENGINE = InnoDB,PARTITION p2020_03 VALUES LESS THAN (1585688400) ENGINE = InnoDB,PARTITION p2020_04 VALUES LESS THAN (1588280400) ENGINE = InnoDB,PARTITION p2020_05 VALUES LESS THAN (1590958800) ENGINE = InnoDB,PARTITION p2020_06 VALUES LESS THAN (1593550800) ENGINE = InnoDB,PARTITION p2020_07 VALUES LESS THAN (1596229200) ENGINE = InnoDB,PARTITION p2020_08 VALUES LESS THAN (1598907600) ENGINE = InnoDB,PARTITION p2020_09 VALUES LESS THAN (1601499600) ENGINE = InnoDB,PARTITION p2020_10 VALUES LESS THAN (1604178000) ENGINE = InnoDB,PARTITION p2020_11 VALUES LESS THAN (1606770000) ENGINE = InnoDB,PARTITION p2020_12 VALUES LESS THAN (1609448400) ENGINE = InnoDB,PARTITION p2021_01 VALUES LESS THAN (1612126800) ENGINE = InnoDB,PARTITION p2021_02 VALUES LESS THAN (1614546000) ENGINE = InnoDB,PARTITION p2021_03 VALUES LESS THAN (1617224400) ENGINE = InnoDB,PARTITION p2021_04 VALUES LESS THAN (1619816400) ENGINE = InnoDB,PARTITION p2021_05 VALUES LESS THAN (1622494800) ENGINE = InnoDB,PARTITION p2021_06 VALUES LESS THAN (1625086800) ENGINE = InnoDB,PARTITION p2021_07 VALUES LESS THAN (1627765200) ENGINE = InnoDB,PARTITION p2021_08 VALUES LESS THAN (1630443600) ENGINE = InnoDB,PARTITION p2021_09 VALUES LESS THAN (1633035600) ENGINE = InnoDB,PARTITION p2021_10 VALUES LESS THAN (1635714000) ENGINE = InnoDB,PARTITION p2021_11 VALUES LESS THAN (1638306000) ENGINE = InnoDB,PARTITION p2021_12 VALUES LESS THAN (1640984400) ENGINE = InnoDB,PARTITION p2022_01 VALUES LESS THAN (1643662800) ENGINE = InnoDB,PARTITION p2022_02 VALUES LESS THAN (1646082000) ENGINE = InnoDB,PARTITION p2022_03 VALUES LESS THAN (1648760400) ENGINE = InnoDB,PARTITION p2022_04 VALUES LESS THAN (1651352400) ENGINE = InnoDB,PARTITION p2022_05 VALUES LESS THAN (1654030800) ENGINE = InnoDB,PARTITION p2022_06 VALUES LESS THAN (1656622800) ENGINE = InnoDB,PARTITION p2022_07 VALUES LESS THAN (1659301200) ENGINE = InnoDB,PARTITION p2022_08 VALUES LESS THAN (1661979600) ENGINE = InnoDB,PARTITION p2022_09 VALUES LESS THAN (1664571600) ENGINE = InnoDB,PARTITION p2022_10 VALUES LESS THAN (1667250000) ENGINE = InnoDB,PARTITION p2022_11 VALUES LESS THAN (1669842000) ENGINE = InnoDB,PARTITION p2022_12 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
它包含的日志数据如下:
id | provider | DLR_Status | 来源 | 目的地 | msg | 时间戳 | msg_timestamp |
---|---|---|---|---|---|---|---|
1 | KDD | 完成 | 网站 | 01332456 | 免费送货 | 2019-12-01 12:00:13 | 1575201613 |
2 | KDD | 完成 | 通过电话 | 01322422 | 费用为 300 | 2019-12-01 12:00:37 | 1575201637 |
. | . | . | . | . | . | . | . |
. | . | . | . | . | . | . | . |
问题是当我从这个表中选择一些计数时
SELECT sql_CALC_FOUND_ROWS DLR_status,count(*) as c
FROM sms_details
group by DLR_status;
给出结果需要很长时间,并且一些查询给出 504 Gateway Time-out 错误,就像这个查询
SELECT sql_CALC_FOUND_ROWS Destination,count(*) as c
FROM sms_details
WHERE msg_timestamp >= UNIX_TIMESTAMP("2019-10-01") and msg_timestamp < UNIX_TIMESTAMP("2019-12-01") group by Destination;
我已经在我的表中使用分区 我尝试为某些列建立索引,但这对每天增加的数据造成了很大的问题。
那么以下方面的最佳做法是什么:
- 加快执行时间
- 关心插入速度
解决方法
收缩架构 -- 更小 --> 更少的 I/O --> 更快。
timestamp vs msg_timestamp -- 这些似乎是一样的,只是格式不同。其中,扔其中一个。
标准化 通过缩小数据量来加快插入速度。大部分 VARCHARs
可以由 2 字节 SMALLINT UNSIGNED 或 3 字节 MEDIUMINT UNSIGNED` 替换。
未来分区 -- 这样的分区不能超过一个; SELECTs
会浪费时间打开它们却什么也找不到。
分区过多 -- 在某个限制(可能是 50 个)下,有很多分区会减慢速度。
批处理是最好的加速方法。请参阅 LOAD DATA
或 INSERT ... VALUES (...),(...),...
。在后一种情况下,我建议批量处理 1000 行。 (超出这个范围会导致收益递减和可能存在一些限制。)如果数据来自多个来源,请解释;然后我们可以进一步讨论。
分区对于清除“旧”数据非常有用,因为 DROP PARTITION
比 DELETE
快得多。见http://mysql.rjweb.org/doc.php/partitionmaint
投掷 created_at
和 updated_at
;他们可能没用。 (同样,越小越快。)
SQL_CALC_FOUND_ROWS 当您没有 LIMIT
时不需要;只需观察返回的行数。重新考虑用户对此类的需求。 (如果需要,请回来进行更多讨论。)
INDEX(DLR_status)
,DLR_status counts 将是完整的索引扫描。并考虑将该列设为 ENUM,使其只有 1 个字节。 (如果有多个值和/或越来越多的值,则“标准化”。)
查询 2 需要 INDEX(Destination,msg_timestmap)
。
它很大吗? 24M 行/5 年 --> 每秒不到 1 行。 100 行/秒是我开始担心“高速摄取”的地方。也就是说,我认为插入没有问题。另一方面,选择可能是一个问题。你给我们看了两个;让我们看看更多。我不想一次推荐一个索引;我宁愿设计一组索引来优化处理所有可能的查询。 特别是因为它可能涉及重新设计分区。
汇总表是在“数据仓库”中进行快速分析的绝佳方式。见http://mysql.rjweb.org/doc.php/summarytables
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。