
What is the best way to speed up aggregate queries on a huge log table?


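I have a MySQL log table that grows by 5m of data every day, and I am having trouble collecting data from it for some analytical counts.

I have listed the details of the problem below.

This is my log table: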

CREATE TABLE `details` (
    `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    `provider` VARCHAR(25) NULL COLLATE 'utf8mb4_unicode_ci',
    `DLR_Status` VARCHAR(30) NULL COLLATE 'utf8mb4_unicode_ci',
    `source` VARCHAR(30) NULL COLLATE 'utf8mb4_bin',
    `Destination` VARCHAR(30) NULL COLLATE 'utf8mb4_unicode_ci',
    `msg` VARCHAR(1000) NULL COLLATE 'utf8mb4_unicode_ci',
    `timestamp` TIMESTAMP NULL,
    `msg_timestamp` INT NOT NULL,
    `created_at` TIMESTAMP NULL,
    `updated_at` TIMESTAMP NULL,
    PRIMARY KEY (`id`,`msg_timestamp`) USING BTREE
)
COLLATE='utf8mb4_unicode_ci'
AUTO_INCREMENT=24169513
/*!50100 PARTITION BY RANGE (`msg_timestamp`)
(PARTITION p2016_02 VALUES LESS THAN (1456779600) ENGINE = InnoDB,
 PARTITION p2016_03 VALUES LESS THAN (1459458000) ENGINE = InnoDB,
 PARTITION p2016_04 VALUES LESS THAN (1462050000) ENGINE = InnoDB,
 PARTITION p2016_05 VALUES LESS THAN (1464728400) ENGINE = InnoDB,
 PARTITION p2016_06 VALUES LESS THAN (1467320400) ENGINE = InnoDB,
 PARTITION p2016_07 VALUES LESS THAN (1469998800) ENGINE = InnoDB,
 PARTITION p2016_08 VALUES LESS THAN (1472677200) ENGINE = InnoDB,
 PARTITION p2016_09 VALUES LESS THAN (1475269200) ENGINE = InnoDB,
 PARTITION p2016_10 VALUES LESS THAN (1477947600) ENGINE = InnoDB,
 PARTITION p2016_11 VALUES LESS THAN (1480539600) ENGINE = InnoDB,
 PARTITION p2016_12 VALUES LESS THAN (1483218000) ENGINE = InnoDB,
 PARTITION p2017_01 VALUES LESS THAN (1485896400) ENGINE = InnoDB,
 PARTITION p2017_02 VALUES LESS THAN (1488315600) ENGINE = InnoDB,
 PARTITION p2017_03 VALUES LESS THAN (1490994000) ENGINE = InnoDB,
 PARTITION p2017_04 VALUES LESS THAN (1493586000) ENGINE = InnoDB,
 PARTITION p2017_05 VALUES LESS THAN (1496264400) ENGINE = InnoDB,
 PARTITION p2017_06 VALUES LESS THAN (1498856400) ENGINE = InnoDB,
 PARTITION p2017_07 VALUES LESS THAN (1501534800) ENGINE = InnoDB,
 PARTITION p2017_08 VALUES LESS THAN (1504213200) ENGINE = InnoDB,
 PARTITION p2017_09 VALUES LESS THAN (1506805200) ENGINE = InnoDB,
 PARTITION p2017_10 VALUES LESS THAN (1509483600) ENGINE = InnoDB,
 PARTITION p2017_11 VALUES LESS THAN (1512075600) ENGINE = InnoDB,
 PARTITION p2017_12 VALUES LESS THAN (1514754000) ENGINE = InnoDB,
 PARTITION p2018_01 VALUES LESS THAN (1517432400) ENGINE = InnoDB,
 PARTITION p2018_02 VALUES LESS THAN (1519851600) ENGINE = InnoDB,
 PARTITION p2018_03 VALUES LESS THAN (1522530000) ENGINE = InnoDB,
 PARTITION p2018_04 VALUES LESS THAN (1525122000) ENGINE = InnoDB,
 PARTITION p2018_05 VALUES LESS THAN (1527800400) ENGINE = InnoDB,
 PARTITION p2018_06 VALUES LESS THAN (1530392400) ENGINE = InnoDB,
 PARTITION p2018_07 VALUES LESS THAN (1533070800) ENGINE = InnoDB,
 PARTITION p2018_08 VALUES LESS THAN (1535749200) ENGINE = InnoDB,
 PARTITION p2018_09 VALUES LESS THAN (1538341200) ENGINE = InnoDB,
 PARTITION p2018_10 VALUES LESS THAN (1541019600) ENGINE = InnoDB,
 PARTITION p2018_11 VALUES LESS THAN (1543611600) ENGINE = InnoDB,
 PARTITION p2018_12 VALUES LESS THAN (1546290000) ENGINE = InnoDB,
 PARTITION p2019_01 VALUES LESS THAN (1548968400) ENGINE = InnoDB,
 PARTITION p2019_02 VALUES LESS THAN (1551387600) ENGINE = InnoDB,
 PARTITION p2019_03 VALUES LESS THAN (1554066000) ENGINE = InnoDB,
 PARTITION p2019_04 VALUES LESS THAN (1556658000) ENGINE = InnoDB,
 PARTITION p2019_05 VALUES LESS THAN (1559336400) ENGINE = InnoDB,
 PARTITION p2019_06 VALUES LESS THAN (1561928400) ENGINE = InnoDB,
 PARTITION p2019_07 VALUES LESS THAN (1564606800) ENGINE = InnoDB,
 PARTITION p2019_08 VALUES LESS THAN (1567285200) ENGINE = InnoDB,
 PARTITION p2019_09 VALUES LESS THAN (1569877200) ENGINE = InnoDB,
 PARTITION p2019_10 VALUES LESS THAN (1572555600) ENGINE = InnoDB,
 PARTITION p2019_11 VALUES LESS THAN (1575147600) ENGINE = InnoDB,
 PARTITION p2019_12 VALUES LESS THAN (1577826000) ENGINE = InnoDB,
 PARTITION p2020_01 VALUES LESS THAN (1580504400) ENGINE = InnoDB,
 PARTITION p2020_02 VALUES LESS THAN (1583010000) ENGINE = InnoDB,
 PARTITION p2020_03 VALUES LESS THAN (1585688400) ENGINE = InnoDB,
 PARTITION p2020_04 VALUES LESS THAN (1588280400) ENGINE = InnoDB,
 PARTITION p2020_05 VALUES LESS THAN (1590958800) ENGINE = InnoDB,
 PARTITION p2020_06 VALUES LESS THAN (1593550800) ENGINE = InnoDB,
 PARTITION p2020_07 VALUES LESS THAN (1596229200) ENGINE = InnoDB,
 PARTITION p2020_08 VALUES LESS THAN (1598907600) ENGINE = InnoDB,
 PARTITION p2020_09 VALUES LESS THAN (1601499600) ENGINE = InnoDB,
 PARTITION p2020_10 VALUES LESS THAN (1604178000) ENGINE = InnoDB,
 PARTITION p2020_11 VALUES LESS THAN (1606770000) ENGINE = InnoDB,
 PARTITION p2020_12 VALUES LESS THAN (1609448400) ENGINE = InnoDB,
 PARTITION p2021_01 VALUES LESS THAN (1612126800) ENGINE = InnoDB,
 PARTITION p2021_02 VALUES LESS THAN (1614546000) ENGINE = InnoDB,
 PARTITION p2021_03 VALUES LESS THAN (1617224400) ENGINE = InnoDB,
 PARTITION p2021_04 VALUES LESS THAN (1619816400) ENGINE = InnoDB,
 PARTITION p2021_05 VALUES LESS THAN (1622494800) ENGINE = InnoDB,
 PARTITION p2021_06 VALUES LESS THAN (1625086800) ENGINE = InnoDB,
 PARTITION p2021_07 VALUES LESS THAN (1627765200) ENGINE = InnoDB,
 PARTITION p2021_08 VALUES LESS THAN (1630443600) ENGINE = InnoDB,
 PARTITION p2021_09 VALUES LESS THAN (1633035600) ENGINE = InnoDB,
 PARTITION p2021_10 VALUES LESS THAN (1635714000) ENGINE = InnoDB,
 PARTITION p2021_11 VALUES LESS THAN (1638306000) ENGINE = InnoDB,
 PARTITION p2021_12 VALUES LESS THAN (1640984400) ENGINE = InnoDB,
 PARTITION p2022_01 VALUES LESS THAN (1643662800) ENGINE = InnoDB,
 PARTITION p2022_02 VALUES LESS THAN (1646082000) ENGINE = InnoDB,
 PARTITION p2022_03 VALUES LESS THAN (1648760400) ENGINE = InnoDB,
 PARTITION p2022_04 VALUES LESS THAN (1651352400) ENGINE = InnoDB,
 PARTITION p2022_05 VALUES LESS THAN (1654030800) ENGINE = InnoDB,
 PARTITION p2022_06 VALUES LESS THAN (1656622800) ENGINE = InnoDB,
 PARTITION p2022_07 VALUES LESS THAN (1659301200) ENGINE = InnoDB,
 PARTITION p2022_08 VALUES LESS THAN (1661979600) ENGINE = InnoDB,
 PARTITION p2022_09 VALUES LESS THAN (1664571600) ENGINE = InnoDB,
 PARTITION p2022_10 VALUES LESS THAN (1667250000) ENGINE = InnoDB,
 PARTITION p2022_11 VALUES LESS THAN (1669842000) ENGINE = InnoDB,
 PARTITION p2022_12 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;

It contains log data like this:

id  provider  DLR_Status  source    Destination  msg              timestamp            msg_timestamp
1   KDD       Completed   website   01332456     Free delivery    2019-12-01 12:00:13  1575201613
2   KDD       Completed   by phone  01322422     The cost is 300  2019-12-01 12:00:37  1575201637
.   .         .           .         .            .                .                    .
.   .         .           .         .            .                .                    .

The problem appears when I select some counts from this table:

SELECT SQL_CALC_FOUND_ROWS DLR_Status, COUNT(*) AS c
FROM sms_details
GROUP BY DLR_Status;

It takes a long time to return results, and some queries fail with a 504 Gateway Time-out error, such as this one:

SELECT SQL_CALC_FOUND_ROWS Destination, COUNT(*) AS c
FROM sms_details
WHERE msg_timestamp >= UNIX_TIMESTAMP("2019-10-01")
  AND msg_timestamp < UNIX_TIMESTAMP("2019-12-01")
GROUP BY Destination;

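I am already using partitioning on my table. I tried indexing some columns, but that caused big problems given how much data is added every day.

So what is the best practice for:

  • Speeding up execution time
  • Keeping inserts fast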

Solution

Shrink the schema -- smaller --> less I/O --> faster.

timestamp vs msg_timestamp -- These seem to be the same value, just in different formats. Toss one of them.

Normalize -- Shrinking the data this way speeds up inserts. Most of the VARCHARs can be replaced by a 2-byte SMALLINT UNSIGNED or a 3-byte MEDIUMINT UNSIGNED.
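As a rough sketch of that normalization, one VARCHAR column could be moved into a lookup table. The table name `providers` and the column `provider_id` are hypothetical, and `sms_details` is the table name used in the question's queries; this is an illustration under those assumptions, not a tested migration.

```sql
-- Hypothetical lookup table: one row per distinct provider name.
CREATE TABLE providers (
    provider_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    provider    VARCHAR(25) NOT NULL,
    PRIMARY KEY (provider_id),
    UNIQUE KEY uq_provider (provider)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- One-time population from the existing log data (a full scan of the log table).
INSERT INTO providers (provider)
SELECT DISTINCT provider
FROM sms_details
WHERE provider IS NOT NULL;

-- Assuming the log table then stores a 2-byte provider_id instead of the
-- VARCHAR(25), reports join back to the lookup table for the readable name.
SELECT p.provider, COUNT(*) AS c
FROM sms_details AS d
JOIN providers AS p ON p.provider_id = d.provider_id
GROUP BY p.provider;
```

The same pattern would apply to `DLR_Status` and `source` if they have a limited set of values.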

Future partitions -- Have no more than one such partition; SELECTs waste time opening the empty ones only to find nothing.

Too many partitions -- Beyond some limit (perhaps 50), having lots of partitions slows things down.

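Batching is the best way to speed up inserts. See LOAD DATA and INSERT ... VALUES (...),(...),.... For the latter, I suggest batches of 1000 rows. (Beyond that you get diminishing returns and may run into some limits.) If the data comes from multiple sources, please explain; then we can discuss further.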
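The two batching forms mentioned above might look like the following sketch. The rows reuse the sample data from the question, and the file path `/tmp/sms_batch.csv` is a made-up placeholder; `LOAD DATA LOCAL` also assumes `local_infile` is enabled on both client and server.

```sql
-- Multi-row INSERT: one statement and one round trip for many rows.
INSERT INTO sms_details
    (provider, DLR_Status, `source`, Destination, msg, `timestamp`, msg_timestamp)
VALUES
    ('KDD', 'Completed', 'website',  '01332456', 'Free delivery',   '2019-12-01 12:00:13', 1575201613),
    ('KDD', 'Completed', 'by phone', '01322422', 'The cost is 300', '2019-12-01 12:00:37', 1575201637);
-- ... in practice, continue the VALUES list up to roughly 1000 rows per statement.

-- Bulk load from a CSV file (hypothetical path and file layout).
LOAD DATA LOCAL INFILE '/tmp/sms_batch.csv'
INTO TABLE sms_details
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(provider, DLR_Status, `source`, Destination, msg, `timestamp`, msg_timestamp);
```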

Partitioning is very useful for purging "old" data, because DROP PARTITION is much faster than DELETE. See http://mysql.rjweb.org/doc.php/partitionmaint
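A minimal sketch of the partition maintenance described above, using partition names from the question's table definition. The name `pfuture` is hypothetical, and the second statement assumes that `p2022_11` and `p2022_12` are both still empty.

```sql
-- Purge the oldest month: a quick metadata-level operation compared with
-- DELETE ... WHERE msg_timestamp < 1456779600.
ALTER TABLE sms_details DROP PARTITION p2016_02;

-- Collapse trailing empty partitions into a single catch-all partition.
-- REORGANIZE requires the listed partitions to be adjacent.
ALTER TABLE sms_details
    REORGANIZE PARTITION p2022_11, p2022_12 INTO (
        PARTITION pfuture VALUES LESS THAN MAXVALUE
    );
```

Each new month would then be carved out of `pfuture` with another REORGANIZE PARTITION shortly before it is needed.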

Toss created_at and updated_at; they are probably useless. (Again, smaller is faster.)

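SQL_CALC_FOUND_ROWS is not needed when you have no LIMIT; simply look at how many rows were returned. Rethink whether the user really needs it. (If so, come back for more discussion.)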

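If you have INDEX(DLR_Status), the DLR_Status counts will be a full index scan. Also consider making that column an ENUM, so that it takes only 1 byte. (If there are many values and/or a growing number of values, then "normalize" instead.)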
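For illustration, the index and the first query from the question could be written as below. The index name `idx_dlr_status` is an arbitrary choice, and adding an index to a table of this size takes time and adds some cost to every insert.

```sql
-- Secondary index so the count can be answered by scanning the index
-- instead of the much wider table rows.
ALTER TABLE sms_details ADD INDEX idx_dlr_status (DLR_Status);

-- Same report as in the question, without SQL_CALC_FOUND_ROWS.
SELECT DLR_Status, COUNT(*) AS c
FROM sms_details
GROUP BY DLR_Status;
```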

Query 2 needs INDEX(Destination, msg_timestamp)
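A sketch of that index together with the second query from the question. The index name `idx_dest_ts` is hypothetical; the EXPLAIN at the end is only a suggestion for checking which partitions and which index MySQL actually chooses on your server.

```sql
-- Composite index containing both columns that query 2 touches.
ALTER TABLE sms_details ADD INDEX idx_dest_ts (Destination, msg_timestamp);

-- Query 2 without SQL_CALC_FOUND_ROWS.
SELECT Destination, COUNT(*) AS c
FROM sms_details
WHERE msg_timestamp >= UNIX_TIMESTAMP('2019-10-01')
  AND msg_timestamp <  UNIX_TIMESTAMP('2019-12-01')
GROUP BY Destination;

-- Inspect the plan; the "partitions" column shows how much pruning happened.
EXPLAIN
SELECT Destination, COUNT(*) AS c
FROM sms_details
WHERE msg_timestamp >= UNIX_TIMESTAMP('2019-10-01')
  AND msg_timestamp <  UNIX_TIMESTAMP('2019-12-01')
GROUP BY Destination;
```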

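Is it big? 24M rows / 5 years --> less than 1 row per second. 100 rows/sec is where I start to worry about "high-speed ingestion". That is, I don't think the inserts are a problem. The SELECTs, on the other hand, may be. You showed us two; let's see more. I don't want to recommend one index at a time; I would rather design a set of indexes that handles all the likely queries well, especially since that may involve redesigning the partitioning.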

Summary tables are an excellent way to do fast analytics in a "data warehouse". See http://mysql.rjweb.org/doc.php/summarytables
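One possible shape for such a summary table is sketched below. The table name `sms_daily_summary`, its columns, and the choice of one row per day, provider, and status are all assumptions made for illustration; the right granularity depends on the reports you actually need.

```sql
-- Hypothetical daily rollup: one row per (day, provider, status).
CREATE TABLE sms_daily_summary (
    dy         DATE NOT NULL,
    provider   VARCHAR(25) NOT NULL,
    DLR_Status VARCHAR(30) NOT NULL,
    ct         INT UNSIGNED NOT NULL,
    PRIMARY KEY (dy, provider, DLR_Status)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- Summarize one day of the log; safe to re-run because an existing row
-- for the same key is overwritten with the recomputed count.
INSERT INTO sms_daily_summary (dy, provider, DLR_Status, ct)
SELECT DATE(FROM_UNIXTIME(msg_timestamp)) AS dy,
       COALESCE(provider, '')             AS prov,
       COALESCE(DLR_Status, '')           AS st,
       COUNT(*)                           AS ct
FROM sms_details
WHERE msg_timestamp >= UNIX_TIMESTAMP('2019-12-01')
  AND msg_timestamp <  UNIX_TIMESTAMP('2019-12-02')
GROUP BY dy, prov, st
ON DUPLICATE KEY UPDATE ct = VALUES(ct);

-- Reports then read the small summary table instead of the raw log.
SELECT DLR_Status, SUM(ct) AS c
FROM sms_daily_summary
GROUP BY DLR_Status;
```

The day boundaries here follow the session time zone, so the job that fills the table should run with a fixed time zone.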
