MySQL GROUP BY and COUNT performance


I have two tables.

export

Name     Type          Collation        Attributes  Null  Default  Extra
id       int(10)       utf8_unicode_ci  UNSIGNED    No    None     AUTO_INCREMENT
email    varchar(150)  utf8_unicode_ci              No    None
city_id  int(11)       utf8_unicode_ci              Yes   NULL

Indexes

Keyname        Type   Unique  Packed  Column   Cardinality  Collation  Null
id             BTREE  Yes     No      id       769169       A          No
email_index    BTREE  Yes     No      email    769169       A          No
city_id_index  BTREE  No      No      city_id  6356         A          Yes

export_history

Name     Type           Collation        Attributes    Null   Default   Extra
id       int(10)        utf8_unicode_ci  UNSIGNED      No     None      AUTO_INCREMENT
email    varchar(255)   utf8_unicode_ci                No     None

Indexes

Keyname      Type   Unique  Packed  Column  Cardinality  Collation  Null
id           BTREE  Yes     No      id      113887       A          No
email_index  BTREE  No      No      email   113887       A          No

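I need to get the top city IDs, meaning the cities with the most emails (users). There is also the export_history table: any email that appears in it must be excluded from the results.

The final query looks like this.

Main query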

SELECT COUNT(city_id) as city_count,city_id
    FROM export e
        WHERE NOT EXISTS (
            SELECT * FROM export_history ehistory
                WHERE e.email = ehistory.email
            ) 
        GROUP BY city_id
            ORDER BY city_count DESC
                   LIMIT 5

Execution time is about 7 seconds. The problem is that it takes that long to run.

EXPLAIN shows:

id  select_type         table     type   possible_keys  key            key_len  ref      rows    Extra
1   PRIMARY             e         index  NULL           city_id_index  5        NULL     769169  Using where; Using temporary; Using filesort
2   DEPENDENT SUBQUERY  ehistory  ref    email_index    email_index    767      e.email  1       Using where; Using index

Note that the following two queries each run quickly on their own (about 0.1 seconds and 0.02 seconds respectively).

Query 1

SELECT COUNT(city_id) as city_count,city_id
    FROM export
        GROUP BY city_id
            ORDER BY city_count DESC
                   LIMIT 5

Execution time is about 0.1 seconds.

Query 2

SELECT *
    FROM export e
        WHERE NOT EXISTS (
            SELECT * FROM export_history ehistory
                WHERE e.email = ehistory.email
            )

Execution time is about 0.02 seconds.

Can you suggest any way to improve the performance of the main query?

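Solution

You can simplify your query by using LEFT JOIN ... IS NULL instead of NOT EXISTS with a dependent subquery. It may (or may not: try it) speed things up by avoiding repeated execution of the dependent subquery.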

SELECT COUNT(e.city_id) as city_count,e.city_id
  FROM export e
  LEFT JOIN export_history ehistory ON e.email = ehistory.email
 WHERE ehistory.id IS NULL
 GROUP BY e.city_id
 ORDER BY COUNT(e.city_id) DESC
 LIMIT 5;
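
To see whether the rewrite changes the plan, one option is to run it under EXPLAIN and compare the output with the plan shown in the question. This is only a sketch; the exact output depends on the MySQL version and on table statistics.

```sql
-- Same LEFT JOIN ... IS NULL query as above, prefixed with EXPLAIN
EXPLAIN
SELECT COUNT(e.city_id) AS city_count, e.city_id
  FROM export e
  LEFT JOIN export_history ehistory ON e.email = ehistory.email
 WHERE ehistory.id IS NULL
 GROUP BY e.city_id
 ORDER BY city_count DESC
 LIMIT 5;
```

Because this form contains no subquery, the DEPENDENT SUBQUERY row should no longer appear. Whether the run time actually improves still has to be measured.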

Try this compound index.

CREATE INDEX exp_email_cityid ON export(email,city_id);

If that doesn't help, try an index with the columns in the opposite order:

CREATE INDEX exp_cityid_email ON export(city_id,email);
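
Probably only one of the two column orders will end up being used, and every extra index adds some write overhead, so it may be worth removing the one that does not help. A minimal sketch, assuming (purely for illustration) that exp_email_cityid turned out to be the unused one:

```sql
-- List the indexes currently defined on export
SHOW INDEX FROM export;

-- After checking the `key` column of EXPLAIN for the query above,
-- drop whichever compound index the optimizer did not pick.
-- exp_email_cityid is only assumed to be the unused one here.
DROP INDEX exp_email_cityid ON export;
```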

Pro tip: single-column indexes are not the same as multi-column indexes designed to match the filtering criteria in a query.
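
One way to observe that difference is to restrict the optimizer to each index in turn with an index hint and compare the plans. This sketch assumes exp_cityid_email has been created as shown above; FORCE INDEX is used here only as a diagnostic aid, not as a recommendation for the final query.

```sql
-- Plan when limited to the original single-column index
EXPLAIN
SELECT COUNT(e.city_id) AS city_count, e.city_id
  FROM export e FORCE INDEX (city_id_index)
  LEFT JOIN export_history ehistory ON e.email = ehistory.email
 WHERE ehistory.id IS NULL
 GROUP BY e.city_id
 ORDER BY city_count DESC
 LIMIT 5;

-- Plan when limited to the compound (city_id, email) index
EXPLAIN
SELECT COUNT(e.city_id) AS city_count, e.city_id
  FROM export e FORCE INDEX (exp_cityid_email)
  LEFT JOIN export_history ehistory ON e.email = ehistory.email
 WHERE ehistory.id IS NULL
 GROUP BY e.city_id
 ORDER BY city_count DESC
 LIMIT 5;
```

The compound index contains both columns the query reads from export, so the second plan may report "Using index" in Extra for table e, meaning the rows themselves need not be fetched. Treat that as something to verify on your own data.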
