我们正在开展的一个新项目需要大量的数据分析,但我们发现这非常慢,我们正在寻找改变软件和/或硬件方法的方法.
我们目前正在运行亚马逊ec2实例(linux):
High-cpu Extra Large Instance
7 GB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
API name: c1.xlarge
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) cpu E5506 @ 2.13GHz
stepping : 5
cpu MHz : 2133.408
cache size : 4096 KB
MemTotal: 7347752 kB
MemFree: 728860 kB
Buffers: 40196 kB
Cached: 2833572 kB
SwapCached: 0 kB
Active: 5693656 kB
Inactive: 456904 kB
SwapTotal: 0 kB
SwapFree: 0 kB
MysqL> DESCRIBE articles_entities;
+------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| id | char(36) | NO | PRI | NULL | |
| article_id | char(36) | NO | MUL | NULL | |
| entity_id | char(36) | NO | MUL | NULL | |
| created | datetime | YES | | NULL | |
| modified | datetime | YES | | NULL | |
| relevance | decimal(5,4) | YES | MUL | NULL | |
| analysers | text | YES | | NULL | |
| anchor | varchar(255) | NO | | NULL | |
+------------+--------------+------+-----+---------+-------+
8 rows in set (0.00 sec)
从下表中可以看出,我们有很多协会以每天10万的速度增长
MysqL> SELECT count(*) FROM articles_entities;
+----------+
| count(*) |
+----------+
| 2829138 |
+----------+
1 row in set (0.00 sec)
像下面这样的简单查询花费了太多时间(12秒)
MysqL> SELECT count(*) FROM articles_entities WHERE relevance <= .4 AND relevance > 0;
+----------+
| count(*) |
+----------+
| 357190 |
+----------+
1 row in set (11.95 sec)
我们应该考虑什么来改善查询时间?不同的DB存储?不同的硬件.
解决方法:
正如mrorigo所说,请提供SHOW CREATE TABLE articles_entities,以便我们可以看到表的实际索引.
作为MysqL文档http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html的一个注释
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). MysqL cannot use an index if the columns do not form a leftmost prefix of the index
因此,如果相关性是多列索引的一部分,但不是该索引的最左列,则索引不会用于您的查询.
这是一个经常被忽视的常见问题.
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 [email protected] 举报,一经查实,本站将立刻删除。