使用带有嵌套查询的python MySQLDB SScursor

当生成大型结果集时,典型的 MySQLdb库查询可能会占用大量内存并在 Python中表现不佳.例如：

cursor.execute("SELECT id,name FROM `table`")
for i in xrange(cursor.rowcount):
    id,name = cursor.fetchone()
    print id,name

有一个可选的游标,一次只能获取一行,真正加快了脚本的速度,并大大减少了脚本的内存占用.

import MysqLdb
import MysqLdb.cursors

conn = MysqLdb.connect(user="user",passwd="password",db="dbname",cursorclass = MysqLdb.cursors.SSCursor)
cur = conn.cursor()
cur.execute("SELECT id,name FROM users")
row = cur.fetchone()
while row is not None:
    doSomething()
    row = cur.fetchone()    
cur.close()
conn.close()

但我找不到任何关于使用SSCursor与嵌套查询的内容.如果这是doSomething()的定义：

def doSomething()
    cur2 = conn.cursor()
    cur2.execute('select id,x,y from table2')
    rows = cur2.fetchall()
    for row in rows:
        doSomethingElse(row)
    cur2.close()

然后脚本抛出以下错误：

_MysqL_exceptions.ProgrammingError: (2014,"Commands out of sync; you can't run this command Now")

听起来好像SSCursor与嵌套查询不兼容.真的吗？如果是这样太糟糕了,因为主循环似乎用标准光标运行得太慢了.

解决方法

在MysqLdb用户指南中,在 the threadsafety attribute(强调我的)标题下讨论了这个问题：

The MysqL protocol can not handle multiple threads using the same
connection at once. Some earlier versions of MysqLdb utilized locking
to achieve a threadsafety of 2. While this is not terribly hard to
accomplish using the standard Cursor class (which uses
MysqL_store_result()),it is complicated by SSCursor (which uses
MysqL_use_result(); with the latter you must ensure all the rows have
been read before another query can be executed.

MySLQ C API函数 mysql_use_result()的文档提供了有关错误消息的更多信息：

When using MysqL_use_result(),you must execute MysqL_fetch_row()
until a NULL value is returned,otherwise,the unfetched rows are
returned as part of the result set for your next query. The C API
gives the error Commands out of sync; you can't run this command Now
if you forget to do this!

换句话说,在通过同一连接执行另一个语句之前,必须从任何未缓冲的游标(即使用MysqL_use_result()而不是MysqL_store_result() – 使用MysqLdb,这意味着SSCursor和SSDictCursor)完全获取结果集.

在您的情况下,最直接的解决方案是在迭代未缓冲查询的结果集时打开第二个连接. (从同一连接中简单地获取缓冲光标是不行的;在使用缓冲光标之前,您仍然需要超过无缓冲的结果集.)

如果您的工作流程类似于“遍历大结果集,对每行执行N个小查询”,请考虑将MysqL的存储过程作为嵌套来自不同连接的游标的替代方法.您仍然可以使用MysqLdb来调用该过程并获得结果,但是您肯定想要read the documentation of MySQLdb’s callproc() method,因为它在检索过程输出时不符合Python的database API specs.

第二种方法是坚持使用缓冲光标,但将查询拆分为批量.这就是我去年为一个项目做的事情,我需要遍历一组数百万行,用内部模块解析一些数据,并在处理完每一行后执行一些INSERT和UPDATE查询.一般的想法看起来像这样：

QUERY = r"SELECT id,name FROM `table` WHERE id BETWEEN %s and %s;"
BATCH_SIZE = 5000

i = 0
while True:
    cursor.execute(QUERY,(i + 1,i + BATCH_SIZE))
    result = cursor.fetchall()

    # If there's no possibility of a gap as large as BATCH_SIZE in your table ids,# you can test to break out of the loop like this (otherwise,adjust accordingly):
    if not result:
        break

    for row in result:
        doSomething()

    i += BATCH_SIZE

关于你的示例代码,我要注意的另一件事是你可以直接在MysqLdb中的游标上迭代而不是在xrange(cursor.rowcount)上显式调用fetchone().这在使用无缓冲游标时尤其重要,因为rowcount属性未定义并且会产生非常意外的结果(请参阅：Python MysqlDB using cursor.rowcount with SSDictCursor returning wrong count).

使用带有嵌套查询的python MySQLDB SScursor

解决方法

相关推荐