
Can a list comprehension help when iterating over the results of a SQLAlchemy query?


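This is a very slow loop (measured with tqdm at about 1.5 [it/s]).

For context, the objects are models in a local Postgres database managed by flask-sqlAlchemy, so network transfer speed is not the cause of the slowness.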


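To clarify further: there are about 500,000 books and about 50,000 authors. Each book can be written by multiple authors.

I am not returning a list, but I am sure this can be improved. Could a list comprehension actually improve it?

Something like...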

for author in tqdm(authors):
    new_score = 0
    for book in author.maintitles:
        new_score = new_score + book.score
        author.score = new_score

Solution

No, don't use a list comprehension for side effects. Even if you were going to use the resulting list, comprehensions are only slightly faster than for-loops anyway.
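A minimal sketch of the first point. The `Author` class and its `set_score` method here are hypothetical stand-ins, not the models from the question. A comprehension used only for its side effects still builds a list of the discarded return values:

```python
class Author:
    """Hypothetical stand-in for the ORM model in the question."""

    def __init__(self):
        self.score = 0

    def set_score(self, value):
        self.score = value  # side effect only; implicitly returns None


authors = [Author() for _ in range(3)]

# Anti-pattern: the comprehension runs only for its side effect and
# builds a list of None values that nobody needs.
leftover = [a.set_score(10) for a in authors]

# Clearer: a plain for-loop states the intent and builds nothing.
for a in authors:
    a.set_score(10)
```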

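However, you can improve the code with a generator expression, which looks similar.

Step 1: Assign to author.score once at the end instead of on every iteration, and use augmented assignment.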

for author in tqdm(authors):
    new_score = 0
    for book in author.maintitles:
        new_score += book.score
    author.score = new_score

Step 2: Now it is obvious that new_score is a simple sum, so use sum instead.

for author in tqdm(authors):
    author.score = sum(book.score for book in author.maintitles)

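Side note: you could also write this with a list comprehension, but that would build the whole list and then sum it, whereas the generator expression sums the values as it goes and never stores the intermediate list.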

sum([book.score for book in author.maintitles])
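As a small sketch of that side note, with `SimpleNamespace` objects standing in for the `Book` rows (an assumption for illustration only): both forms give the same total, but only the list version materializes every score before summing.

```python
from types import SimpleNamespace

# Stand-in for author.maintitles; real code would iterate ORM objects.
maintitles = [SimpleNamespace(score=s) for s in (3, 5, 8)]

# Generator expression: scores are fed to sum() one at a time.
total_gen = sum(book.score for book in maintitles)

# List comprehension: a temporary list [3, 5, 8] is built, then summed.
total_list = sum([book.score for book in maintitles])

print(total_gen, total_list)  # both totals are 16
```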

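The refactoring provided only shows that a list comprehension is not the solution, and I have since found the root cause of the problem, so I am adding the following as an answer.

The snippet above was part of an operation on the list returned by a query. As mentioned, it iterated over a deduplicated list of authors (about 50,000). The final operation ran for 15 hours at 15 it/s: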

    # Make the popular books query
    popular_books = \
        db.session.query(Book).filter(Book.score > 0).all()
    
    # Make a list of all authors for each book returned in the query
    authors = []
    for book in popular_books:
        authors = authors + book.mainauthors
    
    # Remove duplicates using set()
    authors = list(set(authors))
    

    for author in tqdm(authors):
        author.score = sum(book.score for book in author.maintitles)
    db.session.commit()

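Just by adjusting the query to return the authors via joinedload, and by using .distinct() to handle the deduplication, we not only reduced everything above to a few lines, but the operation also ran in ...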

    # Join authors to their books and eagerly load each author's maintitles
    query = db.session.query(Author).join(Book, Author.maintitles).options(db.joinedload(Author.maintitles))
    for popular_author in query.filter(Book.popularity > 0).distinct().all():
        popular_author.score = sum(book.score for book in popular_author.maintitles)

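However, I am still not entirely sure how this approach is orders of magnitude faster than the old version. Both iterate over the list of authors in the same way and perform the same simple summation.

For reference, committing the session after this process takes about 2:00 hours, a step that was much faster in the previous implementation. Overall it is still an amazing (7.5x) improvement. My guess is that with a better-optimized query from the start, all of the returned ORM objects are placed in RAM and are faster to operate on. Introducing Python list methods on a query seems to break it up / break up the in-memory ORM.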
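One plausible contributor, offered as an assumption because the source does not show how the relationship's loading is configured: with SQLAlchemy's default lazy loading, the first access to `author.maintitles` emits a separate SELECT for each author, whereas an eagerly loaded query fetches the related rows up front. The sketch below imitates the two access patterns with the standard-library `sqlite3` module rather than SQLAlchemy itself, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY);
    CREATE TABLE book (id INTEGER PRIMARY KEY, score INTEGER);
    CREATE TABLE author_book (author_id INTEGER, book_id INTEGER);
    INSERT INTO author VALUES (1), (2);
    INSERT INTO book VALUES (1, 4), (2, 6), (3, 1);
    INSERT INTO author_book VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")

statements = []
conn.set_trace_callback(statements.append)  # record each SQL statement run

# Pattern A: one query for the authors, then one more query per author
# (roughly what lazy loading of a relationship amounts to).
lazy_scores = {}
for (author_id,) in conn.execute("SELECT id FROM author").fetchall():
    rows = conn.execute(
        "SELECT b.score FROM book b "
        "JOIN author_book ab ON ab.book_id = b.id "
        "WHERE ab.author_id = ?",
        (author_id,),
    ).fetchall()
    lazy_scores[author_id] = sum(score for (score,) in rows)
lazy_count = len(statements)

# Pattern B: a single joined query, grouped in Python afterwards.
statements.clear()
eager_scores = {}
for author_id, score in conn.execute(
    "SELECT ab.author_id, b.score FROM author_book ab "
    "JOIN book b ON b.id = ab.book_id"
):
    eager_scores[author_id] = eager_scores.get(author_id, 0) + score
eager_count = len(statements)

print(lazy_count, eager_count)
```

Both patterns compute the same per-author totals. The difference is the number of round trips to the database, which grows with the number of authors in pattern A and stays constant in pattern B.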
