
Python CSV Writer only writes the last scraped items it processed

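My scraper only sends the last two items from the last page it processes to the CSV. I can't tell where I went wrong, because it prints the output perfectly fine. Maybe a more experienced pair of eyes can help.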

Here is the code:

from requests_html import HTMLSession
import csv
import time


def get_links(url):
    _request = _session.get(url)
    items = _request.html.find('li.product-grid-view.product.sale')
    links = []
    for item in items:
        links.append(item.find('a',first=True).attrs['href'])

    # print(len(links))

    return links


def get_product(link):
    _request = _session.get(link)

    title = _request.html.find('h2',first=True).full_text
    price = _request.html.find('span.woocommerce-Price-amount.amount bdi')[1].full_text
    sku = _request.html.find('span.sku',first=True).full_text
    categories = _request.html.find('span.posted_in',first=True).full_text.replace('Categories:',"").strip()
    brand = _request.html.find('span.posted_in')[1].full_text.replace('Brand:',"").strip()
    #print(brand)

    product = {
        'Title': title,'Price': price,'SKU': sku,'Categories': categories,'Brand': brand
    }

    #print(product)
    return product


if __name__ == '__main__':
    for page in range(1,4):

        url = 'https://www.thebassplace.com/product-category/basses/4-string/'
    
        if page == 1:
            parse_url = url
        else:
            parse_url = f'https://www.thebassplace.com/product-category/basses/4-string/page/{page}/'

        _session = HTMLSession()

        links = get_links(parse_url)
        results = []

        for link in links:
            results.append(get_product(link))
            time.sleep(1)
            #print(len(results))


with open('on_sale_bass.csv','w',newline='',encoding='utf-8') as csv_file:
    
    writer = csv.DictWriter(csv_file,fieldnames=results[0].keys())
    writer.writeheader()

    for row in results:
        writer.writerow(row)

When I tried appending the records instead, they were written to the CSV, but the header was repeated on every page iteration.
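The repeated header presumably comes from calling writeheader() every time the file is opened in append mode inside the page loop. If you do want to append page by page, here is a minimal sketch that writes the header only when the file does not exist yet or is empty. FIELDNAMES and append_rows are hypothetical names, and the column list is assumed to match the keys built in get_product(); the demo uses made-up rows and a temporary directory, so it needs no network.

```python
import csv
import os
import tempfile

# Column order assumed to match the product dicts built in get_product().
FIELDNAMES = ['Title', 'Price', 'SKU', 'Categories', 'Brand']


def append_rows(path, rows):
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='', encoding='utf-8') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=FIELDNAMES)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


if __name__ == '__main__':
    # Offline demo with made-up rows: one append per "page", as in the question.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'on_sale_bass.csv')
        for page in range(1, 4):
            append_rows(path, [dict.fromkeys(FIELDNAMES, f'page {page}')])
        with open(path, newline='', encoding='utf-8') as f:
            print(f.read().count('Title'))  # → 1
```

Because append mode keeps whatever is already in the file, rows from an earlier run stay there too; delete the file before a fresh run if that is not what you want.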

Solution

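The problem is the statement results = [] inside the range loop. You empty results on every iteration of the range(1,4) loop, so you only end up with what the last iteration brought in. Create the list once, before the loop, and the products from every page accumulate.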
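The effect can be reproduced without any scraping. In this sketch, fake_scrape is a hypothetical stand-in that returns two made-up product dicts per page; the only difference between the two collectors is where results = [] sits.

```python
def fake_scrape(page):
    """Hypothetical stand-in for scraping one page: two made-up products."""
    return [{'Title': f'bass {page}-{i}'} for i in range(2)]


def collect_buggy(pages):
    for page in pages:
        results = []  # reset on every page, so earlier pages are discarded
        for product in fake_scrape(page):
            results.append(product)
    return results


def collect_fixed(pages):
    results = []  # created once, before the loop, so every page accumulates
    for page in pages:
        for product in fake_scrape(page):
            results.append(product)
    return results


if __name__ == '__main__':
    print(len(collect_buggy(range(1, 4))))  # → 2
    print(len(collect_fixed(range(1, 4))))  # → 6
```

The buggy version returns exactly the "last two items from the last page" described in the question.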

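Note that I declared _session as global, although in this case it would be reasonable, in my opinion (feel free to correct me), to just pass it between the functions. Now, try this: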
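Passing the session as a parameter instead of using a global could look like the sketch below. It assumes the same requests_html calls the code uses (session.get(url) returning a response with .html.find(...)). FakeElement and FakeSession are hypothetical offline stand-ins that only mimic the attributes get_links() touches, so the function can be exercised without a network.

```python
from types import SimpleNamespace


def get_links(session, url):
    """Return the product links on one listing page.

    session is passed in explicitly instead of being read from a global.
    It is assumed to behave like requests_html.HTMLSession: .get(url)
    returns a response whose .html.find(selector) yields elements.
    """
    response = session.get(url)
    items = response.html.find('li.product-grid-view.product.sale')
    return [item.find('a', first=True).attrs['href'] for item in items]


# Hypothetical offline stand-ins; in the real script you would pass HTMLSession().
class FakeElement:
    def __init__(self, href):
        self.attrs = {'href': href}

    def find(self, selector, first=False):
        return self  # pretend the <a> tag is the element itself


class FakeSession:
    def get(self, url):
        elements = [FakeElement(url + 'a/'), FakeElement(url + 'b/')]
        return SimpleNamespace(html=SimpleNamespace(find=lambda selector: elements))


if __name__ == '__main__':
    print(get_links(FakeSession(), 'https://example.com/'))
```

In the real script you could create session = HTMLSession() once, before for page in range(1,4):, call get_links(session, parse_url), and give get_product the same extra session parameter.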

from requests_html import HTMLSession
import csv
import time


def get_links(url):
    global _session
    _request = _session.get(url)
    items = _request.html.find('li.product-grid-view.product.sale')
    links = []
    for item in items:
        links.append(item.find('a',first=True).attrs['href'])
    return links


def get_product(link):
    global _session
    _request = _session.get(link)
    title = _request.html.find('h2',first=True).full_text
    price = _request.html.find('span.woocommerce-Price-amount.amount bdi')[1].full_text
    sku = _request.html.find('span.sku',first=True).full_text
    categories = _request.html.find('span.posted_in',first=True).full_text.replace('Categories:',"").strip()
    brand = _request.html.find('span.posted_in')[1].full_text.replace('Brand:',"").strip()
    product = {
        'Title': title,'Price': price,'SKU': sku,'Categories': categories,'Brand': brand
    }
    return product


if __name__ == '__main__':
    results = []
    for page in range(1,4):
        url = 'https://www.thebassplace.com/product-category/basses/4-string/'
        if page == 1:
            parse_url = url
        else:
            parse_url = f'https://www.thebassplace.com/product-category/basses/4-string/page/{page}/'
    
        _session = HTMLSession()
        links = get_links(parse_url)

        for link in links:
            product = get_product(link)
            results.append(product)
            time.sleep(1)  # pause between product requests to go easy on the site
            
    with open('on_sale_bass.csv','w',newline='',encoding='utf-8') as csv_file:
        writer = csv.DictWriter(csv_file,fieldnames=results[0].keys())
        writer.writeheader()
        for row in results:
            writer.writerow(row)

Example of the output I got: (screenshot not available)
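One more thing to watch in the answer's code: fieldnames=results[0].keys() raises IndexError if no products were scraped at all (for example, if the CSS selector stops matching). A minimal sketch, assuming the five keys built in get_product() stay fixed, hardcodes the column list instead; FIELDNAMES and write_products are hypothetical names, and io.StringIO stands in for the real file.

```python
import csv
import io

# Column order assumed to match the product dicts built in get_product().
FIELDNAMES = ['Title', 'Price', 'SKU', 'Categories', 'Brand']


def write_products(csv_file, results):
    """Write the header, then every product dict; also works when results is empty."""
    writer = csv.DictWriter(csv_file, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(results)


if __name__ == '__main__':
    # io.StringIO stands in for open('on_sale_bass.csv', 'w', newline='', encoding='utf-8').
    buffer = io.StringIO()
    write_products(buffer, [])  # no products scraped: header only, no IndexError
    print(repr(buffer.getvalue()))  # → 'Title,Price,SKU,Categories,Brand\r\n'
```

With the real file, call write_products(csv_file, results) inside the existing with open(...) block in place of the DictWriter lines.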
