Why aren't my zip files being written out by my Python code?

I want to scrape all the files from this web page; they are zip files: http://data.gdeltproject.org/events/index.html

Here is my code:

from bs4 import BeautifulSoup as bs
import requests
import re

DOMAIN = "insert here"
URL = "insert here"

def get_soup(URL):
    return bs(requests.get(URL).text,'html.parser')


for link in get_soup(URL).findAll("a",attrs={'href': re.compile(".zip")}):
    file_link = link.get('href')
    print(file_link)

with open(link.text,'wb') as file:
    response = requests.get(DOMAIN + file_link)
    file.write(response.content)

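The code seems to create a file, but the file is empty. I can see all the zip files listed in the Python output, but they are not saved as files. Can someone help me figure out how to download these files to my computer? I'm stuck!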

Thank you very much, Lily

Solution

from bs4 import BeautifulSoup
import httpx
import trio

mainurl = "http://data.gdeltproject.org/events/index.html"


async def downloader(rec):
    async with rec:
        async for client,link in rec:
            print(f'[*] Downloading --> {link}')
            async with await trio.open_file(link.split('/')[-1],'wb') as f:
                r = await client.get(link)
                await f.write(r.content)


async def main():
    async with httpx.AsyncClient(timeout=None) as client,trio.open_nursery() as nurse:
        r = await client.get(mainurl)
        soup = BeautifulSoup(r.text,'lxml')
        links = [mainurl[:36] + x['href'] for x in soup.select('a[href$=zip]')]

        sender,receiver = trio.open_memory_channel(0)

        async with receiver:
            for _ in range(3):
                nurse.start_soon(downloader,receiver.clone())

            async with sender:
                for link in links:
                    await sender.send([client,link])


if __name__ == "__main__":
    trio.run(main)
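
The slice mainurl[:36] in the code above depends on the exact length of the base URL. As a possible alternative, here is a minimal sketch that resolves each href with the standard library's urljoin and takes the local file name from the URL path. The helper name resolve_zip_link and the sample href are hypothetical, used only for illustration; they are not part of the original answer.

```python
import posixpath
from urllib.parse import urljoin, urlparse

PAGE_URL = "http://data.gdeltproject.org/events/index.html"


def resolve_zip_link(page_url, href):
    """Return (absolute_url, local_filename) for an href found on page_url.

    Hypothetical helper: urljoin handles both relative and absolute hrefs,
    so no hard-coded slice of the page URL is needed.
    """
    full_url = urljoin(page_url, href)
    # The last path component becomes the name of the file saved locally.
    filename = posixpath.basename(urlparse(full_url).path)
    return full_url, filename


# Hypothetical href, shaped like a relative link on the index page.
url, name = resolve_zip_link(PAGE_URL, "20200101.export.CSV.zip")
print(url)
print(name)
```

The returned URL can be passed to client.get() and the file name to trio.open_file() in place of the string slicing.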

Could you check whether changing this:

for link in get_soup(URL).findAll("a",attrs={'href': re.compile(".zip")}):
    file_link = link.get('href')
    print(file_link)

with open(link.text,'wb') as file:
    response = requests.get(DOMAIN + file_link)
    file.write(response.content)

to this solves the problem?

for link in get_soup(URL).findAll("a",attrs={'href': re.compile(".zip")}):
    file_link = link.get('href')
    print(file_link)

    with open(link.text,'wb') as file:
        response = requests.get(DOMAIN + file_link)
        file.write(response.content)

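Python is strict about indentation, so this kind of mistake can bite you: your download code was not running inside the for loop, but once after the loop had finished.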
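
To make the effect of the indentation concrete, here is a small sketch with no network access. The link names and the two function names are made up for illustration; the list append stands in for the download step.

```python
# Hypothetical link names standing in for the hrefs scraped from the page.
LINKS = ["a.zip", "b.zip", "c.zip"]


def handle_outside_loop(links):
    """The handling step is dedented, so it runs once, after the loop ends."""
    handled = []
    for link in links:
        file_link = link
    # At this point file_link still holds only the LAST value from the loop.
    handled.append(file_link)
    return handled


def handle_inside_loop(links):
    """The handling step is indented, so it runs once per link."""
    handled = []
    for link in links:
        file_link = link
        handled.append(file_link)
    return handled


print(handle_outside_loop(LINKS))
print(handle_inside_loop(LINKS))
```

The first function handles only the final link, while the second handles every link, which is the behaviour the download loop needs.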

Next time, I'd suggest using a GUI debugger (see how easy it is to set one up in VS Code or another IDE), or using ipython and ipdb (import ipdb; ipdb.set_trace()).
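
As a rough illustration of the debugger suggestion, the sketch below uses Python's built-in breakpoint() (available since Python 3.7), which opens pdb by default; setting the environment variable PYTHONBREAKPOINT=ipdb.set_trace makes the same call open ipdb if it is installed. The function name, the debug flag, and the sample hrefs are hypothetical.

```python
def collect_zip_links(hrefs, debug=False):
    """Return the hrefs that end in .zip, optionally pausing on each one."""
    zip_links = []
    for href in hrefs:
        if debug:
            # Pauses here on every iteration; inspect href and zip_links,
            # then type "c" to continue or "q" to quit.
            breakpoint()
        if href.endswith(".zip"):
            zip_links.append(href)
    return zip_links


# Hypothetical hrefs; pass debug=True to step through the loop interactively.
found = collect_zip_links(["a.zip", "index.html", "b.zip"])
print(found)
```

Stepping through a loop like this shows directly which statements run on each iteration and which run only after the loop has ended.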

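This is not a complete answer, because if you step through your code with a debugger you should be able to get past this easily. Thanks for continuing to learn and for sticking with it :)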