Python, concurrency and asyncio: how to add a rotating proxy
I'm using asyncio to build an optimized concurrent application, and I want to add a rotating proxy to the mix.
I started from the sample code in this excellent article:
Speed Up Your Python Program With Concurrency
I added a rotating proxy and it stopped working: the code simply exits the function right after the line that touches the proxy.
This small code snippet works on its own, but it fails when added to the main script (the full example further below).
import asyncio
import random as rnd

async def download_site():
    proxy_list = [
        '38.39.205.220:80',
        '38.39.204.100:80',
        '38.39.204.101:80',
        '38.39.204.94:80',
    ]
    await asyncio.sleep(1)
    proxy = rnd.choice(proxy_list)
    print(proxy)

asyncio.run(download_site())
Here is the full example:
import asyncio
import random
import random as rnd  # download_site() uses the rnd alias
import time

import aiohttp

# Sample code taken from here:
# https://realpython.com/python-concurrency/#asyncio-version
# Info for adding headers for the proxy (scroll toward the bottom):
# https://docs.aiohttp.org/en/stable/client_advanced.html
# Good read to possibly improve performance on large lists of URLs:
# https://asyncio.readthedocs.io/en/latest/webscraper.html

# RUN THIS METHOD TO SEE HOW IT WORKS.
# # Original code (working...)
# async def download_site(session, url):
#     async with session.get(url, proxy="http://proxy.com") as response:
#         print("Read {0} from {1}".format(response.content_length, url))

def get_proxy():
    proxy_list = [
        (754, '38.39.205.220:80'),
        (681, '38.39.204.100:80'),
        (682, '38.39.204.101:80'),
        (678, '38.39.204.94:80'),
    ]
    proxy = random.choice(proxy_list)
    print(proxy[1])
    return proxy

async def download_site(session, url):
    proxy_list = [
        '38.39.205.220:80',
        '38.39.204.94:80',
    ]
    await asyncio.sleep(1)
    proxy = rnd.choice(proxy_list)
    print(proxy)
    async with session.get(url, proxy="http://" + proxy) as response:
        print("Read {0} from {1}".format(response.content_length, url))

async def download_all_sites(sites):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in sites:
            task = asyncio.ensure_future(download_site(session, url))
            tasks.append(task)
        await asyncio.gather(*tasks, return_exceptions=True)

# Modified to loop through only 1 URL to make debugging simple
if __name__ == "__main__":
    sites = [
        "https://www.jython.org",
        # "http://olympus.realpython.org/dice",
    ]  # * 80
    start_time = time.time()
    asyncio.get_event_loop().run_until_complete(download_all_sites(sites))
    duration = time.time() - start_time
    print(f"Downloaded {len(sites)} sites in {duration} seconds")
Thanks for any help you can provide.
Solution
You're using return_exceptions=True, but you never actually check the returned results for errors, so any failure inside download_site is captured by gather and silently discarded.
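In other words, the exception most likely isn't gone; gather returns it as a value in its result list, which would explain the silent exit right after the proxy line. A minimal sketch of that behavior, independent of aiohttp and the proxies (boom() is just a stand-in for the failing session.get() call):

import asyncio

async def boom():
    # Stands in for session.get() failing, e.g. because a proxy is unreachable.
    raise RuntimeError("proxy connection failed")

async def main():
    # The exception is returned as a value, not raised, so execution continues
    # with no traceback unless the results are inspected.
    results = await asyncio.gather(boom(), return_exceptions=True)
    print(results)  # [RuntimeError('proxy connection failed')]

asyncio.run(main())

Instead, you can use asyncio.as_completed to handle exceptions and get each result as soon as it completes: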
import asyncio
import random
import traceback

import aiohttp

URLS = ("https://stackoverflow.com",)
TIMEOUT = 5
PROXIES = (
    "http://38.39.205.220:80",
    "http://38.39.204.100:80",
    "http://38.39.204.101:80",
    "http://38.39.204.94:80",
)

def get_proxy():
    return random.choice(PROXIES)

async def download_site(session, url):
    proxy = get_proxy()
    print(f"Got proxy: {proxy}")
    async with session.get(url, proxy=proxy, timeout=TIMEOUT) as resp:
        print(f"{url}: {resp.status}")
        return await resp.text()

async def main():
    tasks = []
    async with aiohttp.ClientSession() as session:
        for url in URLS:
            tasks.append(asyncio.create_task(download_site(session, url)))
        for coro in asyncio.as_completed(tasks):
            try:
                html = await coro
            except Exception:
                traceback.print_exc()
            else:
                print(len(html))

if __name__ == "__main__":
    asyncio.run(main())
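If you would rather keep the gather-based download_all_sites from the question, the same diagnosis applies: return_exceptions=True hands the errors back as values, so the results list has to be scanned explicitly. A sketch of that variant, reusing download_site and the imports from the code above:

async def download_all_sites(sites):
    async with aiohttp.ClientSession() as session:
        tasks = [asyncio.create_task(download_site(session, url)) for url in sites]
        results = await asyncio.gather(*tasks, return_exceptions=True)
    # Exceptions come back as ordinary list entries, in the same order as sites.
    for url, result in zip(sites, results):
        if isinstance(result, Exception):
            print(f"{url} failed: {result!r}")
        else:
            print(f"{url}: {len(result)} characters")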
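One last note on the aiohttp proxy documentation linked in the question's comments: if the rotating proxies ever require credentials, aiohttp can send them with the proxy_auth argument alongside proxy. A short sketch; the user and password values are placeholders, not values from the post:

import aiohttp

async def fetch_via_authenticated_proxy(session, url, proxy):
    # Placeholder credentials; substitute the proxy provider's real ones.
    proxy_auth = aiohttp.BasicAuth("user", "password")
    async with session.get(url, proxy=proxy, proxy_auth=proxy_auth) as resp:
        return await resp.text()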