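I noticed that a Python script I wrote a while ago spends far too much of its run time blocked on network IO, so I decided to refactor it and replace requests with aiohttp.
During testing it raised an exception: ValueError: Only http proxies are supported
The aiohttp documentation says: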
aiohttp supports plain HTTP proxies and HTTP proxies that can be upgraded to HTTPS via the HTTP CONNECT method. aiohttp does not support proxies that must be connected to via https://.
In short: plain HTTP proxies work, and so do HTTP proxies that can be upgraded to HTTPS via the HTTP CONNECT method, but proxies that must themselves be reached over https:// do not.
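A quick way to tell up front whether a proxy URL falls on the supported side is to look at its scheme. The helper below is only a sketch: the name is_plain_http_proxy is made up for illustration and is not part of aiohttp, and it assumes the ValueError above is triggered by any scheme other than http, which is what the message suggests.

```python
from urllib.parse import urlsplit


def is_plain_http_proxy(proxy_url):
    # urlsplit() lowercases the scheme, so 'HTTP://...' is also accepted.
    return urlsplit(proxy_url).scheme == "http"


if __name__ == "__main__":
    for candidate in ("http://127.0.0.1:7890",
                      "https://127.0.0.1:7890",
                      "socks5://127.0.0.1:7890"):
        print(candidate, is_plain_http_proxy(candidate))
```

Anything this check rejects needs a different route, such as a custom connector.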
Sticking with requests and wrapping it in run_in_executor would also work, but I still wanted to see whether this problem could be solved directly.
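For comparison, the run_in_executor route might look roughly like the sketch below. To keep it runnable without a network or a proxy, blocking_fetch is a hypothetical stand-in that just sleeps; in the real script it would be a blocking call such as requests.get(url).text. The function names and URLs are made up for illustration.

```python
import asyncio
import time


def blocking_fetch(url):
    # Hypothetical stand-in for a blocking call such as requests.get(url).text.
    time.sleep(0.1)
    return f"body of {url}"


async def fetch_all(urls):
    loop = asyncio.get_running_loop()
    # Passing None selects the event loop's default thread pool executor,
    # so each blocking call runs in a worker thread instead of stalling the loop.
    futures = [loop.run_in_executor(None, blocking_fetch, url) for url in urls]
    # gather() returns the results in the same order as the inputs.
    return await asyncio.gather(*futures)


if __name__ == "__main__":
    pages = asyncio.run(fetch_all(["https://example.com/a", "https://example.com/b"]))
    print(pages)
```

This keeps the existing requests code (and its proxy handling) untouched, at the cost of one thread per in-flight request.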
A quick search showed there is already a ready-made package for this, aiohttp-socks. Its ProxyConnector class inherits from aiohttp's TCPConnector.
import asyncio

import aiohttp
from aiohttp_socks import ProxyConnector

# Placeholder request headers; the snippet uses `headers`, so it has to be defined somewhere.
headers = {'User-Agent': 'Mozilla/5.0'}


async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()


async def main(url):
    # Route the session's traffic through a local SOCKS5 proxy.
    connector = ProxyConnector.from_url('socks5://127.0.0.1:7890')
    async with aiohttp.ClientSession(connector=connector, headers=headers) as session:
        html = await fetch(session, url)
        # do something with html


async def run_all(urls):
    # gather() accepts coroutines directly; asyncio.wait() no longer does since Python 3.11.
    await asyncio.gather(*(main(url) for url in urls))


def execute():
    urls = ...
    asyncio.run(run_all(urls))
Passing the connector to ClientSession is all it takes.
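When many URLs are processed at once, it can help to collect failures instead of letting the first one abort the whole run. Here is a minimal sketch of that idea with asyncio.gather; process is a hypothetical stand-in for main(url) that needs no network, and the URLs are invented.

```python
import asyncio


async def process(url):
    # Hypothetical stand-in for main(url): pretend to download and handle one page.
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return len(url)


async def gather_results(urls):
    # return_exceptions=True hands exceptions back as ordinary results
    # instead of raising the first one to the caller.
    return await asyncio.gather(*(process(u) for u in urls), return_exceptions=True)


if __name__ == "__main__":
    urls = ["http://a", "http://bad", "http://ccc"]
    for url, result in zip(urls, asyncio.run(gather_results(urls))):
        status = "failed" if isinstance(result, Exception) else "ok"
        print(url, status, result)
```

The results come back in input order, so they can be zipped with the URL list to see which requests failed.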