Async `requests` in Python: Do async I/O on a blocking object
After the `send` method of generators was introduced in Python 2.5, it became possible to send values into generators, essentially making them two-way objects. This opens up the possibility of single-threaded, coroutine-based cooperative multitasking. Before 3.5, one could create a coroutine with the `asyncio.coroutine` (or `types.coroutine`) decorator, and pause it (when doing I/O) with `yield from`, which delegates to another generator: at its simplest it is syntactic sugar for `yield`-ing that generator's values (as a `for` loop would), and it also forwards sent values and exceptions to it.
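To make the two-way behaviour concrete, here is a minimal sketch. The names `running_average` and `delegator` are made up for illustration; only `send`, `yield` and `yield from` come from the language itself.

```python
def running_average():
    """Generator that receives numbers via send() and yields the average so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        # `yield` hands `average` out and pauses until a value is sent in
        value = yield average
        total += value
        count += 1
        average = total / count


def delegator():
    # `yield from` forwards send() calls to the inner generator
    yield from running_average()


gen = delegator()
next(gen)               # prime the generator: run up to the first `yield`
first = gen.send(10)    # average of [10]
second = gen.send(20)   # average of [10, 20]
print(first, second)
```

The values sent into `delegator` reach `running_average` untouched, which is the part a plain `for` loop over the inner generator could not do.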
In Python 3.5, the `async`/`await` keywords were introduced to support async operations natively:
- The `async` keyword replaces the need for an explicit decorator to create a coroutine
- `await` is essentially syntactic sugar for `yield from`
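A small sketch of the native syntax, assuming Python 3.7+ so that `asyncio.run` is available. The coroutine names `fetch_number` and `total` are hypothetical, and `asyncio.sleep` stands in for real I/O.

```python
import asyncio


async def fetch_number(n):
    # `await` pauses this coroutine and gives control back to the event loop
    await asyncio.sleep(0.01)
    return n * 2


async def total():
    # Awaiting another coroutine runs it to completion and returns its value
    a = await fetch_number(1)
    b = await fetch_number(2)
    return a + b


# No decorator needed: `async def` alone makes these coroutine functions
is_coroutine_function = asyncio.iscoroutinefunction(fetch_number)
result = asyncio.run(total())
print(is_coroutine_function, result)
```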
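Concurrent I/O is a huge performance boost for I/O-bound programs. Because of the GIL (Global Interpreter Lock), multiple threads can't run Python code in parallel even on a multicore system, but the GIL is released while a thread blocks on I/O, so threads can still overlap their waiting. Spawning multiple processes for an I/O-bound program, on the other hand, would be needlessly costly.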
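One detail worth seeing in action: a thread that is blocked on I/O releases the GIL, so several blocking calls can wait at the same time. The sketch below uses `time.sleep` as a stand-in for a blocking network call, and `blocking_io` is a hypothetical name.

```python
import threading
import time


def blocking_io(delay):
    # time.sleep stands in for a blocking network call;
    # the GIL is released while the thread waits
    time.sleep(delay)


start = time.perf_counter()
threads = [threading.Thread(target=blocking_io, args=(0.2,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Run one after another, the five calls would need about 1 second
print(f'5 blocking calls took {elapsed:.2f}s')
```

This is the property the rest of the post relies on when it hands blocking calls to a thread pool.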
To do concurrent I/O, a program needs to be written with that in mind. Not all programs are, but we can make (most of) them async at the cost of spawning threads to run them. We'll see how to do that in a moment.
Here, we're gonna use the synchronous `requests` library to get HTTP resources. `requests` doesn't support async I/O out of the box.
Let's see the synchronous operations first with the timings.
I'm gonna use Python 3.7 here, but the examples should work on 3.6 as well (and 3.5 if we replace the f-strings with regular string formatting).
We start with a `list` of URLs we're gonna send `GET` requests to:
urls = [
    'https://heemayl.net',
    'https://dealiable.com',
    'https://example.net',
    'https://www.w3.org',
    'https://httpbin.org',
]
Now, let's define a simple (synchronous) function to send `GET` requests to a list of URLs, and track the timings:
import time

import requests


def requests_sync(urls):
    # Fetch each URL in turn; `requests.get` blocks until the response arrives
    for url in urls:
        print(f'Start: {url}: {time.time()}')
        response = requests.get(url)
        print(f'End: {url}: {response.status_code}: {time.time()}')
    return None
Let's run it now:
>>> requests_sync(urls)
Start: https://heemayl.net: 1552823647.6991084
End: https://heemayl.net: 200: 1552823649.6029341
Start: https://dealiable.com: 1552823649.6030045
End: https://dealiable.com: 200: 1552823650.8961256
Start: https://example.net: 1552823650.896201
End: https://example.net: 200: 1552823652.9742491
Start: https://www.w3.org: 1552823652.9743247
End: https://www.w3.org: 200: 1552823655.355746
Start: https://httpbin.org: 1552823655.355821
End: https://httpbin.org: 200: 1552823657.973544
Findings:
- Each iterated URL is `requests.get`-ed sequentially
- Once we get the response back from one, we move on to the next
- As `requests.get` is blocking on network I/O, the program seems to be stalled for the duration of the blocked time
The above is not a performant solution, as one can imagine: it does not use the network efficiently and spends most of its time blocked on network I/O. A far better solution would be to send the requests (seemingly) in parallel, without waiting for the previous one to get its response. Doing so has the following advantages:
- The network is properly used
- The scheduler can do other tasks while waiting on I/O
- Little extra overhead (mainly the context switching of coroutines/tasks)
To make the operations async, we need to leverage individual threads (as a penalty) to run each `requests.get` call.
Let's first define a thread pool executor to which we'll send the `requests.get` calls to run:
import concurrent.futures

# As we have 5 URLs, 5 threads would do
executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)
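For a feel of what the executor does on its own, before any event loop is involved, here is a minimal sketch. `slow_square` is a hypothetical blocking function, with `time.sleep` standing in for network I/O.

```python
import concurrent.futures
import time


def slow_square(n):
    # Hypothetical blocking function; time.sleep stands in for network I/O
    time.sleep(0.1)
    return n * n


with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # submit() returns a Future immediately; the call runs on a worker thread
    futures = [executor.submit(slow_square, n) for n in range(5)]
    # result() blocks until that particular call has finished
    squares = [f.result() for f in futures]

print(squares)
```

Calling `result()` blocks the caller, which is why the post goes on to await the work from an event loop instead.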
We also need the event loop that will orchestrate/manage the coroutines/tasks/futures:
import asyncio
loop = asyncio.get_event_loop()
Now we get to the core of the idea: we'll define an `async` function that returns the `await`-ed value of the `requests.get` function run on a URL. The call is sent to a thread pool executor, and the whole thing is driven by the given event loop. Phew!
All of the above is basically done here:
await loop.run_in_executor(executor, requests.get, url)
Let's see the whole function in action:
async def individual_request(loop, executor, url):
    print(f'Start: {url}: {time.time()}')
    response = await loop.run_in_executor(executor, requests.get, url)
    print(f'End: {url}: {response.status_code}: {time.time()}')
    return response
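`run_in_executor` forwards positional arguments only, so keyword arguments (a timeout, say) need wrapping, for example with `functools.partial`. The sketch below runs without network access: `blocking_get` is a hypothetical stand-in for `requests.get`, `fetch` is a made-up name, and it assumes Python 3.7+ for `asyncio.run` and `asyncio.get_running_loop`.

```python
import asyncio
import concurrent.futures
import functools
import time


def blocking_get(url, timeout=None):
    # Hypothetical stand-in for requests.get: sleeps instead of touching the network
    time.sleep(0.1)
    return f'response for {url} (timeout={timeout})'


async def fetch(executor, url):
    loop = asyncio.get_running_loop()
    # Bind the keyword argument up front; run_in_executor then calls `call()`
    call = functools.partial(blocking_get, url, timeout=5)
    return await loop.run_in_executor(executor, call)


with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
    response = asyncio.run(fetch(executor, 'https://example.net'))

print(response)
```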
Okay! Now we need to define an `async` function that will `gather` the `individual_request(loop, executor, url)` coroutines for the input URLs, and run them concurrently:
async def main(urls):
    # Pass in the loop and the executor to `individual_request`
    coroutines = [individual_request(loop, executor, url) for url in urls]
    # `asyncio.gather` takes the coroutines/futures as arguments,
    # schedules them in the event loop, and aggregates the return
    # values as a future which results in a list eventually
    results = await asyncio.gather(*coroutines)
    return results
Now, let's run this in the event loop, and see the timings:
>>> loop.run_until_complete(main(urls))
Start: https://heemayl.net: 1552833809.0163138
Start: https://dealiable.com: 1552833809.0164335
Start: https://example.net: 1552833809.016731
Start: https://www.w3.org: 1552833809.017046
Start: https://httpbin.org: 1552833809.0171287
End: https://dealiable.com: 200: 1552833809.9785948
End: https://example.net: 200: 1552833810.5301158
End: https://httpbin.org: 200: 1552833810.7631378
End: https://www.w3.org: 200: 1552833810.8200846
End: https://heemayl.net: 200: 1552833811.72682
As we can see, all the `requests.get` calls were fired off concurrently, and the responses came back in their arrival order.
As one can imagine, we can also schedule some computational tasks while the coroutines are waiting on I/O.
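A rough sketch of that idea, under a few assumptions: `blocking_io` and `compute` are hypothetical names, `time.sleep` stands in for a blocking request, and passing `None` to `run_in_executor` selects the loop's default thread pool. The computation runs on the event loop thread while the worker thread waits.

```python
import asyncio
import time


def blocking_io():
    # Stand-in for a blocking request running on a worker thread
    time.sleep(0.2)
    return 'io done'


async def compute():
    total = 0
    for _ in range(5):
        total += sum(range(10000))
        # Yield to the event loop between chunks so other tasks can progress
        await asyncio.sleep(0)
    return total


async def main():
    loop = asyncio.get_running_loop()
    # None selects the loop's default thread pool executor
    io_future = loop.run_in_executor(None, blocking_io)
    return await asyncio.gather(io_future, compute())


io_result, computed = asyncio.run(main())
print(io_result, computed)
```

Keep in mind that a long computation without any `await` in it would hold up the event loop itself, so splitting it into chunks like this is a design choice, not a requirement of the API.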
Just to note, here's the content of the `results` list (as expected):
[ <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]> ]
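Since every entry above looks the same, the ordering is not visible: `asyncio.gather` returns the results in the order the awaitables were passed in, not in the order they finished. A minimal sketch to see this, with a hypothetical `delayed` coroutine and `asyncio.sleep` standing in for requests of different durations:

```python
import asyncio

finished = []  # records the order in which the coroutines complete


async def delayed(name, delay):
    await asyncio.sleep(delay)
    finished.append(name)
    return name


async def main():
    return await asyncio.gather(
        delayed('slow', 0.2),
        delayed('fast', 0.05),
        delayed('medium', 0.1),
    )


results = asyncio.run(main())
print(finished)  # completion order
print(results)   # input order
```

So each response in `results` lines up with the URL at the same index in `urls`, however long each request took.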
Finally, I would suggest going through the documentation of the modules and functions used here to get a better idea:
- https://docs.python.org/3/library/asyncio.html
- https://docs.python.org/3/library/concurrent.futures.html