Sending DNS queries asynchronously with asyncio and dnspython

Time: 2015-06-05 20:12:08

Tags: python dns coroutine python-asyncio dnspython

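Suppose I want to use dnspython to send various types of DNS queries (A, AAAA, NS, SOA, DNSKEY, NSEC3, DS, etc.) for the Alexa top 1M sites.

Doing this one at a time would take a long time, because I send multiple queries for each site. So I would like to use asyncio in Python 3 to run some of this work concurrently.

I went through David Beazley's generator/coroutine trilogy, http://www.dabeaz.com/talks.html, but I am still not sure how to accomplish my simple task...

More specifically,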

results = dns.resolver.query('google.com','AAAA')

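is a blocking function call that waits for the DNS reply to come back.

How can I send other queries during this waiting time without using threads? Since DNS queries are usually UDP packets, I thought asyncio might help.

pycares does not seem to support all the record types I need, so pycares and aiodns will not work for my case.

Any references and ideas would be helpful.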

1 answer:

Answer 0: (score: 1)

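dnspython recently gained native AsyncIO support, although the documentation is a little lacking.

Nonetheless, it is now possible to make queries with dnspython without hacky thread-pool solutions.

AsyncIO dnspython example

Below is a simple example that uses dnspython's AsyncIO Resolver class, a wrapper function, and asyncio.gather to run bulk queries efficiently: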

from dns.asyncresolver import Resolver
import dns.resolver
import dns.rrset
import asyncio
from typing import Tuple


async def dns_query(domain: str, rtype: str = 'A', **kwargs) -> dns.rrset.RRset:
    kwargs, res_cfg = dict(kwargs), {}
    # extract 'filename' and 'configure' from kwargs if they're present
    # to be passed to Resolver. we pop them to avoid conflicts passing kwargs
    # to .resolve().
    if 'filename' in kwargs: res_cfg['filename'] = kwargs.pop('filename')
    if 'configure' in kwargs: res_cfg['configure'] = kwargs.pop('configure')

    # create an asyncio Resolver instance
    rs = Resolver(**res_cfg)

    # call and asynchronously await .resolve() to obtain the DNS results
    res: dns.resolver.Answer = await rs.resolve(domain, rdtype=rtype, **kwargs)

    # we return the most useful part of Answer: the RRset, which contains
    # the individual records that were found.
    return res.rrset


async def dns_bulk(*queries: Tuple[str, str], **kwargs):
    ret_ex = kwargs.pop('return_exceptions', True)

    # Iterate over the queries and call (but don't await) the dns_query coroutine
    # with each query.
    # Without 'await', they won't properly execute until we await the coroutines
    # either individually, or in bulk using asyncio.gather
    coros = [dns_query(dom, rt, **kwargs) for dom, rt in queries]

    # using asyncio.gather, we can effectively run all of the coroutines
    # in 'coros' at the same time, instead of awaiting them one-by-one.
    #
    # return_exceptions controls whether gather() should immediately
    # fail and re-raise as soon as it detects an exception,
    # or whether it should just capture any exceptions, and simply
    # return them within the results.
    #
    # in this example function, return_exceptions is set to True,
    # which means if one or more of the queries fail, it'll simply
    # store the exceptions and continue running the remaining coros,
    # and return the exceptions inside of the tuple/list of results.
    return await asyncio.gather(*coros, return_exceptions=ret_ex)


async def main():
    queries = [
        ('privex.io', 'AAAA'),
        ('privex.io', 'TXT'),
        ('google.com', 'A'),
        ('google.com', 'AAAA'),
        ('examplesitedoesnotexist.test', 'A'),
    ]
    print(f"\n [...] Sending {len(queries)} bulk queries\n")
    res = await dns_bulk(*queries)
    print(f"\n [+++] Got {len(res)} results! :)\n\n")

    for i, a in enumerate(res):
        print("\n------------------------------------------------------------\n")
        if isinstance(a, Exception):
            print(f" [!!!] Error: Result {i} is an exception! Original query: {queries[i]} || Exception is: {type(a)} - {a!s} \n")
            continue
        print(f" [+++] Got result for query {i} ( {queries[i]} )\n")
        print(f"  >>>  Representation: {a!r}")
        print(f"  >>>  As string:")
        print(f"    {a!s}")
        print()
    print("\n------------------------------------------------------------\n")

asyncio.run(main())

When you run the script above, the output looks like this:


 [...] Sending 5 bulk queries


 [+++] Got 5 results! :)



------------------------------------------------------------

 [+++] Got result for query 0 ( ('privex.io', 'AAAA') )

  >>>  Representation: <DNS privex.io. IN AAAA RRset: [<2a07:e00::abc>]>
  >>>  As string:
    privex.io. 221 IN AAAA 2a07:e00::abc


------------------------------------------------------------

 [+++] Got result for query 1 ( ('privex.io', 'TXT') )

  >>>  Representation: <DNS privex.io. IN TXT RRset: [<"v=spf1 include:spf.messagingengine.com include:smtp.privex.io -all">, <"google-site-verification=_0OlLdacq3GAc4NkhOd0pBcLsNya3KApS0iAc6MtbYU">]>
  >>>  As string:
    privex.io. 300 IN TXT "v=spf1 include:spf.messagingengine.com include:smtp.privex.io -all"
privex.io. 300 IN TXT "google-site-verification=_0OlLdacq3GAc4NkhOd0pBcLsNya3KApS0iAc6MtbYU"


------------------------------------------------------------

 [+++] Got result for query 2 ( ('google.com', 'A') )

  >>>  Representation: <DNS google.com. IN A RRset: [<216.58.205.46>]>
  >>>  As string:
    google.com. 143 IN A 216.58.205.46


------------------------------------------------------------

 [+++] Got result for query 3 ( ('google.com', 'AAAA') )

  >>>  Representation: <DNS google.com. IN AAAA RRset: [<2a00:1450:4009:80f::200e>]>
  >>>  As string:
    google.com. 221 IN AAAA 2a00:1450:4009:80f::200e


------------------------------------------------------------

 [!!!] Error: Result 4 is an exception! Original query: ('examplesitedoesnotexist.test', 'A') || Exception is: <class 'dns.resolver.NXDOMAIN'> - The DNS query name does not exist: examplesitedoesnotexist.test. 


------------------------------------------------------------
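
For a list as large as the Alexa top 1M mentioned in the question, starting every query at once through asyncio.gather could overwhelm the resolver or the local socket limits. One possible approach is to cap the number of in-flight queries with asyncio.Semaphore. The sketch below is only an illustration: bounded_bulk, fake_query and the limit parameter are hypothetical names that are not part of dnspython, and fake_query stands in for dns_query so the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Tuple

Query = Tuple[str, str]


async def bounded_bulk(
    queries: Iterable[Query],
    query_fn: Callable[[str, str], Awaitable[object]],
    limit: int = 100,
) -> List[object]:
    """Run query_fn for every (domain, rtype) pair, at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def run_one(domain: str, rtype: str) -> object:
        # once `limit` queries are in flight, further callers wait here
        async with sem:
            return await query_fn(domain, rtype)

    coros = [run_one(dom, rt) for dom, rt in queries]
    # exceptions are returned in place of results, in the same order as queries
    return await asyncio.gather(*coros, return_exceptions=True)


async def fake_query(domain: str, rtype: str) -> str:
    # hypothetical stand-in for dns_query: no real DNS traffic is sent
    await asyncio.sleep(0.01)
    if domain.endswith('.test'):
        raise LookupError(f"{domain} does not exist")
    return f"{domain} {rtype} ok"


async def demo() -> List[object]:
    queries = [('google.com', 'A'), ('privex.io', 'TXT'), ('nope.test', 'A')]
    return await bounded_bulk(queries, fake_query, limit=2)


if __name__ == '__main__':
    for item in asyncio.run(demo()):
        print(repr(item))
```

In real use, query_fn would be the dns_query coroutine from the example above. Note that this still creates one coroutine object per query up front, so for a very large list it may be worth feeding the queries in chunks; a suitable limit depends on the resolver and would need to be found by experiment.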

Background tasks with AsyncIO

Assuming your application is purely AsyncIO, you can run coroutines in the background without threads:

import asyncio

async def hello():
    for i in range(10):
        print("hello world")
        await asyncio.sleep(2.0)

async def lorem():
    for i in range(20):
        print("lorem ipsum dolor")
        await asyncio.sleep(1.0)


async def my_app():
    print(" [...] creating tsk_hello and tsk_lorem")
    tsk_hello = asyncio.create_task(hello())
    tsk_lorem = asyncio.create_task(lorem())

    # let them both run for 5 seconds
    print(" [...] waiting 5 secs")
    await asyncio.sleep(5.0)

    # now, assuming you wanted to cancel a looping task before it's finished
    # (or tasks that are endless 'while True' loops)
    # we can use the tsk_x task objects to ask them to stop immediately.
    print(" [...] stopping tsk_hello")
    tsk_hello.cancel()
    print(" [...] waiting 4 secs")
    await asyncio.sleep(4.0)
    print(" [...] stopping tsk_lorem")
    tsk_lorem.cancel()

asyncio.run(my_app())

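If you run the AsyncIO background task example above, the output looks like the following, showing that lorem and hello run concurrently alongside the main entry function: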

 [...] creating tsk_hello and tsk_lorem
 [...] waiting 5 secs
hello world
lorem ipsum dolor
lorem ipsum dolor
hello world
lorem ipsum dolor
lorem ipsum dolor
hello world
lorem ipsum dolor
 [...] stopping tsk_hello
 [...] waiting 4 secs
lorem ipsum dolor
lorem ipsum dolor
lorem ipsum dolor
lorem ipsum dolor
 [...] stopping tsk_lorem
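
One detail the example above leaves out: task.cancel() only requests cancellation, and the task actually stops the next time it reaches an await. If the task needs to release resources, a common pattern is to catch asyncio.CancelledError inside it, clean up, re-raise, and then await the task from the caller. The following is a minimal sketch of that pattern; ticker, demo and the log list are hypothetical names used only for illustration.

```python
import asyncio
from typing import List


async def ticker(log: List[str]) -> None:
    try:
        while True:
            log.append("tick")
            await asyncio.sleep(0.01)
    except asyncio.CancelledError:
        # raised at the pending await after task.cancel() is called;
        # clean up here, then re-raise so the task is marked as cancelled
        log.append("cleanup")
        raise


async def demo() -> List[str]:
    log: List[str] = []
    task = asyncio.create_task(ticker(log))
    await asyncio.sleep(0.035)  # let the background task run for a while

    task.cancel()
    try:
        # awaiting the task gives the cancellation a chance to complete
        await task
    except asyncio.CancelledError:
        log.append("cancelled")
    return log


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

The number of "tick" entries depends on timing, but "cleanup" is always recorded before "cancelled", because the caller's await only returns after the task's own handler has finished.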