Question

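I am trying to build a bot that scrapes the sales history of purchased domains. So far I have been able to extract the domains from a CSV file and store them in a list (PS: there are 10k domains). The problem appears when I try to scrape the website. I tried this with two domains and it worked flawlessly. Does anyone know what this error is and how to fix it? Thank you very much in advance.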

My code:

datafile = open('/Users/.../Documents/Domains.csv', 'r')
myreader = csv.reader(datafile, delimiter=";",)
domains   = []
for row in myreader:
    domains.append(row[1])
del domains[0]
print("The Domains have been stored into a list")

nmb_sells_record = 0

def result_catcher(domains,queue):
    template_url = "https://namebio.com/{}".format(domain)
    get = requests.get(template_url)
    results = get.text
    last_sold =  results[results.index("last sold for ")+15:results.index(" on 2")].replace(",","")
    last_sold = int(last_sold)
    if not "No historical sales found." in results:
        sold_history = results[results.index("<span class=\"label label-success\">"):results.index(" USD</span> on <span class=\"label")]
    queue.put(results)

#domains = ["chosen.com","koalas.com"]
queues = {}
nmb=0
for x in range(len(domains)):
    new_queue = "queue{}".format(nmb)
    queues[new_queue] = queue.Queue()
    nmb += 1
count = 0
for domain in domains:
    for queue in queues: 
        t = threading.Thread(target=result_catcher, args=(domain,queues[queue]))
        t.start()
print("The Requests were all sent, now they are beeing analysed")   
for queue in queues:
    response_domain = queues[queue].get()
    nmb_sells_record = response_domain.count("for $") + response_domain.count("USD")


print("The Bot has recorded {} domain sells".format(nmb_sells_record))

My code's output:

Exception in thread Thread-345:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/socket.py", line 743, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 346, in _make_request
    self._validate_conn(conn)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
    conn.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connection.py", line 284, in connect
    conn = self._new_conn()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x115a55a20>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known

Answer 1

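From the Python docs:

exception socket.gaierror: A subclass of OSError, this exception is raised for address-related errors by getaddrinfo() and getnameinfo().

The accompanying value is a pair (error, string) representing an error returned by a library call. string represents the description of error, as returned by the gai_strerror() C function. The numeric error value will match one of the EAI_* constants defined in this module.

gai => getaddrinfo (get address info)

From the urllib3 wiki page:

New exception: NewConnectionError, raised when we fail to establish a new connection, usually ECONNREFUSED socket error.

Some possible causes of an ECONNREFUSED error, along with some command-line commands for probing an address and port, are described here.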
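As a minimal sketch of how a worker could report this failure instead of dying with a traceback, the lookup can be wrapped in a try/except for socket.gaierror. The function name can_resolve and its resolver parameter are hypothetical, introduced only so the example can be exercised without a network. When the lookup happens inside requests.get(), the same failure surfaces as requests.exceptions.ConnectionError.

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return (True, None) if host resolves, otherwise (False, message).

    `resolver` is a hypothetical injection point for testing; by default
    it is the real socket.getaddrinfo.
    """
    try:
        resolver(host, port)
    except socket.gaierror as exc:
        # exc.args holds the (error, string) pair described in the docs.
        return False, "{}: {}".format(host, exc)
    return True, None
```

A thread that calls something like this (or catches the equivalent requests exception) can record the failed domain and move on, so one bad lookup does not lose the result for that slot silently.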

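By the way, instead of reading all the rows into a list and then deleting the first item, which makes Python shift every remaining item down by one position, you can skip the header (?) more efficiently, like this:

myreader = csv.reader(datafile, delimiter=";",)
next(myreader)  #<==== HERE ****

domains   = []

for row in myreader:
    domains.append(row[1])

If there is no next row, next() raises a StopIteration exception. If you want to prevent that, you can call next(myreader, None), which returns None when there is no next row.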
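A small self-contained sketch of the header skip, using an in-memory string in place of Domains.csv. The sample data, its column layout, and the helper name read_domains are assumptions made for illustration.

```python
import csv
import io

# Stand-in for Domains.csv; the real file's columns are assumed to be similar.
sample = "id;domain\n1;chosen.com\n2;koalas.com\n"


def read_domains(fileobj):
    """Return the second column of every row after the header."""
    reader = csv.reader(fileobj, delimiter=";")
    next(reader, None)  # skip the header; returns None on an empty file
    return [row[1] for row in reader if len(row) > 1]


domains = read_domains(io.StringIO(sample))
print(domains)
```

Because next() is given a default, an empty file simply yields an empty list rather than raising StopIteration.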

Threading example:

import requests
import threading

resources = [
    "dfactory.com",
    "dog.com",
    "cat.com",
]

def result_catcher(resource):
    template_url = "https://namebio.com/{}".format(resource)
    get = requests.get(template_url)


threads = []

for resource in resources:
    t = threading.Thread(target=result_catcher, args=(resource,) )
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

print("All threads done executing.")

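By the way, there will be an optimal number of threads to start, and it will be smaller than N. Create a thread pool: when a thread finishes its work, it goes back and reads another resource path from the work queue. You will have to run some tests to determine the optimal number of threads. Creating 10,000 threads is not optimal. If you have four cores, the best number may be as low as 10 threads.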
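One way to sketch such a pool is with concurrent.futures.ThreadPoolExecutor from the standard library, which keeps an internal work queue and reuses a fixed number of threads. The names fetch_all and fake_fetch are hypothetical, and fake_fetch stands in for the real HTTP request so that the example runs without a network. The value 10 for max_workers is only a starting point to tune by testing.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(resources, fetch, max_workers=10):
    """Call fetch(resource) for each resource on at most max_workers threads.

    Returns (results, errors), two dicts keyed by resource.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each submit() queues one job; idle threads pick up the next one.
        futures = {pool.submit(fetch, r): r for r in resources}
        for future in as_completed(futures):
            resource = futures[future]
            try:
                results[resource] = future.result()
            except Exception as exc:
                # A failed job is recorded instead of killing its thread.
                errors[resource] = exc
    return results, errors


def fake_fetch(resource):
    """Hypothetical stand-in for a function that downloads one page."""
    if resource == "bad.invalid":
        raise OSError("lookup failed")
    return "page for " + resource


results, errors = fetch_all(
    ["dog.com", "cat.com", "bad.invalid"], fake_fetch, max_workers=2
)
print(sorted(results), sorted(errors))
```

In the actual bot, fetch would presumably be a function that calls requests.get() on the namebio.com URL and returns the response text. Capping the worker count also caps how many DNS lookups and connections are in flight at once.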
