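I'm trying to build a bot that scrapes the sales history of purchased domains. So far I've been able to extract the domains from a CSV file and store them in a list (PS: there are 10k domains). The problem shows up when I try to scrape the website. I tried this with two domains and it worked flawlessly. Does anyone know what this error is and how to fix it? Thank you very much in advance.
My code: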
datafile = open('/Users/.../Documents/Domains.csv', 'r')
myreader = csv.reader(datafile, delimiter=";",)
domains = []
for row in myreader:
    domains.append(row[1])
del domains[0]
print("The Domains have been stored into a list")
nmb_sells_record = 0
def result_catcher(domains,queue):
    template_url = "https://namebio.com/{}".format(domain)
    get = requests.get(template_url)
    results = get.text
    last_sold = results[results.index("last sold for ")+15:results.index(" on 2")].replace(",","")
    last_sold = int(last_sold)
    if not "No historical sales found." in results:
        sold_history = results[results.index("<span class=\"label label-success\">"):results.index(" USD</span> on <span class=\"label")]
    queue.put(results)
#domains = ["chosen.com","koalas.com"]
queues = {}
nmb=0
for x in range(len(domains)):
    new_queue = "queue{}".format(nmb)
    queues[new_queue] = queue.Queue()
    nmb += 1
count = 0
for domain in domains:
    for queue in queues:
        t = threading.Thread(target=result_catcher, args=(domain,queues[queue]))
        t.start()
print("The Requests were all sent, now they are beeing analysed")
for queue in queues:
    response_domain = queues[queue].get()
    nmb_sells_record = response_domain.count("for $") + response_domain.count("USD")
print("The Bot has recorded {} domain sells".format(nmb_sells_record))
My code's output:
Exception in thread Thread-345:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/socket.py", line 743, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 346, in _make_request
    self._validate_conn(conn)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
    conn.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connection.py", line 284, in connect
    conn = self._new_conn()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/urllib3/connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x115a55a20>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known
Answer 0 (score: 1)
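From the Python docs:
exception socket.gaierror: A subclass of OSError, this exception is raised for address-related errors by getaddrinfo() and getnameinfo().
The accompanying value is a pair (error, string) representing an error returned by a library call. string represents the description of error, as returned by the gai_strerror() C function. The numeric error value will match one of the EAI_* constants defined in this module.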
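gai => get address info
New exception: NewConnectionError, raised when we fail to establish a new connection, usually an ECONNREFUSED socket error.
Some possible causes of an ECONNREFUSED error are listed here, along with some command-line commands for probing the address and port.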
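Because the traceback shows socket.gaierror coming out of getaddrinfo(), it can help to test name resolution on its own, separately from the HTTP request. The following is only a minimal sketch: can_resolve and its resolver parameter are hypothetical names that do not appear in the original code, and the resolver argument exists purely so the lookup can be replaced when experimenting.

```python
import socket


def can_resolve(host, port=443, resolver=socket.getaddrinfo):
    """Return True if `host` resolves, False on an address-related error.

    `can_resolve` and `resolver` are illustrative names (assumptions),
    not part of the original script.
    """
    try:
        resolver(host, port)
    except socket.gaierror:
        # getaddrinfo() raises gaierror for address-related failures,
        # such as the "[Errno 8] nodename nor servname provided" above.
        return False
    return True
```

If this returns False for a host that normally resolves, the failure is happening at the lookup stage rather than in the request itself.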
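By the way, instead of reading all the rows into an array and then deleting the first item, which makes Python shift every other item down by one position, you can skip the header (?) more efficiently, like this: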
myreader = csv.reader(datafile, delimiter=";",)
next(myreader)  # <==== HERE: skip the header row
domains = []
for row in myreader:
    domains.append(row[1])
If there is no next row, next() raises a StopIteration exception. To avoid that, you can call next(myreader, None), which returns None when there is no next row.
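A small self-contained sketch of the header-skipping idea, using an in-memory file so it runs without the real CSV. The read_domains helper and the sample rows are made up for illustration; only the ";" delimiter and the use of column 1 come from the question.

```python
import csv
import io


def read_domains(csv_file, column=1, delimiter=";"):
    """Return the values of one column, skipping the header row.

    `read_domains` is a hypothetical helper, not part of the original code.
    """
    reader = csv.reader(csv_file, delimiter=delimiter)
    # The default value None keeps next() from raising StopIteration
    # when the file is empty.
    next(reader, None)
    return [row[column] for row in reader]


# Made-up sample data standing in for Domains.csv.
sample = io.StringIO("id;domain\n1;dog.com\n2;cat.com\n")
domains = read_domains(sample)

# An empty file yields an empty list instead of an exception.
empty = read_domains(io.StringIO(""))
```

Note that a row with fewer columns than expected would still raise IndexError; the sketch does not guard against that.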
Threading example:
import requests
import threading

resources = [
    "dfactory.com",
    "dog.com",
    "cat.com",
]

def result_catcher(resource):
    template_url = "https://namebio.com/{}".format(resource)
    get = requests.get(template_url)

threads = []
for resource in resources:
    t = threading.Thread(target=result_catcher, args=(resource,))
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

print("All threads done executing.")
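The example above discards each response. If the results are needed afterwards, one possible approach is a single shared queue.Queue rather than one queue per domain. This is a sketch under assumptions: the worker function below is a stand-in that records the length of the name instead of making an HTTP request, so that it runs without network access.

```python
import queue
import threading


def worker(resource, results):
    # Stand-in for the real work (for example an HTTP request);
    # it stores a (resource, value) pair in the shared queue.
    results.put((resource, len(resource)))


resources = ["dfactory.com", "dog.com", "cat.com"]
results = queue.Queue()  # one thread-safe queue shared by all threads

threads = [threading.Thread(target=worker, args=(r, results)) for r in resources]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread has finished, so draining until empty is safe here.
collected = {}
while not results.empty():
    resource, value = results.get()
    collected[resource] = value
```

Pairing each result with its resource name means the order in which threads finish does not matter.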
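By the way, there will be an optimal number of threads to start, and it is less than N. Create a thread pool so that when a thread finishes a task, it goes back and reads another resource path from the work queue. You will have to run some tests to determine the optimal number of threads. Creating 10,000 threads is not optimal. If you have four cores, then as few as 10 threads may be optimal.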
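One way to get such a pool is concurrent.futures.ThreadPoolExecutor from the standard library, which keeps a fixed number of threads and feeds them from an internal work queue. The sketch below is an illustration only: run_in_pool, fake_worker and the default of 10 workers are assumptions, and the worker is a placeholder rather than a real request.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_in_pool(resources, worker, max_workers=10):
    """Run worker(resource) for each resource on a bounded pool of threads.

    `run_in_pool` is a hypothetical helper; max_workers=10 is only a
    starting point to tune by measurement.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the resource it was submitted for.
        futures = {pool.submit(worker, r): r for r in resources}
        for future in as_completed(futures):
            resource = futures[future]
            try:
                results[resource] = future.result()
            except Exception as exc:
                # Keep the exception so one failure does not stop the rest.
                results[resource] = exc
    return results


def fake_worker(resource):
    # Placeholder for the real work, e.g. fetching the page for `resource`.
    return len(resource)


pool_results = run_in_pool(["dfactory.com", "dog.com", "cat.com"], fake_worker, max_workers=2)
```

In the real script, the worker would be a function that performs the request for one domain, and max_workers would be adjusted after timing a few runs.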