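Sorry for asking this! I'm a beginner, so please feel free to teach me anything you know. I'm building a scraper for marketing purposes, to collect contact information from websites. I'm using Python 3. Here is my code: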
import requests, bs4, os, codecs, csv
import pandas as pd
import sys
os.path.join('usr', 'bin', 'spam')
openFile = open('C:\\Users\\hdtra\\Desktop\\Test_1.csv',encoding='utf-8-sig')
read_test = csv.reader(openFile)
for link in read_test :
    res = requests.get(link)
    res.raise_for_status
    facebookSpider = bs4.BeautifulSoup(res.text)
    email = facebookSpider.select("._4-u2._3xaf._3-95._4-u8")
    helloFile = open('C:\\Users\\hdtra\\Desktop\\In processing\\information.txt','w')
    helloFile.write(str(email[3].encode('utf-8')) + '\n')
    helloFile.close()
I don't know why it gives me this:
Traceback (most recent call last):
  File "C:\Users\hdtra\Desktop\In processing\Facebook_spider.py", line 12, in <module>
    res = requests.get(link)
  File "C:\Program Files\Python36\lib\site-packages\requests\api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\requests\sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Program Files\Python36\lib\site-packages\requests\sessions.py", line 612, in send
    adapter = self.get_adapter(url=request.url)
  File "C:\Program Files\Python36\lib\site-packages\requests\sessions.py", line 703, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for '['http://www.facebook.com/D2Streetwear/?ref=br_rs']'
I know get() only accepts a string, but I don't know how to convert these links to strings. Here is my CSV file. It has a single column with 5 rows:
http://www.facebook.com/D2Streetwear/?ref=br_rs
https://www.facebook.com/RealClothes/?ref=br_rs
https://www.facebook.com/Lecamelliaclothing/?ref=br_rs
https://www.facebook.com/TaTclothing-285844471884952/?ref=br_rs
https://www.facebook.com/Dai-Clothing-130675847640538/?ref=br_rs
I tried str(link()), but that doesn't work.
Answer 0 (score: 1)
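You should understand that csv.reader returns an iterator over the rows of the file, and that each row it yields is a list of that row's column values.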
csv.reader(csvfile, dialect='excel', **fmtparams)
Return a reader object which will iterate over lines in the given csvfile. [...] Each row read from the csv file is returned as a list of strings.
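(Emphasis mine.) Your CSV appears to contain a single column, so in your loop link is a one-element list, and you can use link[0] to access the first column.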
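import csv
import requests

with open('test.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        # each row is a list of strings; row[0] is the URL in the first column
        res = requests.get(row[0])
        ...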
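To see why the original call failed, here is a minimal sketch that runs two of the URLs from the question through csv.reader. It uses an in-memory io.StringIO buffer as a stand-in for Test_1.csv (an assumption, so the example runs without the real file or any network request), and it assumes the file has no header row.

```python
import csv
import io

# In-memory stand-in for the one-column CSV file (assumption: no header row).
sample = io.StringIO(
    "http://www.facebook.com/D2Streetwear/?ref=br_rs\n"
    "https://www.facebook.com/RealClothes/?ref=br_rs\n"
)

rows = list(csv.reader(sample))

# Each row is a list of strings, even when the file has only one column.
first_row = rows[0]
print(type(first_row).__name__)  # list

# Converting the list with str() produces the bracketed text from the traceback,
# which is not a valid URL.
print(str(first_row))

# Indexing with [0] gives the plain URL string that requests.get() expects.
first_url = first_row[0]
print(first_url)

# Blank lines come back as empty lists, so skip them before indexing.
urls = [row[0] for row in rows if row]
```

This also suggests why wrapping the row in str() would not help: the result still contains the square brackets and quotes, so requests cannot match it to an http:// or https:// adapter.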
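I think it is good practice to always use the with...as context manager when doing file I/O, because it closes the file automatically and makes for cleaner code.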
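As a small illustration of that point, the sketch below writes to a file inside a with block and then checks that it was closed on exit. The temporary directory and the file name information.txt are placeholders chosen for the example (assumptions, not the paths from the question).

```python
import os
import tempfile

# Hypothetical temporary location so the example does not touch real paths.
path = os.path.join(tempfile.mkdtemp(), "information.txt")

with open(path, "w", encoding="utf-8") as out_file:
    out_file.write("example line\n")

# The file is closed as soon as the with block exits,
# without an explicit out_file.close() call.
print(out_file.closed)  # True

with open(path, encoding="utf-8") as in_file:
    contents = in_file.read()
```

The file is also closed if an exception is raised inside the block, which is harder to guarantee with a manual open()/close() pair.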