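I am trying to write a script with Python and BeautifulSoup that scrapes a website and then writes the data to an Excel sheet.
It works up to the writing part, and then I get a NotImplementedError.
I looked it up and wrapped the writing part of the code in try: and pass blocks. That made the error disappear from the Python interpreter console window, but my Excel worksheet is blank.
Here is what I have so far: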
import requests, openpyxl
from bs4 import BeautifulSoup
wb = openpyxl.Workbook('RDWM_CRM.xls')
wb.create_sheet('Phone')
sheet = wb.get_sheet_by_name('Phone')
# nav to webpage I want to scrape
url = "http://www.yellowpages.com/search?search_terms=roofing%20company&geo_location_terms=New%20York%2C%20NY&page=2"
r = requests.get(url)
soup = BeautifulSoup(r.content)
# for loop finds info then prints
for div in soup.find_all("div", {"class": "info"}):
    print (div.contents[0].text)
    print (div.contents[1].text)
# for loop finds info then writes to excel cells
for div in soup.find_all("div", {"class": "info"}):
    sheet['A1'] = div.contents[0].text
    sheet['B1'] = div.contents[1].text
wb.save('RDWM_CRM.xls')
As I said above, even when there is no error I end up with a blank Excel sheet. This is the traceback I see in the console:
Neptune Construction
Serving the New York Area.(866) 664-1759
>>> # for loop finds info then writes to excel cells
... for div in soup.find_all("div", {"class": "info"}):
...     sheet['A1'] = div.contents[0].text
...     sheet['B1'] = div.contents[1].text
...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "C:\Users\Josh\AppData\Local\Programs\Python\Python35\lib\site-packages\openpyxl\writer\write_only.py", line 223, in removed_method
    raise NotImplementedError
NotImplementedError
>>> wb.save('RDWM_CRM.xls')
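That is the last piece of data along with the error.
Thanks for your help!! I am still getting a blank Excel sheet. This is the code I am using now; there are no errors, just a blank Excel sheet. It creates a new worksheet named Phone, but the sheet is empty.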
import requests
from bs4 import BeautifulSoup
from openpyxl import Workbook
url = "http://www.yellowpages.com/search?search_terms=roofing%20company&geo_location_terms=Seattle%2C%20WA&page=4" # nav to webpage I want to scrape
r = requests.get(url)
soup = BeautifulSoup(r.content)
# create a dummy list of texts to write to excel file
divs = []
wb = Workbook() # open new workbook, use load_workbook if existing
ws = wb.create_sheet('Phone')
for div in divs:
    row = [div.contents[0].text, div.contents[1].text] # construct a row: shown only for example purposes
    ws.append(row) # could use ws.append(div) since each div is a list
wb.save('RDWM_CRM.xlsx') # save workbook, will overwrite if exists
Any help is appreciated!!
Answer 0 (score: 2)
Apologies in advance if I have not fully understood your question, but there appear to be some problems with how openpyxl is being used.
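One likely problem, judging from the traceback's mention of write_only.py, is the call openpyxl.Workbook('RDWM_CRM.xls'). As far as I know, the constructor does not take a filename: its first positional argument switches on write-only mode, and any non-empty string counts as true. A minimal sketch of the difference, assuming a reasonably recent openpyxl (the cell value is just the sample name from the question):

```python
from openpyxl import Workbook

# The first positional parameter of Workbook() is a write-only flag,
# not a filename, so a non-empty string is treated as "true".
wb_from_string = Workbook('RDWM_CRM.xls')
print(bool(wb_from_string.write_only))

# Called without arguments, the workbook is a normal in-memory one
# and supports cell assignment such as ws['A1'] = value.
wb_normal = Workbook()
ws = wb_normal.create_sheet('Phone')
ws['A1'] = 'Neptune Construction'
print(bool(wb_normal.write_only), ws['A1'].value)
```

In write-only mode rows can only be added with ws.append(), which would explain why the sheet['A1'] assignment failed. The filename belongs in wb.save() only.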
Here is an example of how to write a worksheet with openpyxl, which may help:
from openpyxl import Workbook
# create a dummy list of texts to write to excel file
divs = [[chr(i)*8, chr(i+1)*8] for i in range(65, 75, 1)]
wb = Workbook() # open new workbook, use load_workbook if existing
ws = wb.create_sheet(title="Example")
for div in divs:
    row = [div[0], div[1]] # construct a row: shown only for example purposes
    ws.append(row) # could use ws.append(div) since each div is a list
wb.save('example.xlsx') # save workbook, will overwrite if exists
The dummy list divs looks like this:
[['AAAAAAAA', 'BBBBBBBB'],
['BBBBBBBB', 'CCCCCCCC'],
['CCCCCCCC', 'DDDDDDDD'],
['DDDDDDDD', 'EEEEEEEE'],
['EEEEEEEE', 'FFFFFFFF'],
['FFFFFFFF', 'GGGGGGGG'],
['GGGGGGGG', 'HHHHHHHH'],
['HHHHHHHH', 'IIIIIIII'],
['IIIIIIII', 'JJJJJJJJ'],
['JJJJJJJJ', 'KKKKKKKK']]
The Excel file 'example.xlsx' has this worksheet, 'Example':
    A         B
1   AAAAAAAA  BBBBBBBB
2   BBBBBBBB  CCCCCCCC
3   CCCCCCCC  DDDDDDDD
4   DDDDDDDD  EEEEEEEE
5   EEEEEEEE  FFFFFFFF
6   FFFFFFFF  GGGGGGGG
7   GGGGGGGG  HHHHHHHH
8   HHHHHHHH  IIIIIIII
9   IIIIIIII  JJJJJJJJ
10  JJJJJJJJ  KKKKKKKK
In your case, you would construct a row like this:
row = [div.contents[0].text, div.contents[1].text]
This assumes div.contents is correct. Hope this helps. PS: I am using openpyxl version 2.3.0.
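Putting the pieces together, here is a rough sketch of the whole flow rather than a definitive fix. To keep it runnable offline, the html string is a hypothetical stand-in for r.content (the real page markup may differ, and the second listing is made up), and the file is saved to a temporary directory. Note that divs is filled from soup.find_all() rather than left as an empty list, since looping over an empty list writes nothing and leaves the sheet blank:

```python
import os
import tempfile

from bs4 import BeautifulSoup
from openpyxl import Workbook, load_workbook

# Hypothetical stand-in for r.content; the real page markup may differ.
html = (
    '<div class="info"><h2>Neptune Construction</h2>'
    '<p>Serving the New York Area.(866) 664-1759</p></div>'
    '<div class="info"><h2>Example Roofing Co.</h2>'
    '<p>Made-up entry (555) 010-0000</p></div>'
)
soup = BeautifulSoup(html, "html.parser")

# Collect the matching elements instead of leaving divs empty.
divs = soup.find_all("div", {"class": "info"})

wb = Workbook()  # no arguments: a normal, editable workbook
ws = wb.create_sheet("Phone")
for div in divs:
    # append() writes to the next free row each time,
    # so earlier listings are not overwritten.
    ws.append([div.contents[0].text, div.contents[1].text])

# openpyxl writes the .xlsx format, so the file gets that extension.
path = os.path.join(tempfile.mkdtemp(), "RDWM_CRM.xlsx")
wb.save(path)

# Read the file back to confirm the rows were written.
saved_sheet = load_workbook(path)["Phone"]
rows = [[cell.value for cell in row] for row in saved_sheet.iter_rows()]
print(rows)
```

Assigning to sheet['A1'] and sheet['B1'] inside the loop, as in the question, would keep only the last listing even in a normal workbook, which is why the sketch uses ws.append().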