Python BeautifulSoup Scrape数据写入Excel" NotImplementedError"

时间:2015-12-22 23:33:49

标签: python excel beautifulsoup

我试图用Python和BeautifulSoup编写一个用于抓取网站的脚本,然后将数据写入Excel表格。

它一直工作到写作部分,然后我得到NotImplementedError?我查了一下,并用TRY:和Pass:blocks包围了代码的写入部分....它解决了Python解释器控制台窗口中的错误,但我的Excel工作表是空白的。

这是我到目前为止所做的:

import requests, openpyxl
from bs4 import BeautifulSoup

wb = openpyxl.Workbook('RDWM_CRM.xls')
wb.create_sheet('Phone')
sheet = wb.get_sheet_by_name('Phone')

# nav to webpage I want to scrape
url = "http://www.yellowpages.com/search?search_terms=roofing%20company&geo_location_terms=New%20York%2C%20NY&page=2"
r = requests.get(url)
soup = BeautifulSoup(r.content)

# for loop finds info then prints
for div in soup.find_all("div", {"class": "info"}):
    print (div.contents[0].text)
    print (div.contents[1].text)            

# for loop finds info then writes to excel cells
for div in soup.find_all("div", {"class": "info"}):
    sheet['A1'] = div.contents[0].text
    sheet['B1'] = div.contents[1].text

wb.save('RDWM_CRM.xls')

就像我上面所说,即使没有错误,我也得到了一张空白的Excel表格。这是在控制台中看到的回溯:

Neptune Construction
Serving the New York Area.(866) 664-1759
>>> # for loop finds info then writes to excel cells
... for div in soup.find_all("div", {"class": "info"}):
...     sheet['A1'] = div.contents[0].text
...     sheet['B1'] = div.contents[1].text
...
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
File "C:\Users\Josh\AppData\Local\Programs\Python\Python35\lib\site-packages\openpyxl\writer\write_only.py", line 223, in removed_method
raise NotImplementedError
NotImplementedError
>>> wb.save('RDWM_CRM.xls')

这是最后一段数据以及错误。

感谢您的帮助!!我仍然遇到excel表格空白......这是我正在使用的代码,没有错误....只是一张空白的Excel表格。它创建了一个名为Phone的新工作表,它只是空白......

import requests
from bs4 import BeautifulSoup
from openpyxl import Workbook
url = "http://www.yellowpages.com/search?search_terms=roofing%20company&geo_location_terms=Seattle%2C%20WA&page=4" # nav to webpage I want to scrape
r = requests.get(url)
soup = BeautifulSoup(r.content)

# create a dummy list of texts to write to excel file
divs = []

wb = Workbook() # open new workbook, use load_workbook if existing
ws = wb.create_sheet('Phone')
for div in divs:
    row = [div.contents[0].text, div.contents[1].text]  # construct a row: shown only for example purposes
    ws.append(row)          # could use ws.append(div) since each div is a list 

wb.save('RDWM_CRM.xlsx')     # save workbook, will overwrite if exists

任何帮助表示赞赏!!

1 个答案:

答案 0 :(得分:2)

如果我不完全理解您的问题,请提前道歉,但使用openpyxl似乎存在一些问题。

以下是如何使用openpyxl编写工作表的示例,可能会有所帮助:

from openpyxl import Workbook

# create a dummy list of texts to write to excel file
divs = [[chr(i)*8, chr(i+1)*8] for i in range(65, 75, 1)]

wb = Workbook()             # open new workbook, use load_workbook if existing
ws = wb.create_sheet(title="Example")
for div in divs:
    row = [div[0], div[1]]  # construct a row: shown only for example purposes
    ws.append(row)          # could use ws.append(div) since each div is a list 
wb.save('example.xlsx')     # save workbook, will overwrite if exists

虚拟列表div看起来像这样:

[['AAAAAAAA', 'BBBBBBBB'],
 ['BBBBBBBB', 'CCCCCCCC'],
 ['CCCCCCCC', 'DDDDDDDD'],
 ['DDDDDDDD', 'EEEEEEEE'],
 ['EEEEEEEE', 'FFFFFFFF'],
 ['FFFFFFFF', 'GGGGGGGG'],
 ['GGGGGGGG', 'HHHHHHHH'],
 ['HHHHHHHH', 'IIIIIIII'],
 ['IIIIIIII', 'JJJJJJJJ'],
 ['JJJJJJJJ', 'KKKKKKKK']]

excel文件&#39; example.xlsx&#39;有这个工作表&#39;示例&#39;:

   A        B
1  AAAAAAAA BBBBBBBB
2  BBBBBBBB CCCCCCCC
3  CCCCCCCC DDDDDDDD
4  DDDDDDDD EEEEEEEE
5  EEEEEEEE FFFFFFFF
6  FFFFFFFF GGGGGGGG
7  GGGGGGGG HHHHHHHH
8  HHHHHHHH IIIIIIII
9  IIIIIIII JJJJJJJJ
10 JJJJJJJJ KKKKKKKK

你会构造一个像这样的行:

row = [div.contents[0].text, div.contents[1].text]

假设div.contents是正确的。希望这可以帮助。 PS。我使用的是openpyxl版本2.3.0