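I'm trying to build a script that scrapes the review text and star ratings from Yelp and stores the data in an Excel file.
The HTML snippet I'm working with is as follows: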
<div class="review-content">
  <div class="biz-rating biz-rating-large clearfix">
    <div>
      <div class="i-stars i-stars--regular-5 rating-large" title="5.0 star rating">
        <img alt="5.0 star rating" class="offscreen" height="303" src="https://s3-media1.fl.yelpcdn.com/assets/srv0/yelp_design_web/41341496d9db/assets/img/stars/stars.png" width="84"/>
      </div>
    </div>
    <span class="rating-qualifier">
      5/10/2017
    </span>
  </div>
  <p lang="en">This place is really fun and cute. I was happy to discover it.. <br/><br/>They also have beer and wine here, which is kind of a nice bonus. The sangria is good..</p>
</div>
My Python code is as follows:
import requests
from bs4 import BeautifulSoup as soup
import xlsxwriter

# Index for xlsxwriter
row = 1
i = 0
# Index for all of the review-containing pages for one restaurant.
page_num = 0

# Call xlsxwriter and name the output file.
workbook = xlsxwriter.Workbook('file_1.xlsx')
worksheet = workbook.add_worksheet()

# Write in the header for the file
worksheet.write('A1', 'num_stars')
worksheet.write('B1', 'review_text')

# Loop to scrape all of the reviews off of one single page with a specific
# url and advance to all subsequent pages of the restaurant.
while page_num <= 260:
    url = "https://www.yelp.com/biz/monkey-house-cafe-huntington-beach?start=%s" % page_num
    r = requests.get(url)
    page_soup = soup(r.content, "lxml")
    review_container = page_soup.findAll("div", {"class": "review-content"})
    for review in review_container:
        string = str(review.p.text)
        stars = float(review[i].select('img')[0]['alt'].split()[0])
        worksheet.write(row, 0, stars)
        worksheet.write(row, 1, string)
        row += 1
        i += 1
    # Advance counter in order to scrape the next url for the restaurant
    page_num += 20

workbook.close()
The problem is that when I run this script I get the following error:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-18-a73ecb4ef119> in <module>()
     38     for review in review_container:
     39         string = str(review.p.text)
---> 40         stars = float(review[i].select('img')[0]['alt'].split()[0])
     41         worksheet.write(row, 0, stars)
     42         worksheet.write(row, 1, string)

//anaconda/lib/python3.5/site-packages/bs4/element.py in __getitem__(self, key)
    956         """tag[key] returns the value of the 'key' attribute for the tag,
    957         and throws an exception if it's not there."""
--> 958         return self.attrs[key]
    959
    960     def __iter__(self):

KeyError: 0
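I understand that the line causing the error is the following:
stars = float(review[i].select('img')[0]['alt'].split()[0])
However, I don't understand how to correct the error so that the script works.
What changes do I need to make to the code for the script to run properly?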
Answer 0 (score: 0)
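I believe it should simply be
stars = float(review.select('img')[0]['alt'].split()[0])
Inside the for loop, review is already a single element of review_container, so review[i] is not a list index. As the docstring in the traceback says, tag[key] looks up the HTML attribute named key on that tag, and there is no attribute named 0, hence KeyError: 0. With the index removed, the counter i is no longer needed.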
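
To illustrate why indexing the review fails and what iterating without the index looks like, here is a minimal sketch run against a trimmed copy of the markup from the question rather than the live site. It assumes the page structure matches that snippet. The names SAMPLE_HTML and extract_reviews are made up for this example, and "html.parser" is used in place of "lxml" only so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the review markup from the question (illustrative only).
SAMPLE_HTML = """
<div class="review-content">
  <div class="biz-rating biz-rating-large clearfix">
    <div>
      <div class="i-stars i-stars--regular-5 rating-large" title="5.0 star rating">
        <img alt="5.0 star rating" class="offscreen" height="303" width="84"/>
      </div>
    </div>
    <span class="rating-qualifier">5/10/2017</span>
  </div>
  <p lang="en">This place is really fun and cute.</p>
</div>
"""


def extract_reviews(html):
    """Return a list of (stars, text) pairs, one per review-content block."""
    page_soup = BeautifulSoup(html, "html.parser")
    results = []
    for review in page_soup.find_all("div", {"class": "review-content"}):
        # review is already a single Tag here, so it is used directly.
        stars = float(review.select("img")[0]["alt"].split()[0])
        text = review.p.get_text(strip=True)
        results.append((stars, text))
    return results


reviews = extract_reviews(SAMPLE_HTML)
print(reviews)

# Subscripting a Tag looks up an HTML attribute, so an integer key fails.
first = BeautifulSoup(SAMPLE_HTML, "html.parser").find(
    "div", {"class": "review-content"}
)
try:
    first[0]
except KeyError as exc:
    print("KeyError:", exc)
```

In the original script the same change amounts to dropping [i] from the stars line; the row counter alone is enough to place each review in the worksheet.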