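I am a beginner at web scraping with Python. I am trying to scrape a page using Beautiful Soup and Selenium. My goal is to get the highlighted elements [in this case, 1200 sqft and Event only]. Here is my HTML code: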
</div>
<section class='space-section'>
<table class='space-features'>
<tbody>
<tr>
<td>
<i class='icon-measuringtape'></i>
<p class='space-feature-name'>1,200 sqft</p>
</td>
<td class='disabled'>
<i class='icon-store'></i>
<p class='space-feature-name'>Retail</p>
</td>
<td class='disabled'>
<i class='icon-restaurant'></i>
<p class='space-feature-name'>Bar & Restaurant</p>
</td>
<td class=''>
<i class='icon-event'></i>
<p class='space-feature-name'>Event</p>
</td>
<td class='disabled'>
<i class='icon-share'></i>
<p class='space-feature-name'>Shop Share</p>
</td>
<td class='disabled'>
<i class='icon-star'></i>
<p class='space-feature-name'>Unique</p>
</td>
</tr>
</tbody>
</table>
</section>
The URL of the site is https://www.appearhere.co.uk/spaces/north-kensington-upcycling-store-and-cafe
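I would like my output print statement to look like this: SQFT - 1200 sqft, Retail - No, Bar & Restaurant - No, Event - Yes, Shop Share - No, Unique - No. Can you suggest a solution for this?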
Answer 0: (score: 0)
Try this:
from bs4 import BeautifulSoup
import requests

html = requests.get('https://www.appearhere.co.uk/spaces/north-kensington-upcycling-store-and-cafe').content
bsObj = BeautifulSoup(html, 'lxml')
# Collect every <p> element on the page and print its text.
data = bsObj.find_all('p')
for item in data:
    print(item.get_text())
Hope this helps.
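Note that the snippet above prints the text of every <p> on the page and does not say whether a feature is available. A minimal sketch of one way to narrow it down, assuming the markup shown in the question (the live page may differ): restrict the search to the space-features table and treat a cell as unavailable when its <td> carries the disabled class. SAMPLE_HTML and read_features are hypothetical names used only for illustration, and the sample is trimmed to three cells.

```python
from bs4 import BeautifulSoup

# Sample markup trimmed from the question; the live page may differ.
SAMPLE_HTML = """
<table class='space-features'>
  <tr>
    <td><i class='icon-measuringtape'></i><p class='space-feature-name'>1,200 sqft</p></td>
    <td class='disabled'><i class='icon-store'></i><p class='space-feature-name'>Retail</p></td>
    <td class=''><i class='icon-event'></i><p class='space-feature-name'>Event</p></td>
  </tr>
</table>
"""


def read_features(html):
    """Return (label, enabled) pairs for each cell of the space-features table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="space-features")
    results = []
    for cell in table.find_all("td"):
        label = cell.find("p", class_="space-feature-name").get_text(strip=True)
        # A cell counts as unavailable when its class list contains 'disabled'.
        enabled = "disabled" not in (cell.get("class") or [])
        results.append((label, enabled))
    return results


for label, enabled in read_features(SAMPLE_HTML):
    if "sqft" in label:
        print("SQFT -", label)
    else:
        print(label, "-", "Yes" if enabled else "No")
```

Checking the class on the cell itself avoids comparing text between two separate lists, which could misfire if two features happened to share a label.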
Answer 1: (score: 0)
Try this code:
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
url = "https://www.appearhere.co.uk/spaces/north-kensington-upcycling-store-and-cafe"
driver.maximize_window()
driver.get(url)
# Hand the rendered page source to Beautiful Soup, then close the browser.
content = driver.page_source.encode('utf-8').strip()
driver.quit()
soup = BeautifulSoup(content, "html.parser")

# All feature labels in the table.
space = soup.find("table", {"class": "space-features"})
feature = space.find_all("p")
options = [y.text for y in feature]

# Labels of the cells marked as disabled (feature not available).
disabled = space.find_all("td", {"class": "disabled"})
options1 = [y.text.strip() for y in disabled]

for x in options:
    if 'sqft' in x:
        print("SQFT -", x)
    elif x in options1:
        print(x, "- No")
    else:
        print(x, "- Yes")
This will print:
SQFT - 1,200 sqft
Retail - No
Bar & Restaurant - No
Event - Yes
Shop Share - No
Unique - No
Hope this is what you wanted.
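The question asks for a single comma-separated statement rather than one line per feature. A small sketch of how the per-feature results could be joined, assuming they have already been collected as (label, enabled) pairs; format_summary is a hypothetical helper, and the sample values are copied from the output listed above.

```python
def format_summary(features):
    """Join (label, enabled) pairs into one comma-separated line."""
    parts = []
    for label, enabled in features:
        if "sqft" in label.lower():
            # The size cell is reported as-is rather than as Yes/No.
            parts.append("SQFT - " + label)
        else:
            parts.append(label + " - " + ("Yes" if enabled else "No"))
    return ", ".join(parts)


# Sample values taken from the printed output above.
features = [
    ("1,200 sqft", True),
    ("Retail", False),
    ("Bar & Restaurant", False),
    ("Event", True),
    ("Shop Share", False),
    ("Unique", False),
]
print(format_summary(features))
```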