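I want to extract only the content value from the data <meta itemprop="url" content="http://www.vestiairecollective.com/women-bags/handbags/chanel/black-timeless-leather-handbag-chanel-2668779.shtml">
that is, scrape only the http part. But the way I am doing it, I get the whole element, starting from "meta".
This is my script logic: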
import urllib.request
from bs4 import BeautifulSoup
url=urllib.request.urlopen("http://www.vestiairecollective.com/women-bags/handbags/#_=catalog")
soup=BeautifulSoup(url.read(),"html.parser")
getdata=soup.find_all("div",{"class":"expand-snippet-container"})
for i in getdata:
    data1=i.find_all("meta",{"itemprop":"url"})
    datac=[da[0] for da in data1]
    print(datac1)
for i in getdata:
    data1=i.find_all("p",{"class":"brand"})
    datac1=[da.contents[0] for da in data1]
    brdata=("\n".join(datac1))
    if brdata=="CHANEL":
        da1=i.find_all("meta",{"itemprop":"url"})
        print(da1)
In the last print statement, I only want the URL to be displayed (for example http://www.vestiairecollective.com/women-bags/handbags/chanel/black-timeless-leather-handbag-chanel-2668779.shtml).
What am I doing wrong? Please help.
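
For reference, one likely cause is that `find_all` returns whole `Tag` objects, so printing them shows the full `<meta ...>` element. In Beautiful Soup an attribute value is read from a tag with `tag["content"]` or `tag.get("content")`. The following is a minimal sketch of that idea, not a tested fix for the live site: `SAMPLE_HTML` and the helper name `brand_urls` are hypothetical, the second URL is a placeholder, and the markup is only assumed to resemble the page, based on the tag quoted above. It parses an inline string so it runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble the real page structure.
SAMPLE_HTML = """
<div class="expand-snippet-container">
  <p class="brand">CHANEL</p>
  <meta itemprop="url" content="http://www.vestiairecollective.com/women-bags/handbags/chanel/black-timeless-leather-handbag-chanel-2668779.shtml">
</div>
<div class="expand-snippet-container">
  <p class="brand">GUCCI</p>
  <meta itemprop="url" content="http://example.com/other-bag.shtml">
</div>
"""


def brand_urls(html, brand):
    """Return the content attribute of each url meta tag for one brand."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for container in soup.find_all("div", {"class": "expand-snippet-container"}):
        brand_tag = container.find("p", {"class": "brand"})
        # Skip containers without a brand or with a different brand.
        if brand_tag is None or brand_tag.get_text(strip=True) != brand:
            continue
        for meta in container.find_all("meta", {"itemprop": "url"}):
            # .get() reads the attribute value and returns None if it is missing.
            url = meta.get("content")
            if url:
                urls.append(url)
    return urls


chanel_urls = brand_urls(SAMPLE_HTML, "CHANEL")
print("\n".join(chanel_urls))
```

In the original script, the equivalent change would be to print `[m.get("content") for m in da1]` instead of `da1`, assuming the live page uses the same `meta` structure.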