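I am trying to scrape a dynamic website using BeautifulSoup and Selenium. The attributes I want to filter and write to a CSV file are contained in a <script> tag. I want to extract them from the following script: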
 
 window.IS24 = window.IS24 || {};
 IS24.ssoAppName = "search";
 IS24.applicationContext = "/Suche/error-reporter";
 IS24.ab = {};
 IS24.feature = {"SEARCH_BY_TELEKOM_SPEED_ENABLED":true,
 IS24.resultList = {
 angularDebugInfoEnabled: false,
 navigationBarUrl: "/Suche/ST/Haus-Kauf",

 nextPage: "/Suche/ST/P-2/Haus-Kauf?pagerReporting=true",

 searchUrl: "/Haus-Kauf",
 isMobile: false,
 isTablet: false,
 query:
 {"realEstateType":"HOUSE_BUY","otpEnabled":true,"sortingCode":0,"location":
 {"isGeoHierarchySearch":true,
 Schulze","referrer":["RESULT_LIST_GROUPED"],**"attributes":[
 {"title":"Kaufpreis","value":"249.012,75€"},
 {"title":"Wohnfläche","value":"129,87m²"},{"title":"Zimmer","value":"4"},
 {"title":"Grundstück","value":"400m²"},"checkedAttributes":["Gäste - **



 I don't know how to extract the attributes and get them into a CSV file. Could you help me and explain the code?

Answer 0 (score: 0)
Here is how to use BeautifulSoup to pull out tags by their attribute values.
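from urllib.request import Request, urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup

# Download the page (placeholder URL).
req = Request('http://website_to_grab_things_from.com')
response = urlopen(req)
html = response.read()

# Parse the HTML and grab all of its text.
soup = BeautifulSoup(html, "html.parser")
alltext = soup.get_text()

# General pattern: soup.find_all('TAGNAME', {'ATTR_NAME': 'ATTR_VALUE'})
# Example: every <div> whose class is "teaser-text".
result = soup.find_all('div', {'class': 'teaser-text'})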
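
The snippet above selects ordinary tags, but the values in the question live inside the JavaScript text of a <script> tag, so they need one more step. What follows is only a rough sketch under a few assumptions: the script text has already been read from the tag (for example via `tag.string` on each result of `soup.find_all('script')`, with the HTML possibly coming from Selenium's `driver.page_source`), every listing carries an `"attributes":[...]` array that is valid JSON on its own, and the function names `extract_attribute_rows` and `rows_to_csv` as well as `SAMPLE_SCRIPT` are made up for illustration. The second sample listing uses invented values.

```python
import csv
import io
import json
import re

# Hypothetical script text for illustration. The first listing mirrors the
# question; the second listing is made up to show a row with missing columns.
SAMPLE_SCRIPT = '''
IS24.resultList = {
 isMobile: false,
 "referrer":["RESULT_LIST_GROUPED"],"attributes":[
 {"title":"Kaufpreis","value":"249.012,75€"},
 {"title":"Wohnfläche","value":"129,87m²"},{"title":"Zimmer","value":"4"},
 {"title":"Grundstück","value":"400m²"}],"checkedAttributes":[],
 "referrer":["RESULT_LIST_GROUPED"],"attributes":[
 {"title":"Kaufpreis","value":"310.000€"},{"title":"Zimmer","value":"5"}],
 "checkedAttributes":[]
};
'''


def extract_attribute_rows(script_text):
    """Return one dict per "attributes" array, mapping title -> value."""
    decoder = json.JSONDecoder()
    rows = []
    # The quoted, lowercase key does not match "checkedAttributes".
    for match in re.finditer(r'"attributes"\s*:\s*', script_text):
        try:
            # Parse just the JSON array that starts right after the key.
            items, _ = decoder.raw_decode(script_text, match.end())
        except json.JSONDecodeError:
            continue  # skip arrays that are truncated or not valid JSON
        rows.append({item["title"]: item["value"] for item in items})
    return rows


def rows_to_csv(rows, out):
    """Write the rows to a file-like object, one column per attribute title."""
    fieldnames = []
    for row in rows:
        for title in row:
            if title not in fieldnames:
                fieldnames.append(title)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


rows = extract_attribute_rows(SAMPLE_SCRIPT)
buffer = io.StringIO()
rows_to_csv(rows, buffer)
print(buffer.getvalue())
```

To write a real file instead of printing, the `io.StringIO()` buffer could be swapped for `open("listings.csv", "w", newline="", encoding="utf-8")`. The surrounding `IS24.resultList` object is not valid JSON (its keys are unquoted), which is why this sketch parses only the attribute arrays rather than the whole script.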