Extracting data from a div tag

Time: 2019-12-06 10:00:08

Tags: beautifulsoup python-requests

I am scraping data from a website, and some of the data sits inside div tags like this:

<div class="search-result__title">\nDonald Duck <span>\xa0|\xa0</span>\n<span class="city state" data-city="city, TX;city, TX;city, TX;city, TX" data-state="TX"><a href="https://example.com/" rel="nofollow">STATENAME, CITYNAME</a>\n</span>\n</div>,

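I want to grab the "Donald Duck" part, as well as the state and city name that come after rel="nofollow". The site contains a lot of data, so the names and states differ from one result to the next.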

The code I wrote is:

div = soup.find_all('div', {'class':'search-result__title'})
print (div.string)

This gives me an error:

    "ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key

1 answer:

Answer 0: (score: 0)

First, use .text. Second, find_all() returns a list of elements, not a single element. You need to either pick the one you want by index (for example div[0].text), or, since you will likely have multiple elements, just iterate over them.
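A minimal sketch of the iteration approach, assuming the markup looks like the snippet in the question. The sample `html` string and the helper name `extract_results` are made up for illustration and are not from the original post; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the snippet in the question (an assumption,
# not the real page).
html = """
<div class="search-result__title">
Donald Duck <span>&nbsp;|&nbsp;</span>
<span class="city state" data-city="city, TX;city, TX" data-state="TX"><a href="https://example.com/" rel="nofollow">STATENAME, CITYNAME</a>
</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def extract_results(soup):
    """Return a list of (name, location) tuples, one per result div."""
    results = []
    # find_all() returns a ResultSet (a list), so loop over it.
    for div in soup.find_all("div", {"class": "search-result__title"}):
        # The first non-empty string inside the div is the person's name.
        name = next(div.stripped_strings, "")
        # The state and city are the text of the link inside the div.
        link = div.find("a")
        location = link.get_text(strip=True) if link else None
        results.append((name, location))
    return results


results = extract_results(soup)
for name, location in results:
    print(name, "|", location)
```

If only the first match is needed, `soup.find(...)` or `soup.find_all(...)[0]` returns a single element, and `.text` can then be read from it directly.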