Question

<meta itemprop="streetAddress" content="4103 Beach Bluff Rd">

I need to extract the content '4103 Beach Bluff Rd'. I am trying to do this with BeautifulSoup, so I tried this:

soup = BeautifulSoup('<meta itemprop="streetAddress" content="4103 Beach Bluff Rd"> ')

soup.find(itemprop="streetAddress").get_text()

But I get an empty string as the result, which may make sense, because when I print the soup object

print soup

I get:

<html><head><meta content="4103 Beach Bluff Rd" itemprop="streetAddress"/> </head></html>

Clearly, the data I want is in the content attribute of the meta tag. How can I get it?

Answer 1

soup.find(itemprop="streetAddress").get_text()

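Here you are getting the text of the matched element, and a meta tag has no text. Instead, get the value of the "content" attribute: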

soup.find(itemprop="streetAddress").get("content")

This works because BeautifulSoup provides a dictionary-like interface to tag attributes:

You can access a tag's attributes by treating the tag like a dictionary.

Demo:

>>> from bs4 import BeautifulSoup
>>>
>>> soup = BeautifulSoup('<meta itemprop="streetAddress" content="4103 Beach Bluff Rd"> ')
>>> soup.find(itemprop="streetAddress").get_text()
u''
>>> soup.find(itemprop="streetAddress").get("content")
'4103 Beach Bluff Rd'
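
The dictionary-like access described above can be sketched a bit further. This is a minimal illustration, not part of the original answer: it assumes Python 3 and passes "html.parser" explicitly (the question did not name a parser), and the variable names are chosen only for this example.

```python
from bs4 import BeautifulSoup

html = '<meta itemprop="streetAddress" content="4103 Beach Bluff Rd">'

# Assumption: the standard-library parser is named explicitly so the
# result does not depend on which optional parsers are installed.
soup = BeautifulSoup(html, "html.parser")

tag = soup.find(itemprop="streetAddress")

# A meta tag has no text content, so get_text() is empty.
text = tag.get_text()

# Subscript access: raises KeyError if the attribute is missing.
street = tag["content"]

# .get() access: returns None if the attribute is missing.
street_safe = tag.get("content")
missing = tag.get("href")

# All attributes of the tag as a plain dict.
all_attrs = dict(tag.attrs)

print(street)
print(all_attrs)
```

Using tag.get("content") is the safer choice when some pages may lack the attribute, since it returns None instead of raising an exception.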
