Python BeautifulSoup: get the text from a tag and from the tag immediately below it

Time: 2016-12-10 02:35:49

Tags: python xml xml-parsing beautifulsoup

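I have a long file in which the same tags are used over and over. I need the text from an arbitrary number of tags of two types (though I don't need the text from every tag of those types).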

Here is a snippet of the XML file:

<key>category</key>
<string>Utilities</string>
<key>description</key>
<string></string>
<key>developer</key>
<string></string>
<key>display_name</key>
<string>PaperCut Client</string>
<key>icon_hash</key>
<string>0db77f1181a63838123e5b25607be0b9b7e32432d11ec3f370ddde1a7807f3fc</string>
<key>installer_item_hash</key>
<string>ebe1f3093bf20f0c6524e79005b37f932dcfe0166a0d740d985450e7a55f9ca0</string>
<key>installer_item_location</key>
<string>PCClient-13.5.dmg</string>
<key>installer_item_size</key>
<integer>45941</integer>
<key>installer_type</key>
<string>copy_from_dmg</string>
<key>installs</key>

What I need to extract is the text of a key tag, followed by the text of the string tag immediately after it:

<key>'identifier'</key>
<string>'desired text'</string>

I can return all of the display_name tags with the following:

soup.findAll('key', string="display_name")

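But this only returns the key tags containing the string 'display_name'. I need just 'display_name' and the text from the tag that follows it (the text from the 'string' tag, e.g. 'PaperCut Client'). How can I do this?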

2 Answers:

Answer 0 (score: 0)

If the key and string tags always come in pairs and stay in the same order (which I think they should, or the whole XML file would end up a mess), you can do this:

for key_tag, string_tag in zip(soup.find_all('key'), soup.find_all('string')):
    print(key_tag.text, string_tag.text)
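
One caveat: the snippet in the question contains an <integer> value, so pairing by position drifts after that point. The sketch below is only an illustration, not part of the original answer. It compares positional pairing with pairing each key to the element right after it. It assumes the built-in 'html.parser' backend and a shortened copy of the sample; the names SAMPLE, zipped and by_sibling are made up for this example.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's sample; note the <integer> value in the middle.
SAMPLE = """
<key>installer_item_location</key>
<string>PCClient-13.5.dmg</string>
<key>installer_item_size</key>
<integer>45941</integer>
<key>installer_type</key>
<string>copy_from_dmg</string>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# Positional pairing: the n-th <key> is matched with the n-th <string>,
# so the <integer> entry shifts every pair that comes after it.
zipped = [
    (k.get_text(), s.get_text())
    for k, s in zip(soup.find_all("key"), soup.find_all("string"))
]

# Sibling pairing: each <key> is matched with the element right after it.
by_sibling = []
for key_tag in soup.find_all("key"):
    # Passing True matches any tag, so whitespace text nodes are skipped.
    value_tag = key_tag.find_next_sibling(True)
    if value_tag is not None and value_tag.name != "key":
        by_sibling.append((key_tag.get_text(), value_tag.get_text()))

print(zipped)
print(by_sibling)
```

Here zipped pairs installer_item_size with copy_from_dmg, while by_sibling keeps 45941 attached to its own key.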

Answer 1 (score: 0)

from bs4 import BeautifulSoup

xml = '''
<key>category</key>
<string>Utilities</string>
<key>description</key>
<string></string>
<key>developer</key>
<string></string>
<key>display_name</key>
<string>PaperCut Client</string>
<key>icon_hash</key>
<string>0db77f1181a63838123e5b25607be0b9b7e32432d11ec3f370ddde1a7807f3fc</string>
<key>installer_item_hash</key>
<string>ebe1f3093bf20f0c6524e79005b37f932dcfe0166a0d740d985450e7a55f9ca0</string>
<key>installer_item_location</key>
<string>PCClient-13.5.dmg</string>
<key>installer_item_size</key>
<integer>45941</integer>
<key>installer_type</key>
<string>copy_from_dmg</string>
<key>installs</key>'''
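soup = BeautifulSoup(xml, 'lxml')
# Find every <key> whose text is exactly 'display_name'.
keys = soup.find_all('key', string='display_name')
for key in keys:
    # The first next_sibling is the newline text node between the tags;
    # the second is the <string> tag that follows the key.
    string = key.next_sibling.next_sibling
    print(key.text)
    print(string.text)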
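
Since the question mentions a long file where the same keys repeat and only some of them are wanted, the same idea could be wrapped in a small helper. This is a rough sketch under a few assumptions: the function name collect_values, the built-in 'html.parser' backend, and the second record ("Example App") are all invented for illustration and do not come from the original file.

```python
from collections import defaultdict

from bs4 import BeautifulSoup


def collect_values(markup, wanted_keys):
    """Map each wanted key name to the text of every <string> directly after it."""
    soup = BeautifulSoup(markup, "html.parser")
    found = defaultdict(list)
    for key_tag in soup.find_all("key"):
        name = key_tag.get_text()
        if name not in wanted_keys:
            continue
        # True matches any tag, so the newline text node between tags is skipped.
        value_tag = key_tag.find_next_sibling(True)
        # Only accept a <string> value; other types such as <integer> are ignored.
        if value_tag is not None and value_tag.name == "string":
            found[name].append(value_tag.get_text())
    return dict(found)


# Hypothetical input with a repeated key; "Example App" is made up.
records = """
<key>display_name</key>
<string>PaperCut Client</string>
<key>installer_item_size</key>
<integer>45941</integer>
<key>display_name</key>
<string>Example App</string>
<key>installer_type</key>
<string>copy_from_dmg</string>
"""

result = collect_values(records, {"display_name", "installer_type"})
print(result)
```

Checking the tag name of the sibling, rather than calling find_next_sibling('string') directly, avoids picking up a string that belongs to a later key when the wanted key has a non-string value.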