Find the parent tag based on the text of multiple tags - BeautifulSoup

Date: 2016-12-19 07:31:04

Tags: python xml beautifulsoup

Suppose a file contains a fragment of XML like the following:

<Client name="Jack">
        <Type>premium</Type>
        <Usage>unlimited</Usage>
        <Payment>online</Payment>
</Client>

<Client name="Jill">
        <Type>demo</Type>
        <Usage>limited</Usage>
        <Payment>online</Payment>
</Client>

<Client name="Ross">
        <Type>premium</Type>
        <Usage>unlimited</Usage>
        <Payment>online</Payment>
</Client>

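I am using BeautifulSoup to parse the values.

I need to get the client name based on a child tag: given the text of that tag, I need the name attribute of its parent Client tag.

My function is as follows: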

def get_client_for_usage(self, usage):
    """
    To get the client name for specified usage
    """
    usage_items = self.parser.findAll("client")
    client_for_usage = []
    for usages in usage_items:
        try:
            client_set = usages.find("usage", text=usage).findParent("client")
            client_attr = dict(client_set.attrs)
            client_name = client_attr[u'name']
            client_for_usage.append(client_name)

        except AttributeError:
            continue
    return client_for_usage

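Now I need to get the client name based on two criteria: usage and type.

So I need to pass both the type and the usage and get back the matching client names.

Can someone help me with this? If the question is unclear, please let me know so that I can edit it as needed.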

2 answers:

Answer 0 (score: 1)

Something like:

def get_client_for_usage(self, usage, tpe):
    """
    To get the client names for the specified usage and type
    """
    usage_items = self.parser.findAll("client")
    client_for_usage = []
    for usages in usage_items:
        try:
            client_set = usages.find("usage", text=usage).findParent("client")
            typ_node = usages.find("type", text=tpe).findParent("client")
            if client_set == typ_node:
                client_for_usage.append(client_set['name'])
        except AttributeError:
            continue
    return client_for_usage
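
As a variation on the function above, the two conditions can be checked directly on each client's own children, without walking back up with findParent or relying on AttributeError for control flow. This is only a minimal sketch: the standalone function name clients_for, the XML constant, and the choice of the built-in "html.parser" (instead of whatever parser self.parser was built with) are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Sample data copied from the question.
XML = """
<Client name="Jack"><Type>premium</Type><Usage>unlimited</Usage><Payment>online</Payment></Client>
<Client name="Jill"><Type>demo</Type><Usage>limited</Usage><Payment>online</Payment></Client>
<Client name="Ross"><Type>premium</Type><Usage>unlimited</Usage><Payment>online</Payment></Client>
"""


def clients_for(soup, usage, type_):
    """Return names of clients whose <Usage> and <Type> texts both match."""
    names = []
    for client in soup.find_all("client"):
        usage_tag = client.find("usage")
        type_tag = client.find("type")
        # Skip clients that lack either child tag.
        if usage_tag is None or type_tag is None:
            continue
        if (usage_tag.get_text(strip=True) == usage
                and type_tag.get_text(strip=True) == type_):
            names.append(client["name"])
    return names


# html.parser lowercases tag names, hence "client", "usage" and "type" above.
soup = BeautifulSoup(XML, "html.parser")
print(clients_for(soup, "unlimited", "premium"))  # ['Jack', 'Ross']
```

Because both lookups start from the same client element, there is no need to compare two parent nodes for equality.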

Answer 1 (score: 0)

html = '''<Client name="Jack">
        <Type>premium</Type>
        <Usage>unlimited</Usage>
        <Payment>online</Payment>
</Client>

<Client name="Jill">
        <Type>demo</Type>
        <Usage>limited</Usage>
        <Payment>online</Payment>
</Client>

<Client name="Ross">
        <Type>premium</Type>
        <Usage>unlimited</Usage>
        <Payment>online</Payment>
</Client>'''


import bs4 
import collections

soup = bs4.BeautifulSoup(html, 'lxml')
d = collections.defaultdict(list)
for client in soup('client'):
    type_, usage, payment = client.stripped_strings
    d[(type_, usage)].append(client['name'])

Out:

defaultdict(list, {('demo', 'limited'): ['Jill'], ('premium', 'unlimited'): ['Jack', 'Ross']})

Use type and usage as the key and the client name as the value to build a dict, then get the names by looking up the key.
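
Looking names up in such a dictionary might look like the sketch below. To keep it self-contained it rebuilds the same mapping from a hand-written records list instead of parsing the XML; records and the helper names_for are hypothetical names introduced for illustration only.

```python
import collections

# Hypothetical stand-in for the parsed data: (name, type, usage) per client.
records = [
    ("Jack", "premium", "unlimited"),
    ("Jill", "demo", "limited"),
    ("Ross", "premium", "unlimited"),
]

# Same shape as the dictionary in the answer: (type, usage) -> list of names.
d = collections.defaultdict(list)
for name, type_, usage in records:
    d[(type_, usage)].append(name)


def names_for(index, type_, usage):
    # .get avoids inserting an empty list for a missing key,
    # which plain index[...] would do on a defaultdict.
    return index.get((type_, usage), [])


print(names_for(d, "premium", "unlimited"))  # ['Jack', 'Ross']
print(names_for(d, "demo", "unlimited"))     # []
```

Building the index once pays off when many (type, usage) queries are run against the same document; for a single query, filtering the clients directly is simpler.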