Selecting the next sibling automatically in Python BeautifulSoup

Date: 2014-04-11 13:52:21

Tags: python beautifulsoup

First of all, I'm using Python BeautifulSoup to create an XML document.

What I'm trying to create at the moment is very similar to this example:

<options>
    <opt name='string'>ContentString</opt>
    <opt name='string'>ContentString</opt>
    <opt name='string'>ContentString</opt>
</options>

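Note that each opt should carry only one attribute, called name.

Since the options can be more numerous and can differ, I decided to write a small Python function to help me produce a result like this.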

from bs4 import BeautifulSoup

soup = BeautifulSoup('', 'xml')  # the 'xml' parser requires lxml to be installed

array = ['FirstName', 'SecondName', 'ThirdName']
# This list tells the function how many options the result will contain and what their name attributes will be.

def create_options(array):
    soup.append(soup.new_tag('options'))
    if array:  # Small sanity check in case the given list is empty for some reason. Optional.
        for _ in array:
            soup.options.append(soup.new_tag('opt'))
            # With BeautifulSoup methods, create opt tags inside the options tag, exactly as many as there are items in the list.
        counter = 0
        # Python's range() could be used instead, but for testing purposes this counter is good enough.
        for tag in soup.options.find_all():
            soup.options.find('opt')['name'] = str(array[counter])
            # Notice that here the name is assigned only to the first opt element. We'll discuss this next.
            counter += 1
        print(len(array), 'options were created.')
    else:
        print('No options were created.')

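As you can see, the name assignment in the function is handled by a for loop. Unfortunately, it assigns all the different names to the first opt inside the options element: each iteration overwrites the previous value, so the first opt ends up with the last name and the others get none.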
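The root cause is that soup.options.find('opt') returns the first opt on every iteration, regardless of the counter. A minimal sketch that avoids the second loop altogether, assuming bs4 is installed, sets the name attribute at the moment each tag is created. The soup parameter, the fixed "ContentString" text and the use of html.parser (chosen only so the example runs without lxml) are assumptions for illustration.

```python
from bs4 import BeautifulSoup


def create_options(soup, names):
    """Append an <options> element holding one <opt> per entry in names."""
    options = soup.new_tag("options")
    soup.append(options)
    for name in names:
        # attrs= is used because the first positional parameter of new_tag
        # is itself called "name", so name=... cannot be passed as a keyword.
        opt = soup.new_tag("opt", attrs={"name": name})
        opt.string = "ContentString"
        options.append(opt)
    return options


soup = BeautifulSoup("", "html.parser")
create_options(soup, ["FirstName", "SecondName", "ThirdName"])
print(soup)
```

Because every opt is configured while it is still held in a local variable, there is never a need to find it again afterwards.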

BeautifulSoup has .next_sibling and .previous_sibling, which could help me with this task. As their names say, they give access to the next or previous sibling of an element. So, with this example:

soup.options.find('opt').next_sibling['name'] = str(array[counter])

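we can access the second child of the options element. So if we add .next_sibling to each soup.options.find('opt'), we can move from the first element to the next one. The problem is that by finding the opt element inside options with: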

soup.options.find('opt')

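we access the first opt every time. But my function needs to visit every item in the list and, with it, the next opt. This means that the more items the list has, the more .next_sibling calls have to be chained onto the first opt.

Following this logic, reaching the opt that belongs to a later list item takes one more .next_sibling per item; for example, four of them lead from the first opt to the fifth one: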

soup.options.find('opt').next_sibling.next_sibling.next_sibling.next_sibling['name'] = str(array[counter])
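Instead of building ever-longer chains, one can keep a reference to the current opt and move it forward by one sibling per list item. This is a sketch assuming bs4 is available; the five-item names list is made up for illustration.

```python
from bs4 import BeautifulSoup

names = ["FirstName", "SecondName", "ThirdName", "FourthName", "FifthName"]

soup = BeautifulSoup("", "html.parser")
options = soup.new_tag("options")
soup.append(options)
for _ in names:
    options.append(soup.new_tag("opt"))

# Keep a reference to the current <opt> and advance it one step per name,
# instead of re-finding the first <opt> and chaining .next_sibling calls.
opt = options.find("opt")
for name in names:
    opt["name"] = name
    # find_next_sibling("opt") skips any whitespace text nodes that a parsed
    # (rather than freshly built) document may contain between the tags.
    opt = opt.find_next_sibling("opt")

print(options)
```

When the opt tags already exist, pairing them with the names via zip(options.find_all("opt"), names) is an equivalent and arguably simpler alternative.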

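Now we are ready for my questions:

First: since I haven't found any other way to do this with Python BeautifulSoup methods, I'm not sure whether my approach is the only one. Are there other ways?

Second: my experiments show that I can't put a variable into the method chain. How could I get the result with this approach, then? (So that I could increase the number of calls.)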

# Like this:
thirdoption = .next_sibling.next_sibling.next_sibling
# This is not valid Python; it's only meant to illustrate the idea.
soup.options.find('opt').next_sibling.next_sibling.next_sibling['name'] = str(array[counter])
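The chain itself cannot be stored in a variable, but the number of steps can. A sketch using a hypothetical helper, nth_sibling, that follows .next_sibling n times in a loop; note that three steps from the first opt land on the fourth one. The sample document is invented and deliberately has no whitespace between tags, which matters because .next_sibling also returns text nodes.

```python
from bs4 import BeautifulSoup


def nth_sibling(tag, n):
    """Follow .next_sibling n times, like tag.next_sibling.next_sibling... (n links)."""
    for _ in range(n):
        tag = tag.next_sibling
    return tag


soup = BeautifulSoup(
    "<options><opt>a</opt><opt>b</opt><opt>c</opt><opt>d</opt></options>",
    "html.parser",
)
first = soup.options.find("opt")

# Equivalent to first.next_sibling.next_sibling.next_sibling
fourth = nth_sibling(first, 3)
print(fourth.string)  # d

# Often simpler: index into the list of matching children.
print(soup.options.find_all("opt")[3].string)  # d
```

Indexing into find_all("opt") also sidesteps the whitespace issue, since it only returns opt tags.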

Third: maybe I read the BeautifulSoup documentation carefully and simply missed a method that could help me with this task?

1 answer:

Answer 0 (score: 0)

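I managed to get the result by bypassing the BeautifulSoup methods. Python has the ElementTree module, which is good enough for this.

So let me show some example code and explain what it does. The comments give a more detailed explanation.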

"""
Before this code runs, the XML document is generated with soup. Apart from the part described in the question, we just create empty opt tags inside options, so the document is almost complete.
Right after that, this little script uses the basic ElementTree methods from Python's standard library.
"""

import xml.etree.ElementTree as ET

array = ['FirstName', 'SecondName', 'ThirdName']
# The same template list as in the question.

ET_tree = ET.parse("exported_file.xml")
# Here we load exactly the same file we built with soup. You can export to a different file if you wish.
ET_root = ET_tree.getroot()

# The options element may be the root itself or nested deeper in the document.
options = ET_root if ET_root.tag == 'options' else ET_root.find('.//options')

for position, opt in enumerate(options):
    # position replaces the manual 'counter' from the question's code. It picks the matching item from array, which acts as the template for the name attributes.
    opt.set('name', str(array[position]))
    opt.text = 'text'
    # In the same way, position can pull data from any other list that lines up with these elements.

ET_tree.write('exported_file.xml', encoding="UTF-8", xml_declaration=True)
# This is the part I researched quite a lot: it saves the XML document with UTF-8 encoding, which is very important.

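This approach is quite inefficient; to achieve the same result I could use ET for everything. Still, BeautifulSoup prepares the document with nicely formatted output, which is very neat, whereas the files created by ElementTree are only machine-friendly.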
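For the "use ET for everything" route, here is a minimal sketch that builds the same options structure with the standard library alone. ET.indent (available since Python 3.9) adds line breaks and indentation in place, which addresses the readability concern above on newer Python versions. The build_options name is hypothetical, and the output goes to an in-memory buffer here; passing a filename such as "exported_file.xml" to write() works the same way.

```python
import io
import xml.etree.ElementTree as ET


def build_options(names, text="ContentString"):
    """Return an <options> element with one <opt name="..."> child per name."""
    options = ET.Element("options")
    for name in names:
        # SubElement creates the child and appends it to options in one call.
        opt = ET.SubElement(options, "opt", name=name)
        opt.text = text
    return options


root = build_options(["FirstName", "SecondName", "ThirdName"])
ET.indent(root)  # Python 3.9+: inserts newlines and indentation in place

buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="UTF-8", xml_declaration=True)
print(buffer.getvalue().decode("utf-8"))
```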