Python lxml: moving on to the next item in a loop once a match is found

Date: 2018-07-29 14:08:41

Tags: python list lxml

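I am using lxml to query several XML files that contain data elements for various products. This part of the code takes a list of missing product_ids and queries the XML files for those products' data elements.

One of my core problems is that every product_id obtained via XPath is checked against every item in the list products_missing_from_postgresql, which can take a very long time (hours).

Once a match is found, how do I restart the loop over entries for the next item?

Maybe this isn't the right question... if not, what is the right question?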

# this code is for testing purposes
# (the dangling `try:` from the original excerpt had no `except` clause and is omitted here)
for product_number in products_missing_from_postgresql:
    for entry in entries:

        product_id = entry.xpath('@id')[0]

        if product_id != product_number:

            print('************************')
            print('current product: ' + product_id)
            print('no match: ' + product_number)
            print('************************')

        else:

            print('************************')
            print('************************')
            print('product to match: ' + product_number)
            print('matched from entry: ' + product_id)
            print('************************')
            print('************************')

Output of the test code:

************************
************************
product to match: B3F2H-STH 
matched from entry: B3F2H-STH 
************************
************************

************************
current product: B3F2H-STL
no match: B3F2H-STH 
************************

************************
current product: B3F2H-004 
no match: B3F2H-STH 
************************

This code is used in production:

# (the dangling `try:` from the original excerpt had no `except` clause and is omitted here)
for product_number in products_missing_from_postgresql:
    for entry in entries:

        product_id = entry.xpath('@id')[0]

        if product_id != product_number:

            # used for testing
            print('no match: ' + product_number)

        else:
            # the element @id has multiple items linked that I need to acquire.

            product_id = entry.xpath('@id')[0]
            missing_products_to_add.append(product_id)

            product_name = entry.xpath('@name')[0]
            missing_products_to_add.append(product_name)

            product_type = entry.xpath('@type')[0]
            missing_products_to_add.append(product_type)

            product_price = entry.xpath('@price')[0]
            missing_products_to_add.append(product_price)
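
Taken literally, "stop scanning entries once a match is found and continue with the next product number" is what `break` does. The following is only a minimal sketch of that idea: the sample XML, the `name` values, and the helper `match_products` are hypothetical and not part of the production code above.

```python
from lxml import etree


def match_products(entries, product_numbers):
    """Return (matched, unmatched) product numbers, stopping at the first hit."""
    matched, unmatched = [], []
    for product_number in product_numbers:
        for entry in entries:
            if entry.get('id') == product_number:
                matched.append(product_number)
                break  # stop scanning entries; go on to the next product_number
        else:
            # This branch runs only if the inner loop ended without `break`.
            unmatched.append(product_number)
    return matched, unmatched


# Hypothetical sample data standing in for the real XML files.
xml_tree = etree.fromstring(
    '<catalog>'
    '<entry id="B3F2H-STH" name="Sample H"/>'
    '<entry id="B3F2H-STL" name="Sample L"/>'
    '<entry id="B3F2H-004" name="Sample 4"/>'
    '</catalog>'
)
entries = xml_tree.xpath('//entry')

matched, unmatched = match_products(entries, ['B3F2H-STL', 'NOT-IN-XML'])
print(matched)
print(unmatched)
```

This still scans the entries once per missing product in the worst case, so it shortens the loop without removing the nested comparison.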

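2 answers:

Answer 0 (score: 1)

Try collecting your IDs into a set and comparing against it once. That removes the nested loop, and the XPath is evaluated only once per entry rather than querying the tree again for every product number. Membership tests on a set are also constant time on average.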

ids = {pid for entry in entries for pid in entry.xpath('@id')}
for product_number in products_missing_from_postgresql:
    if product_number in ids:
        pass  # whatever
    else:
        pass  # whatever
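
As a rough, self-contained illustration of the set approach, the sketch below splits the missing product numbers into those present in the XML and those absent. The sample XML and the helper name `split_by_presence` are assumptions made for the example.

```python
from lxml import etree


def split_by_presence(entries, product_numbers):
    """Split product_numbers into (found, not_found) using one pass over entries."""
    # One XPath call per entry, done once up front.
    ids = {pid for entry in entries for pid in entry.xpath('@id')}
    found = [p for p in product_numbers if p in ids]
    not_found = [p for p in product_numbers if p not in ids]
    return found, not_found


# Hypothetical sample data standing in for the real XML files.
xml_tree = etree.fromstring(
    '<catalog>'
    '<entry id="B3F2H-STH"/>'
    '<entry id="B3F2H-STL"/>'
    '<entry id="B3F2H-004"/>'
    '</catalog>'
)
entries = xml_tree.xpath('//entry')

found, not_found = split_by_presence(entries, ['B3F2H-004', 'MISSING-1', 'B3F2H-STH'])
print(found)
print(not_found)
```

The cost is roughly one pass over the entries plus one pass over the product numbers, instead of the product of the two.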

If you also want to retrieve the elements, you can build a dictionary instead of a set:

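# Key each entry by its own id. An absolute path such as '//*[@id]' evaluated on
# every entry would rescan the whole document each time.
products = {entry.get('id'): entry for entry in entries if entry.get('id') is not None}
for product_number in products_missing_from_postgresql:
    if product_number in products:
        actual_product = products[product_number]
        # ...
    else:
        pass  # ...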
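
To connect the dictionary lookup to the attributes the question collects (`@id`, `@name`, `@type`, `@price`), here is a hedged sketch. The attribute values, the tuple layout, and the helper name `build_rows` are illustrative assumptions, not the asker's actual schema.

```python
from lxml import etree


def build_rows(entries, product_numbers):
    """Return one (id, name, type, price) tuple per product number found."""
    by_id = {e.get('id'): e for e in entries if e.get('id') is not None}
    rows = []
    for product_number in product_numbers:
        entry = by_id.get(product_number)
        if entry is None:  # compare with None; element truthiness depends on children
            continue
        rows.append((entry.get('id'), entry.get('name'),
                     entry.get('type'), entry.get('price')))
    return rows


# Hypothetical sample data; the name/type/price values are made up.
xml_tree = etree.fromstring(
    '<catalog>'
    '<entry id="B3F2H-STH" name="Sample H" type="demo" price="1.10"/>'
    '<entry id="B3F2H-STL" name="Sample L" type="demo" price="1.20"/>'
    '</catalog>'
)
entries = xml_tree.xpath('//entry')

rows = build_rows(entries, ['B3F2H-STL', 'NOT-IN-XML'])
print(rows)
```

Keeping each product as one tuple, rather than appending the four values to a flat list, makes it easier to pass the rows to a bulk database insert later.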

Answer 1 (score: 0)

Use XPath instead of the inner for loop.

for product_number in products_missing_from_postgresql:
    entries = xml_tree.xpath("//entry[@id = '%s']" % product_number)
    if entries:
        print('FOUND: ' + product_number)
    else:
        print('NOT FOUND: ' + product_number)

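The above will not work if your product_number can contain a single quote. It is generally better to use a placeholder (an XPath variable) in the expression and pass the actual value separately: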

    entries = xml_tree.xpath("//entry[@id = $value]", value=product_number)
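
The difference can be seen with an ID that contains a single quote. The sketch below uses a made-up ID (`O'RING-7`) and a small in-memory tree; it assumes lxml reports the malformed interpolated expression as an `etree.XPathError` (or a subclass of it).

```python
from lxml import etree

# Hypothetical tree built in memory; the quoted ID is invented for the demo.
root = etree.Element('catalog')
etree.SubElement(root, 'entry', id="O'RING-7")
etree.SubElement(root, 'entry', id='B3F2H-STH')

product_number = "O'RING-7"

# String interpolation yields //entry[@id = 'O'RING-7'], which is not valid XPath.
try:
    root.xpath("//entry[@id = '%s']" % product_number)
    interpolation_failed = False
except etree.XPathError:
    interpolation_failed = True

# Passing the value as an XPath variable avoids any quoting problem.
hits = root.xpath("//entry[@id = $value]", value=product_number)

print(interpolation_failed)
print([e.get('id') for e in hits])
```

Each call still walks the tree for `//entry`, so with many product numbers the set or dictionary lookup from the other answer is likely to be faster.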