Question

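I'm using lxml to query several XML files that contain data elements for various products. This part of the code takes a list of missing product_id values and queries the XML files for each product's data elements.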

One of my core problems is that every product_id obtained via XPath is checked against every item in the list products_missing_from_postgresql, which can take a very long time (hours).

Once a match is found, how do I restart the "for entry in entries" loop (that is, move on to the next product_number)?

Maybe this isn't the right question... and if it isn't, what is the right question?

# this code is for testing purposes
for product_number in products_missing_from_postgresql:
    for entry in entries:

        product_id = entry.xpath('@id')[0]

        if product_id != product_number:

            print('************************')
            print('current product: ' + product_id)
            print('no match: ' + product_number)
            print('************************')

        else:

            print('************************')
            print('************************')
            print('product to match: ' + product_number)
            print('matched from entry: ' + product_id)
            print('************************')
            print('************************')

Test code output:

************************
************************
product to match: B3F2H-STH 
matched from entry: B3F2H-STH 
************************
************************

************************
current product: B3F2H-STL
no match: B3F2H-STH 
************************

************************
current product: B3F2H-004 
no match: B3F2H-STH 
************************

This code is used in production:

for product_number in products_missing_from_postgresql:
    for entry in entries:

        product_id = entry.xpath('@id')[0]

        if product_id != product_number:

            # used for testing
            print('no match: ' + product_number)

        else:
            # the matching entry has several linked attributes (name, type, price) that I need to collect
            missing_products_to_add.append(product_id)

            product_name = entry.xpath('@name')[0]
            missing_products_to_add.append(product_name)

            product_type = entry.xpath('@type')[0]
            missing_products_to_add.append(product_type)

            product_price = entry.xpath('@price')[0]
            missing_products_to_add.append(product_price)
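For the literal question (moving on to the next product_number once a match is found), Python's break statement exits the inner loop immediately, and a for/else clause can report products that were never matched. The sketch below is an illustration, not the asker's code: it uses the standard-library xml.etree.ElementTree as a stand-in for lxml (entry.get('id') behaves the same in both), and the sample XML and product numbers are hypothetical. Note that this only shortens each inner scan; the worst case is still every product times every entry.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the real XML files.
XML = """<products>
  <entry id="B3F2H-STH" name="Widget H" type="widget" price="9.99"/>
  <entry id="B3F2H-STL" name="Widget L" type="widget" price="7.49"/>
  <entry id="B3F2H-004" name="Gadget 4" type="gadget" price="12.00"/>
</products>"""

entries = ET.fromstring(XML).findall("entry")
products_missing_from_postgresql = ["B3F2H-STH", "B3F2H-004", "NOT-THERE"]
missing_products_to_add = []

for product_number in products_missing_from_postgresql:
    for entry in entries:
        # .get() returns None instead of raising when the attribute is absent
        if entry.get("id") == product_number:
            missing_products_to_add.extend(
                [entry.get("id"), entry.get("name"), entry.get("type"), entry.get("price")]
            )
            break  # stop scanning entries; the outer loop moves to the next product_number
    else:
        # for/else: runs only if the inner loop finished without hitting break
        print("not found: " + product_number)
```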

Answer 1

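Try putting your IDs into a set and checking against it. This removes the nested loop and evaluates the XPath only once, instead of querying the tree over and over...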

ids = {pid for entry in entries for pid in entry.xpath('@id')}
for product_number in products_missing_from_postgresql:
    if product_number in ids:
        pass  # whatever: the product exists in the XML
    else:
        pass  # whatever: the product was not found
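A runnable sketch of the set idea, under the assumption that each entry carries its id as an attribute. It uses the standard-library xml.etree.ElementTree in place of lxml (with lxml, entry.xpath('@id') gives the same values), and the sample document is hypothetical. The tree is walked once; each membership test afterwards is an average O(1) hash lookup instead of a full scan.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; one entry deliberately has no id.
root = ET.fromstring(
    '<products>'
    '<entry id="B3F2H-STH"/><entry id="B3F2H-STL"/><entry id="B3F2H-004"/>'
    '<entry name="no id here"/>'
    '</products>'
)
entries = root.findall("entry")
products_missing_from_postgresql = ["B3F2H-STH", "B3F2H-004", "NOT-THERE"]

# Walk the entries once and collect every id into a set.
ids = {entry.get("id") for entry in entries if entry.get("id") is not None}

found = [p for p in products_missing_from_postgresql if p in ids]
not_found = [p for p in products_missing_from_postgresql if p not in ids]
print(found)      # ['B3F2H-STH', 'B3F2H-004']
print(not_found)  # ['NOT-THERE']
```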

If you also want to retrieve the elements, you can build a dictionary instead of a set:

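# map each entry's id to the entry element itself
# (an XPath starting with '//' searches the whole document, not just the current entry)
products = {entry.get('id'): entry for entry in entries if entry.get('id') is not None}
for product_number in products_missing_from_postgresql:
    if product_number in products:
        actual_product = products[product_number]
        # ...
    else:
        pass  # ...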
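A sketch of the dictionary variant that also pulls out the attributes the production code collects. Assumptions: xml.etree.ElementTree stands in for lxml, the sample XML is hypothetical, and storing one tuple per product (instead of the flat list in the question) is just an illustrative choice that keeps each product's fields together.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring(
    '<products>'
    '<entry id="B3F2H-STH" name="Widget H" type="widget" price="9.99"/>'
    '<entry id="B3F2H-004" name="Gadget 4" type="gadget" price="12.00"/>'
    '</products>'
)

# Map id -> element in a single pass over the entries.
products = {e.get("id"): e for e in root.iter("entry") if e.get("id") is not None}

products_missing_from_postgresql = ["B3F2H-004", "NOT-THERE"]
missing_products_to_add = []
for product_number in products_missing_from_postgresql:
    actual_product = products.get(product_number)
    if actual_product is None:
        continue  # not present in this XML file
    # Collect the same attributes as the production loop, as one tuple per product.
    missing_products_to_add.append(
        tuple(actual_product.get(attr) for attr in ("id", "name", "type", "price"))
    )

print(missing_products_to_add)  # [('B3F2H-004', 'Gadget 4', 'gadget', '12.00')]
```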

Answer 2

Use XPath instead of the inner for loop.

for product_number in products_missing_from_postgresql:
    entries = xml_tree.xpath("//entry[@id = '%s']" % product_number)
    if entries:
        print('FOUND: ' + product_number)
    else:
        print('NOT FOUND: ' + product_number)
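A self-contained sketch of the same per-product lookup. It swaps in the standard-library xml.etree.ElementTree, whose limited XPath subset supports [@attr='value'] predicates through findall; the sample document is hypothetical. ElementTree has no XPath variables, so the single-quote caveat in this answer applies here as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; xml_tree is an ElementTree, as in the answer.
xml_tree = ET.ElementTree(ET.fromstring(
    '<products><entry id="B3F2H-STH"/><entry id="B3F2H-004"/></products>'
))

results = {}
for product_number in ["B3F2H-STH", "NOT-THERE"]:
    # One predicate query per product instead of a Python loop over all entries.
    matches = xml_tree.findall(".//entry[@id='%s']" % product_number)
    results[product_number] = bool(matches)
    print(("FOUND: " if matches else "NOT FOUND: ") + product_number)
```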

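If your product_number can contain a single quote, the code above will not work. It's generally better to use a placeholder (an XPath variable) in the expression and pass the actual value separately: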

    entries = xml_tree.xpath("//entry[@id = $value]", value=product_number)

python lxml loop: get the next entry when a match is found
