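I am using lxml to query multiple XML files that contain data elements for various products. This part of the code takes a list of missing product_ids and queries the XML files for each product's data elements.
One of my core problems is that every product_id obtained via XPath is checked against every item in the list products_missing_from_postgresql, which can take a very long time (hours).
Once a match is found, how do I restart the for entry in entries loop?
Maybe that isn't the right question... if not, what is the right question?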
# this code is for testing purposes
for product_number in products_missing_from_postgresql:
    try:
        for entry in entries:
            product_id = entry.xpath('@id')[0]
            if product_id != product_number:
                print('************************')
                print('current product: ' + product_id)
                print('no match: ' + product_number)
                print('************************')
            else:
                print('************************')
                print('************************')
                print('product to match: ' + product_number)
                print('matched from entry: ' + product_id)
                print('************************')
                print('************************')
    except IndexError:
        # Assumed handler (the original except clause was not shown):
        # entry.xpath('@id')[0] raises IndexError when an entry has no @id.
        print('entry without an id attribute')
Output of the test code:
************************
************************
product to match: B3F2H-STH
matched from entry: B3F2H-STH
************************
************************
************************
current product: B3F2H-STL
no match: B3F2H-STH
************************
************************
current product: B3F2H-004
no match: B3F2H-STH
************************
This code is used in production:
for product_number in products_missing_from_postgresql:
    try:
        for entry in entries:
            product_id = entry.xpath('@id')[0]
            if product_id != product_number:
                # used for testing
                print('no match: ' + product_number)
            else:
                # the element @id has multiple items linked that I need to acquire.
                product_id = entry.xpath('@id')[0]
                missing_products_to_add.append(product_id)
                product_name = entry.xpath('@name')[0]
                missing_products_to_add.append(product_name)
                product_type = entry.xpath('@type')[0]
                missing_products_to_add.append(product_type)
                product_price = entry.xpath('@price')[0]
                missing_products_to_add.append(product_price)
    except IndexError:
        # Assumed handler (the original except clause was not shown):
        # raised when an entry lacks one of the attributes queried above.
        print('entry with a missing attribute')
Answer 0 (score: 1)
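Try putting your IDs into a set and comparing against it once. This saves the nested loop and runs the XPath only once per entry instead of querying the tree over and over...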
# collect every @id once; set membership tests are O(1) on average
ids = {pid for entry in entries for pid in entry.xpath('@id')}
for product_number in products_missing_from_postgresql:
    if product_number in ids:
        pass  # whatever
    else:
        pass  # whatever
If you also want to retrieve the elements, you can build a dictionary instead of a set:
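# Index each entry by its id once. Note that entry.xpath('//*[@id]') is an
# absolute path, so it would rescan the whole document for every entry.
products = {entry.attrib['id']: entry for entry in entries if 'id' in entry.attrib}
for product_number in products_missing_from_postgresql:
    if product_number in products:
        actual_product = products[product_number]
        # ...
    else:
        pass  # ...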
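To tie the dictionary approach back to the production loop in the question, here is a minimal sketch. The sample XML, the catalog root element, the attribute values and the not_found list are made up for illustration; only the id, name, type and price attributes come from the question. It falls back to the standard library's ElementTree if lxml is not installed, since only the API the two share is used.

```python
try:
    from lxml import etree
except ImportError:  # only the API shared with the standard library is used below
    import xml.etree.ElementTree as etree

# Hypothetical sample data; the real files and attribute values will differ.
XML = """
<catalog>
  <entry id="B3F2H-STH" name="Widget H" type="switch" price="1.20"/>
  <entry id="B3F2H-STL" name="Widget L" type="switch" price="1.10"/>
  <entry id="B3F2H-004" name="Widget 4" type="switch" price="0.95"/>
</catalog>
"""

root = etree.fromstring(XML.strip())
entries = list(root.iter('entry'))
products_missing_from_postgresql = ['B3F2H-STH', 'B3F2H-XXX']

# Build the index once: a single pass over the entries.
products = {
    entry.get('id'): entry
    for entry in entries
    if entry.get('id') is not None
}

missing_products_to_add = []
not_found = []  # hypothetical helper list, not part of the original code
for product_number in products_missing_from_postgresql:
    entry = products.get(product_number)
    if entry is None:  # compare with None; childless elements can be falsy
        not_found.append(product_number)
        continue
    # The same four attributes the production loop collects.
    for attribute in ('id', 'name', 'type', 'price'):
        missing_products_to_add.append(entry.get(attribute))

print(missing_products_to_add)
print(not_found)
```

With N entries and M missing product numbers this does roughly N + M steps instead of N * M, which is where the hours should go away.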
Answer 1 (score: 0)
Use XPath instead of the inner for loop.
for product_number in products_missing_from_postgresql:
    entries = xml_tree.xpath("//entry[@id = '%s']" % product_number)
    if entries:
        print('FOUND: ' + product_number)
    else:
        print('NOT FOUND: ' + product_number)
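If your product_number can contain single quotes, the above will not work. It is generally better to use a placeholder in the XPath expression and pass the actual value separately: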
entries = xml_tree.xpath("//entry[@id = $value]", value=product_number)
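A small self-contained sketch of the difference, assuming lxml is installed. The sample XML, the id O'NEIL-01 and the helper find_entries are invented for illustration; the $value placeholder and the keyword argument are the same as in the line above.

```python
from lxml import etree

# Hypothetical sample data, including an id that contains a single quote.
XML = """
<catalog>
  <entry id="B3F2H-STH" name="Widget H"/>
  <entry id="O'NEIL-01" name="Widget with a quote in its id"/>
</catalog>
"""
xml_tree = etree.fromstring(XML.strip())


def find_entries(tree, product_number):
    """Return the <entry> elements whose @id equals product_number."""
    # $value is an XPath variable; lxml binds it from the keyword argument,
    # so the value never has to be quoted inside the expression.
    return tree.xpath("//entry[@id = $value]", value=product_number)


for product_number in ["B3F2H-STH", "O'NEIL-01", "MISSING"]:
    matches = find_entries(xml_tree, product_number)
    print(product_number, 'FOUND' if matches else 'NOT FOUND')

# With string interpolation, the quote inside the id ends the string literal
# early and the resulting expression is rejected.
interpolation_failed = False
try:
    xml_tree.xpath("//entry[@id = '%s']" % "O'NEIL-01")
except etree.XPathError as error:
    interpolation_failed = True
    print('interpolated query failed:', error)
```

Note that this still runs one tree search per product number, so for long lists the set or dictionary from the other answer is likely to be faster.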