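I am trying to open an XML file and parse it, looking through its tags and finding the text inside each specific tag. If the text inside a tag matches a string, I want to remove part of the string or substitute it with something else.
However, for some reason my if statement does not seem to work. I want it to do one thing only when the variable "action" equals "remove", and another thing only when "action" equals "substitute". However, even when "action" equals "substitute", the code in the if branch appears to run, not just the code in the elif branch. In addition, the if, elif, and else statements nested inside the second if statement do not seem to work either. Even when end_int is not None, the body of the if runs, and it never falls through to the elif for "start_int == None" or to the else for the remaining cases.
The mfn_pn variable is a barcode entered by the user, something like ATL-157-1815, DFW-184-8378., ATL-324-3243., or DFW-432-2343.
The XML file contains the following data: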
<?xml version="1.0" encoding="utf-8"?>
<metadata>
    <filter>
        <regex>ATL|LAX|DFW</regex>
        <start_char>3</start_char>
        <end_char></end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>DFW.+\.$</regex>
        <start_char>3</start_char>
        <end_char>-1</end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>\-</regex>
        <replacement></replacement>
        <action>substitute</action>
    </filter>
    <filter>
        <regex>\s</regex>
        <replacement></replacement>
        <action>substitute</action>
    </filter>
    <filter>
        <regex> T&amp;R$</regex>
        <start_char></start_char>
        <end_char>-4</end_char>
        <action>remove</action>
    </filter>
</metadata>
The Python code I am using is:
import re
from xml.etree.ElementTree import ElementTree

# filters.xml is the file that holds the things to be filtered
tree = ElementTree()
tree.parse("filters.xml")
# Get the data in the XML file
root = tree.getroot()
# Loop through filters
for x in root.findall('filter'):
    # Find the text inside the regex tag
    regex = x.find('regex').text
    # Find the text inside the start_char tag
    start_prim = x.find('start_char')
    # If the element exists, assign its text to the start variable
    start = start_prim.text if start_prim is not None else None
    start_int = int(start) if start is not None else None
    # Find the text inside the end_char tag
    end_prim = x.find('end_char')
    # If the element exists, assign its text to the end variable
    end = end_prim.text if end_prim is not None else None
    end_int = int(end) if end is not None else None
    # Find the text inside the action tag
    action = x.find('action').text
    if action == 'remove':
        if re.match(r'%s' % regex, mfn_pn, re.IGNORECASE):
            if end_int == None:
                mfn_pn = mfn_pn[start_int:]
            elif start_int == None:
                mfn_pn = mfn_pn[:end_int]
            else:
                mfn_pn = mfn_pn[start_int:end_int]
    elif action == 'substitute':
        mfn_pn = re.sub(r'%s' % regex, '', mfn_pn)
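A side note on the parsing step in the code above: with ElementTree, a missing element makes find() return None, while an element that exists but is empty has a .text of None, so both cases end up as None in start_int and end_int. The sketch below illustrates this with a made-up inline filter; the helper name int_or_none is hypothetical and not part of the original code. It also shows that a slice bound of None just means "no bound", so a single slice expression can stand in for the three-way if/elif/else.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline filter, modelled on the first <filter> in the question.
SAMPLE = """
<filter>
    <regex>ATL|LAX|DFW</regex>
    <start_char>3</start_char>
    <end_char></end_char>
    <action>remove</action>
</filter>
"""


def int_or_none(parent, tag):
    """Return the integer inside <tag>, or None if the tag is missing or empty."""
    element = parent.find(tag)
    if element is None or element.text is None or not element.text.strip():
        return None
    return int(element.text)


flt = ET.fromstring(SAMPLE)
start_int = int_or_none(flt, "start_char")   # the element holds "3"
end_int = int_or_none(flt, "end_char")       # the element exists but is empty
missing = int_or_none(flt, "replacement")    # the element is absent

# A None bound in a slice means "open ended", so one expression covers all cases.
sliced = "DFW 356-5789"[start_int:end_int]
print(start_int, end_int, missing, repr(sliced))
```

Under these assumptions the empty end_char and the absent replacement both come back as None, and the slice drops only the first three characters.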
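Output:
If mfn_pn = 1PDFW 356-5789, I get FW3565789. It removes the first three characters, even though it should look at the XML file and, when regex equals 1P, remove only the first two characters because start_char equals 2. So mfn_pn = mfn_pn[start_int:] should be mfn_pn = mfn_pn[2:], but for some reason it still thinks start_int is 3.
If mfn_pn = DFW 356-5789, I get 3565789. It removes the first three characters even though the regex does not match anything that should be removed; it executes the if statement even though it should skip to the elif statement.
It seems to pick up only what is in the first filter tag: regex is set only to the contents of the first regex tag, start_int only to the first start_char, and end_int only to the first end_char. Inside the if statement, regex is never set to the contents of the remaining filter tags.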
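One way to check whether the loop really stops at the first filter is to record what every filter does to the string. The sketch below is not the exact program from the question: it embeds a shortened copy of the XML as a string instead of reading filters.xml, and trace_filters is a name made up for illustration. Like the question's code, it uses re.match, which only matches at the start of the string.

```python
import re
import xml.etree.ElementTree as ET

# Shortened, inline copy of the filters (an assumption made for this sketch).
FILTERS_XML = r"""
<metadata>
    <filter>
        <regex>ATL|LAX|DFW</regex>
        <start_char>3</start_char>
        <end_char></end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>\-</regex>
        <replacement></replacement>
        <action>substitute</action>
    </filter>
    <filter>
        <regex>\s</regex>
        <replacement></replacement>
        <action>substitute</action>
    </filter>
</metadata>
"""


def to_int(text):
    """Convert element text to int; empty or missing text becomes None."""
    return int(text) if text else None


def trace_filters(xml_text, mfn_pn):
    """Apply every filter in order and return (result, list of steps)."""
    steps = []
    for flt in ET.fromstring(xml_text).findall("filter"):
        regex = flt.findtext("regex")
        action = flt.findtext("action")
        before = mfn_pn
        if action == "remove":
            if re.match(regex, mfn_pn, re.IGNORECASE):
                start = to_int(flt.findtext("start_char"))
                end = to_int(flt.findtext("end_char"))
                mfn_pn = mfn_pn[start:end]
        elif action == "substitute":
            mfn_pn = re.sub(regex, "", mfn_pn)
        steps.append((regex, action, before, mfn_pn))
    return mfn_pn, steps


result, steps = trace_filters(FILTERS_XML, "DFW 356-5789")
for regex, action, before, after in steps:
    print(f"{action:10} {regex!r:15} {before!r} -> {after!r}")
print(result)
```

With these three filters, the trace for DFW 356-5789 shows the first regex matching at the start of the string (DFW is one of its alternatives), after which the two substitute filters strip the hyphen and the space. That would suggest every filter is being visited and the if branch is running because the pattern does match.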
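Answer 0 (score: 0)
Given your desired output, 1PDFW 356-5789 should produce 3565789. If the regexes can be changed, I suggest the following changes to filters.xml and the Python code.
The XML file contains the following data: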
<?xml version="1.0" encoding="utf-8"?>
<metadata>
    <filter>
        <regex>ATL|LAX|DFW</regex>
        <start_char>2</start_char>
        <end_char></end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>DFW</regex>
        <start_char>3</start_char>
        <end_char></end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>\-</regex>
        <replacement></replacement>
        <action>substitute</action>
    </filter>
    <filter>
        <regex>\s</regex>
        <replacement></replacement>
        <action>substitute</action>
    </filter>
    <filter>
        <regex> T&amp;R$</regex>
        <start_char></start_char>
        <end_char>-4</end_char>
        <action>remove</action>
    </filter>
</metadata>
The Python code I used is:
import re
from xml.etree.ElementTree import ElementTree

# mfn_pn (the barcode entered by the user) must already be defined at this point

# filters.xml is the file that holds the things to be filtered
tree = ElementTree()
tree.parse("filters.xml")
# Get the data in the XML file
root = tree.getroot()
# Loop through filters
for x in root.findall('filter'):
    # Find the text inside the regex tag
    regex = x.find('regex').text
    # Find the text inside the start_char tag
    start_prim = x.find('start_char')
    # If the element exists, assign its text to the start variable
    start = start_prim.text if start_prim is not None else None
    start_int = int(start) if start is not None else None
    # Find the text inside the end_char tag
    end_prim = x.find('end_char')
    # If the element exists, assign its text to the end variable
    end = end_prim.text if end_prim is not None else None
    end_int = int(end) if end is not None else None
    # Find the text inside the action tag
    action = x.find('action').text
    if action == 'remove':
        # re.search looks anywhere in the string, not only at the start.
        # Note that \b binds only to the last alternative of an a|b|c pattern.
        if re.search(r'%s\b' % regex, mfn_pn):
            # A None bound means "no bound", so one slice covers every case
            mfn_pn = mfn_pn[start_int:end_int]
    elif action == 'substitute':
        mfn_pn = re.sub(r'%s' % regex, '', mfn_pn)
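To check the suggestion without touching any files, here is a rough, self-contained sketch of the same logic wrapped in a function. The function name apply_filters and the helper to_int are made up for this example, the XML is embedded as a string rather than read from filters.xml, and the T&R filter is left out for brevity.

```python
import re
import xml.etree.ElementTree as ET

# Inline copy of the revised filters (without the T&R filter); an assumption
# made so the example runs on its own.
REVISED_XML = r"""
<metadata>
    <filter>
        <regex>ATL|LAX|DFW</regex>
        <start_char>2</start_char>
        <end_char></end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>DFW</regex>
        <start_char>3</start_char>
        <end_char></end_char>
        <action>remove</action>
    </filter>
    <filter>
        <regex>\-</regex>
        <replacement></replacement>
        <action>substitute</action>
    </filter>
    <filter>
        <regex>\s</regex>
        <replacement></replacement>
        <action>substitute</action>
    </filter>
</metadata>
"""


def to_int(text):
    """Convert element text to int; empty or missing text becomes None."""
    return int(text) if text else None


def apply_filters(xml_text, mfn_pn):
    """Run every filter over mfn_pn, in document order, and return the result."""
    for flt in ET.fromstring(xml_text).findall("filter"):
        regex = flt.findtext("regex")
        action = flt.findtext("action")
        if action == "remove":
            # re.search matches anywhere; \b applies only to the last alternative.
            if re.search(r"%s\b" % regex, mfn_pn):
                start = to_int(flt.findtext("start_char"))
                end = to_int(flt.findtext("end_char"))
                mfn_pn = mfn_pn[start:end]
        elif action == "substitute":
            mfn_pn = re.sub(regex, "", mfn_pn)
    return mfn_pn


print(apply_filters(REVISED_XML, "1PDFW 356-5789"))  # 3565789
```

Keep in mind that these filters assume the two-character prefix is present: the first filter cuts two characters from the front wherever the regex matched, so an input such as DFW 356-5789 would come out as W3565789 under this sketch.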