Question

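I have an SQLite table of sales records. Field 13 holds the shipping price, which essentially has three possible values:

Price: e.g. £15.20
free
not specified

The problem is that the field does not always contain just these words. For the first case it might say "Shipping is £15.20", or for the second "Shipping free", and I need to normalize it to the possibilities above. I use RegEx: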

def correct_shipping(db_data):
    pattern=re.compile("\£(\d+.\d+)") #search for price
    pattern_free=re.compile("free") #search for free shipping
    pattern_not=re.compile("not specified") #search for shipping not specified

    for every_line in db_data:
        try:
            found=pattern.search(every_line[13].replace(',','')).group(1)
        except:
            try:
                found=pattern_free.search(every_line[13]).group()
            except:
                found=pattern_not.search(every_line[13]).group()

        if found:
            query="UPDATE MAINTABLE SET Shipping='"+found+"' WHERE Id="+str(every_line[0])
            db_cursor.execute(query)
    db_connection.commit()

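But this code raises the exception AttributeError: 'NoneType' object has no attribute 'group'. The first result of the form "5.20" triggers it, because none of the patterns is found. The question is how to search the string properly (is try/except needed at all?), or how to ignore the exception when none of the strings is found (although that is not a very good solution).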

Answer 1

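The first problem is that your code does not handle failure properly. If you want to use functions that return None when there is no match, you have to either check for None or handle the AttributeError that results from trying to call group on it.

You could add one more layer of try/except under the first two, but that gets hard to read. A function like this would be much simpler: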

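# Illustrative helper (the name is an assumption): returns the normalized
# shipping value for one row, or None when no pattern matches.
def find_shipping(every_line):
    match = pattern.search(every_line[13].replace(',',''))
    if match:
        return match.group(1)
    match = pattern_free.search(every_line[13])
    if match:
        return match.group()
    match = pattern_not.search(every_line[13])
    if match:
        return match.group()
    return None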

It uses the same regular expressions as your code, but it never calls group on a match that did not succeed, so it works and stays simple.
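To show how such a helper might slot into the update loop, here is a minimal sketch. It assumes Python 3, the table and column names from the question's UPDATE statement (MAINTABLE, Id, Shipping), and a row that is read by column name rather than by index 13. The names normalize_shipping and correct_shipping(connection) are illustrative, and the parameterized query is a swapped-in replacement for the string concatenation in the question.

```python
import re
import sqlite3

PRICE = re.compile(r"£(\d+\.\d+)")
FREE = re.compile(r"free")
NOT_SPECIFIED = re.compile(r"not specified")

def normalize_shipping(text):
    """Return the price digits, 'free', 'not specified', or None."""
    # Drop thousands separators before looking for a price.
    match = PRICE.search(text.replace(",", ""))
    if match:
        return match.group(1)
    for pattern in (FREE, NOT_SPECIFIED):
        match = pattern.search(text)
        if match:
            return match.group()
    return None  # nothing recognizable, caller decides what to do

def correct_shipping(connection):
    """Rewrite every Shipping value that can be normalized."""
    cursor = connection.cursor()
    rows = cursor.execute("SELECT Id, Shipping FROM MAINTABLE").fetchall()
    for row_id, shipping in rows:
        found = normalize_shipping(shipping)
        if found is not None:
            # Placeholders let sqlite3 handle quoting of the values.
            cursor.execute(
                "UPDATE MAINTABLE SET Shipping = ? WHERE Id = ?",
                (found, row_id),
            )
    connection.commit()
```

Rows that match none of the patterns, such as a bare "5.20", are simply left unchanged here. Whether to skip, log, or flag them is a design choice the question leaves open.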

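There are ways to simplify this further. For example, you do not need a regexp to search for fixed strings like "free"; you can use str.find or str.index. Or you could search once with a single regexp that uses a three-way alternation, instead of doing three separate searches.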
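Both simplifications could look roughly like the sketch below, assuming Python 3. The function names and the named group price are illustrative and do not come from the question.

```python
import re

# One pattern with three alternatives; only the price part is captured.
SHIPPING = re.compile(r"£(?P<price>\d+\.\d+)|free|not specified")

def normalize_with_one_regex(text):
    match = SHIPPING.search(text.replace(",", ""))
    if match is None:
        return None
    # group("price") is None when "free" or "not specified" matched instead.
    return match.group("price") or match.group()

def normalize_with_find(text):
    # str.find returns -1 when the substring is absent.
    for word in ("free", "not specified"):
        if text.find(word) != -1:
            return word
    return None
```

One difference to keep in mind: the single alternation returns whichever alternative appears first in the text, whereas three separate searches always prefer the price. For fields that contain only one of the three forms, the results should be the same.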

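The next problem is that your first pattern is wrong. You should not backslash-escape anything other than regexp special characters (or Python special characters... but you should be using raw strings, so you do not need to escape those), and the pound sign is not one of them.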
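A raw-string version of the price pattern might look like the sketch below (assuming Python 3). One further detail, which is an observation added here rather than something the answer raises: the unescaped "." in the question's pattern matches any character, so escaping it makes the pattern stricter.

```python
import re

loose = re.compile(r"£(\d+.\d+)")    # "." matches any character
strict = re.compile(r"£(\d+\.\d+)")  # "\." matches only a literal dot

loose_hit = loose.search("£15x20")                  # matches, probably unintended
strict_hit = strict.search("£15x20")                # no match
strict_price = strict.search("Shipping is £15.20")  # matches the price
```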

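More importantly, if this is Python 2.x, you should never put non-ASCII characters into str literals; put them only in unicode literals. (And only if you specify an encoding for the source file.)

Python's regexp engine can handle Unicode... but not if you give it mojibake, such as a UTF-8 pound sign decoded as Latin-1. (In fact, even if you get all the encodings right, it is better to give it Unicode patterns and search strings rather than encoded ones. Otherwise it has no way of knowing that it is searching Unicode, or that some characters are more than one byte long, and so on.)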
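The encoding pitfalls can be illustrated with a small sketch. It assumes Python 3, where str is already Unicode and bytes play the role of Python 2's str; the variable names are illustrative.

```python
import re

text = "Shipping is £15.20"
utf8_bytes = text.encode("utf-8")      # pound sign becomes two bytes: C2 A3
latin1_bytes = text.encode("latin-1")  # pound sign is a single byte: A3

# A byte pattern only matches data that uses the same encoding.
byte_pattern = re.compile(r"£(\d+\.\d+)".encode("utf-8"))
same_encoding = byte_pattern.search(utf8_bytes)     # matches
other_encoding = byte_pattern.search(latin1_bytes)  # None

# Decoding UTF-8 bytes as Latin-1 yields mojibake: an extra "Â" appears.
mojibake = utf8_bytes.decode("latin-1")

# A text pattern against text avoids the question of encodings entirely.
text_match = re.compile(r"£(\d+\.\d+)").search(text)
```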

Answer 2

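Don't search for the pound sign. Search for the digits, then add the currency symbol back yourself. (The example below uses "$" as the symbol.)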

import re

strings = [
    "5.20",
    "$5.20",
    "$.50",
    "$5",
    "Shipping is free",
    "Shipping: not specified",
    "free",
    "not specified",
]

pattern = r"""
    \d*                     #A digit 0 or more times 
    [.]?                    #A dot, optional
    \d+                     #A digit, one or more times 
    | free                  #Or the word free
    | not \s+ specified     #Or the phrase "not specified"
"""

regex = re.compile(pattern, flags=re.X)
results = []

for string in strings:
    md = regex.search(string)

    if md:
        match = md.group()
        # Only numeric results get the currency symbol prepended
        if re.search(r"\d", match):
            match = "$" + match
        results.append(match)
    else:
        print("Error--no match!")

print(results)

--output:--
['$5.20', '$5.20', '$.50', '$5', 'free', 'not specified', 'free', 'not specified']
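Adapted to the pound sign from the question, the same idea could be wrapped in a function like the sketch below (assuming Python 3). The name normalize and its symbol parameter are illustrative, and stripping commas first is borrowed from the question's code rather than from this answer.

```python
import re

# Same alternation as the listing above, written without the verbose flag.
SHIPPING = re.compile(r"\d*[.]?\d+|free|not\s+specified")

def normalize(text, symbol="\u00a3"):
    """Return symbol + amount, 'free', 'not specified', or None."""
    match = SHIPPING.search(text.replace(",", ""))
    if match is None:
        return None
    value = match.group()
    # Numeric results start with a digit or a dot; prefix the currency symbol.
    if value[0].isdigit() or value[0] == ".":
        return symbol + value
    return value
```

Because the currency symbol is never part of the pattern, a bare "5.20" is handled the same way as "£5.20", which covers the value that triggered the exception in the question.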

Searching for multiple RegEx substrings