Question

I need to match a regular expression pattern against the following text using Python.

The text is:

  """

       2010 Toyota FJ Cruiser FJ CRUISER

       Int. Color:

       Ext. Color:

       Black

       Trans:

       Automatic

       VIN:

        JTEZU4BF7AK009445  


      Stock:

      122821B

      DIFFERENTIALBLACK

     Status:

     Body Style: 
     SUV
     Engine:
     Gas V6 4.0L/241
                                             Dealership: Universal Toyota



    $29,988*
                                             Price

     View More Information


     Compare?

    """

From this text I need to extract "JTEZU4BF7AK009445" (17 characters long), the value that follows "VIN:".

I used this pattern:

        import re

        vin_pattern = re.compile(r'([A-Z0-9]{17})')
        vin = re.findall(vin_pattern, text)
        # vin == ['JTEZU4BF7AK009445', 'DIFFERENTIALBLACK']

but DIFFERENTIALBLACK should not match.
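One way to keep the unrelated 17-letter word out is to anchor the pattern to the "VIN:" label and allow any whitespace (including newlines) in between. The sketch below also narrows the character class to the VIN alphabet (digits and capital letters except I, O and Q). The `text` string is a shortened stand-in for the listing above, not the full input.

```python
import re

# Shortened stand-in for the listing in the question (assumption for illustration).
text = """
       VIN:

        JTEZU4BF7AK009445

      Stock:

      122821B

      DIFFERENTIALBLACK
"""

# "VIN:" followed by any whitespace (newlines included), then 17 VIN characters.
# Real VINs never use the letters I, O or Q, hence the narrowed character class.
vin_pattern = re.compile(r"VIN:\s*([A-HJ-NPR-Z0-9]{17})\b")
vins = vin_pattern.findall(text)
print(vins)  # ['JTEZU4BF7AK009445']

# Even without the "VIN:" anchor, the narrowed class rejects DIFFERENTIALBLACK,
# because that word contains the letter I.
loose = re.findall(r"\b[A-HJ-NPR-Z0-9]{17}\b", text)
print(loose)  # ['JTEZU4BF7AK009445']
```

The anchored form is the more robust of the two: it does not depend on the other 17-character words in the listing happening to contain I, O or Q.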

I also used the pattern

       price_pat = re.compile(r'(\$[0-9,.]+)')

to match prices (a "$" sign followed by the value).
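The question's price pattern accepts any run of digits, commas and dots after "$", so it also matches fragments such as "$,,.". A stricter variant is sketched below; it assumes prices are written with thousands separators, like "$29,988" or "$1,234.50", which is an assumption about this data rather than a general rule.

```python
import re

# The question's pattern: "$" followed by any run of digits, commas and dots.
loose_price = re.compile(r"\$[0-9,.]+")

# Stricter variant (assumes "$29,988" / "$1,234.50" style prices):
# 1-3 digits, optional ",ddd" groups, optional ".dd" cents, then a word boundary.
strict_price = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?\b")

sample = "$29,988* Price, was $1,234.50; junk: $,,."
loose_hits = loose_price.findall(sample)
strict_hits = strict_price.findall(sample)
print(loose_hits)   # ['$29,988', '$1,234.50', '$,,.']
print(strict_hits)  # ['$29,988', '$1,234.50']
```

Because of the trailing `\b`, the strict pattern rejects an unformatted "$1234" entirely instead of returning a truncated "$123"; drop it if such prices can occur.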

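I need to look for this price pattern only within the 50 characters before and the 50 characters after the VIN match, because in some cases the text contains several price values. So I need to restrict the search to the 50 characters before and the 50 characters after the VIN match.

How should I do this?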
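A minimal sketch of the windowed search, assuming the VIN is located with an anchored pattern and the window is measured from the start and end of that match. `prices_near_vin` is a hypothetical helper name; it relies on the `pos` and `endpos` arguments of a compiled pattern's `findall`, which restrict the scan without slicing the string.

```python
import re

# Anchored VIN pattern and the question's price pattern (raw strings avoid
# invalid-escape warnings for "\$").
VIN_PAT = re.compile(r"VIN:\s*([A-HJ-NPR-Z0-9]{17})")
PRICE_PAT = re.compile(r"\$[0-9,.]+")


def prices_near_vin(text, window=50):
    """Hypothetical helper: return (vin, prices), where prices are only
    searched from `window` characters before the VIN match to `window`
    characters after it. Returns (None, []) when no VIN is found."""
    m = VIN_PAT.search(text)
    if m is None:
        return None, []
    start = max(m.start() - window, 0)
    end = m.end() + window  # an endpos past the end of the string is fine
    # Pattern.findall(string, pos, endpos) only scans between pos and endpos.
    return m.group(1), PRICE_PAT.findall(text, start, end)


# Made-up listing with one price far before and one far after the VIN.
sample = (
    "Old ad $99,999" + " " * 100
    + "Was $31,500 VIN: JTEZU4BF7AK009445 now $29,988*"
    + " " * 100 + "Other car $12,000"
)
print(prices_near_vin(sample))  # ('JTEZU4BF7AK009445', ['$31,500', '$29,988'])
```

Measuring from the match boundaries rather than from a single position keeps the window symmetric around the whole "VIN: ..." span.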

Answer 1

Let's first simplify the text by collapsing every run of whitespace into a single space:

import re
t2 = re.sub(r'[\n\t ]+', ' ', t)  # t is your original text

This makes finding the VIN easier:

re.findall('[A-Z]{3}[A-Z0-9]{10}[0-9]{4}', t2)
Out[2]: ['JTEZU4BF7AK009445']
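A small self-contained run of the two steps above, on a shortened stand-in for the listing (an assumption, not the full text). It also shows a limit of the shape-based pattern: it requires three leading letters and four trailing digits, so a VIN-like value that starts with a digit is not found. The second input is hypothetical.

```python
import re

# Shortened stand-in for the listing, with the question's whitespace (assumption).
t = "VIN:\n\n        JTEZU4BF7AK009445  \n\n\n      Stock:\n\n      122821B\n\n      DIFFERENTIALBLACK\n"

t2 = re.sub(r"[\n\t ]+", " ", t)  # collapse runs of whitespace into one space
print(repr(t2))  # 'VIN: JTEZU4BF7AK009445 Stock: 122821B DIFFERENTIALBLACK '

shape = r"[A-Z]{3}[A-Z0-9]{10}[0-9]{4}"
found = re.findall(shape, t2)
print(found)  # ['JTEZU4BF7AK009445']

# Hypothetical VIN-like value starting with a digit: the 3-letter prefix rejects it.
missed = re.findall(shape, "VIN: 1ABCD23EF4G567890")
print(missed)  # []
```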

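Then you can get the position of "VIN:" in your string and pass vin_position - 50 and vin_position + 50 as the pos and endpos arguments of the compiled pattern's .findall method: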

r2 = re.compile(r'(\$[0-9,.]+)')
r2.findall(t2, t2.find('VIN:') - 50, t2.find('VIN:') + 50)
Out[4]: []

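In your text the price is more than 50 characters away from "VIN:", so you need to widen this window. With the text exactly as quoted in the question, the "$" sits 133 characters after "VIN:" once whitespace is collapsed, so even 100 is not quite enough; 150 works:

vin_pos = t2.find('VIN:')
r2.findall(t2, max(vin_pos - 150, 0), vin_pos + 150)
Out[5]: ['$29,988']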
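The window sizes can be checked directly against the question's listing. The `listing` string below is that text after the whitespace collapse (leading space dropped, which does not change relative offsets); the loop tries three window sizes around the position of "VIN:".

```python
import re

# The question's listing after Answer 1's whitespace collapse (leading space dropped).
listing = (
    "2010 Toyota FJ Cruiser FJ CRUISER Int. Color: Ext. Color: Black "
    "Trans: Automatic VIN: JTEZU4BF7AK009445 Stock: 122821B DIFFERENTIALBLACK "
    "Status: Body Style: SUV Engine: Gas V6 4.0L/241 "
    "Dealership: Universal Toyota $29,988* Price View More Information Compare?"
)

price_pat = re.compile(r"\$[0-9,.]+")
vin_pos = listing.find("VIN:")
offset = listing.find("$") - vin_pos
print(offset)  # 133

results = {}
for w in (50, 100, 150):
    # max(..., 0) keeps pos from going negative near the start of the string
    results[w] = price_pat.findall(listing, max(vin_pos - w, 0), vin_pos + w)
    print(w, results[w])
# 50 []
# 100 []
# 150 ['$29,988']
```

Since the right window size depends on the page layout, measuring a few real listings this way before fixing the number is probably worth the effort.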

Answer 2

If you don't have to use regular expressions (they can be a pain in the a**), I suggest the following solution:

yourstr = """ ... whatever ... """

lst = yourstr.split()
vin = lst[lst.index('VIN:') + 1]
price = [i for i in lst if '$' in i][0]  # '$29,988*' here; .rstrip('*') drops the asterisk
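The split() approach raises ValueError when "VIN:" is missing and IndexError when no token contains "$". A sketch that returns None instead, and strips the footnote asterisk the listing puts after the price, could look like this (`vin_and_price` is a hypothetical name). Like the original, it takes the first "$" token anywhere in the text, not the one nearest the VIN.

```python
def vin_and_price(text):
    """Sketch of the split() approach that tolerates missing fields.
    Returns (vin, price); either may be None."""
    tokens = text.split()  # no argument: split on any run of whitespace
    try:
        vin = tokens[tokens.index("VIN:") + 1]
    except (ValueError, IndexError):  # no "VIN:" token, or nothing after it
        vin = None
    prices = [tok for tok in tokens if tok.startswith("$")]
    # The listing writes "$29,988*", so strip the trailing footnote asterisk.
    price = prices[0].rstrip("*") if prices else None
    return vin, price


sample = "VIN:\n\n    JTEZU4BF7AK009445\n  Dealership: Universal Toyota\n  $29,988*\n  Price"
print(vin_and_price(sample))  # ('JTEZU4BF7AK009445', '$29,988')
print(vin_and_price("VIN:"))  # (None, None)
```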

I hope this helps!

Answer 3

A dirty hack, but it works.

import re
st = "....your string...."
x = re.findall(r"VIN:([^Stock]+)", st)
y = "".join(x)
y = y.strip(" \n")  # strip() returns a new string, so assign the result
print(y)

# output: JTEZU4BF7AK009445
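Note that `[^Stock]` is a character class: it matches one character that is not S, t, o, c or k, not "anything except the word Stock". The hack therefore stops early if the value itself contains one of those letters. The sketch compares it with a lazy match up to the "Stock:" label; `other` is a hypothetical value containing an "S".

```python
import re

st = "VIN:\n\n        JTEZU4BF7AK009445  \n\n\n      Stock:\n\n      122821B\n"
# Hypothetical value that happens to contain an "S".
other = "VIN: ABCS1234567890123 Stock: 1"

hack = r"VIN:([^Stock]+)"        # [^Stock] = one char that is not S, t, o, c or k
lazy = r"VIN:\s*(.*?)\s*Stock:"  # shortest text up to the next "Stock:" label

hack_st = "".join(re.findall(hack, st)).strip()
hack_other = "".join(re.findall(hack, other)).strip()
lazy_st = re.search(lazy, st).group(1)
lazy_other = re.search(lazy, other).group(1)

print(hack_st)     # JTEZU4BF7AK009445
print(hack_other)  # ABC  (stops at the "S")
print(lazy_st)     # JTEZU4BF7AK009445
print(lazy_other)  # ABCS1234567890123
```

The `\s*` on both sides absorbs the newlines around the value, so no DOTALL flag is needed here; `re.search` returns None if "Stock:" is missing, which real code should check.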

How to filter a string pattern to match lines of text with a regular expression?