Question

Given a text file that looks like this:

Samsung Galaxy S6 active SM-G890A 32GB Camo White (AT&T) *AS-IS* Cracked Screen
Samsung Galaxy S6 SM-G920 - 32GB - White Verizon Cracked screen
Samsung Galaxy S6 edge as is cracked screen

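I have tried several different ways to keep the string Samsung Galaxy S6 from matching Samsung Galaxy S6 edge, but I can't come up with an efficient one. Nothing in the string clearly marks where the phone's name ends and the extraneous information begins, so splitting the lines and comparing the pieces against a dictionary or something similar won't work.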

I was trying to think of a way to write something like the following:

phones = ['Samsung Galaxy S6', 'Samsung Galaxy S6 Edge']
lines = open('phones.txt', 'r').readlines()
for line in lines:
    for phone in phones:
        if phone in line and no other phone in phones is in line:
            print('match found')

But I can't work out the right way to structure it. Does anyone have any ideas? I'm sure I'm missing something simple here, but I can't figure out what.

Answer 1

First sort the phones by length, so that the longest names are checked first:

phones.sort(key=len,reverse=True)

Then break as soon as a match is found:

for phone in phones:
    if phone in line:
        print("FOUND:", repr(phone), "IN", repr(line))
        break  # we don't need to keep looking for other phones in this line

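maybe?

This way "Samsung Galaxy S6 Edge" comes before "Samsung Galaxy S6" in your check, so you match the longest name first... without needing to build knowledge of your phone list into the pattern, as the regex answer does.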
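Putting the two fragments above together might look like the sketch below. The helper name `find_phone` is hypothetical, and the case-insensitive comparison is an assumption added here: the sample data spells "edge" in lower case, so a plain `in` test against 'Samsung Galaxy S6 Edge' would miss it.

```python
def find_phone(line, phones):
    """Return the longest phone name contained in line, or None."""
    # Longest names first, so 'Samsung Galaxy S6 Edge' is tried
    # before its prefix 'Samsung Galaxy S6'.
    for phone in sorted(phones, key=len, reverse=True):
        # Assumption: ignore case, because the sample lines write 'edge'.
        if phone.lower() in line.lower():
            return phone
    return None


phones = ['Samsung Galaxy S6', 'Samsung Galaxy S6 Edge']
lines = [
    'Samsung Galaxy S6 active SM-G890A 32GB Camo White (AT&T) *AS-IS* Cracked Screen',
    'Samsung Galaxy S6 SM-G920 - 32GB - White Verizon Cracked screen',
    'Samsung Galaxy S6 edge as is cracked screen',
]
for line in lines:
    print(repr(find_phone(line, phones)), 'IN', repr(line))
```

Using `sorted` instead of `phones.sort` leaves the caller's list untouched; sorting once outside the loop would be cheaper for a large file.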

Answer 2

A negative lookahead will do:

setState()

See a demo on regex101.com.
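For illustration only, a negative lookahead for this case could be sketched as follows. The pattern and the helper name `is_plain_s6` are assumptions, not necessarily what this answer used: the pattern matches 'Samsung Galaxy S6' only when it is not followed by whitespace and the word "edge".

```python
import re

# Hypothetical pattern: (?!...) rejects the match when the text right after
# 'Samsung Galaxy S6' is whitespace followed by the whole word 'edge'.
pattern = re.compile(r'Samsung Galaxy S6(?!\s+edge\b)', re.IGNORECASE)


def is_plain_s6(line):
    """Return True if line names a Galaxy S6 that is not the Edge model."""
    return pattern.search(line) is not None


print(is_plain_s6('Samsung Galaxy S6 SM-G920 - 32GB - White Verizon Cracked screen'))
print(is_plain_s6('Samsung Galaxy S6 edge as is cracked screen'))
```

Every longer model name has to be written into the lookahead by hand, which is the drawback Answer 1 points out.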

Answer 3

if sum(1 for phone in phones if phone in line) == 1:

This counts the members of phones that are also substrings of line. We then check that the count is exactly one.
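A runnable version of that check might look like this. The helper name `only_match` and the case-insensitive comparison are assumptions added for the sketch. Note that a line naming the Edge contains both names, so it is rejected outright rather than attributed to the Edge.

```python
def only_match(line, phones):
    """Return the single phone name found in line, or None if zero or several match."""
    # Assumption: ignore case, because the sample lines write 'edge'.
    matches = [phone for phone in phones if phone.lower() in line.lower()]
    return matches[0] if len(matches) == 1 else None


phones = ['Samsung Galaxy S6', 'Samsung Galaxy S6 Edge']
print(only_match('Samsung Galaxy S6 SM-G920 - 32GB - White Verizon Cracked screen', phones))
print(only_match('Samsung Galaxy S6 edge as is cracked screen', phones))
```

With a case-sensitive `in`, as in the one-liner above, the lower-case "edge" line would count only 'Samsung Galaxy S6' and be reported as a plain S6.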

How do I make my Python string match non-greedily?

3 answers: