Question

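I want to search some text for the index of the first occurrence of any one of a set of strings (for example "->", " --x" or " --XX"). I need to know the start position of the string that was found, and which specific string it was (more precisely, the length of the matched string).

This is what I have so far, but it isn't enough. Please help.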

arrowlist = {"->x","->","->>","-\","\\-","//--","->o","o\\--","<->","<->o"}
def cxn(line,arrowlist):
   if any(x in line for x in arrowlist):
      print("found an arrow {} at position {}".format(line.find(arrowlist),2))
   else:
      return 0

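Maybe a regular expression would be easier, but I'm really struggling with it, because the arrow list may be dynamic and the arrow strings may vary in length.

Thanks!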

Answer 1

I like this solution, which was inspired by this post:

How to use re match objects in a list comprehension

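import re

# "-\\" is a hyphen followed by one backslash; re.escape below keeps it literal.
arrowlist = ["xxx->x", "->", "->>", "-\\", "\\-", " // --", "x->o", "-> ->"]

lines = ["xxx->x->->", "-> ->", "xxx->x", "xxxx->o"]

def filterPick(lines, search):
    # For each line that has a match, return (matched text, line index, match start).
    return [(m.group(), item_number, m.start())
            for item_number, l in enumerate(lines)
            for m in (search(l),) if m]


if __name__ == '__main__':

    # Escape each arrow so that regex metacharacters are matched literally.
    searchRegex = re.compile('|'.join(re.escape(a) for a in arrowlist)).search
    x = filterPick(lines, searchRegex)
    print(x)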

The output is:

[('xxx->x', 0, 0), ('->', 1, 0), ('xxx->x', 2, 0), ('x->o', 3, 3)]

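In each tuple, the first number is the index of the line in the list, and the second is the start index of the match within that line.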
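For a dynamic arrow list, the same regex idea could be sketched as below. This is an illustration under assumptions rather than part of the answer above: the function name first_arrow is hypothetical, and it assumes the caller wants the longest arrow when several start at the same position. re.escape keeps characters such as backslashes literal, and sorting the alternatives longest-first makes "->>" win over "->".

```python
import re

def first_arrow(line, arrows):
    """Return (arrow, start, length) for the leftmost arrow in line, or None."""
    # Longest alternatives first, so "->>" is tried before "->" at the same position.
    ordered = sorted(set(arrows), key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(a) for a in ordered))
    m = pattern.search(line)
    if m is None:
        return None
    return m.group(), m.start(), len(m.group())

arrows = ["->x", "->", "->>", "-\\", "\\-", "//--", "->o", "o\\--", "<->", "<->o"]
print(first_arrow("n1 ->> n2 : label", arrows))   # ('->>', 3, 3)
print(first_arrow("no arrow here", arrows))       # None
```

Because the pattern is built from the list at call time, adding or removing arrows needs no change to the function. The sketch assumes the list is non-empty.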

Answer 2

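Following the logic of your example, this jumped out as the most convenient way to find the "first" matching arrow and print its position. Note, however, that a set is unordered (it is not FIFO), so if the order of the arrows matters I suggest making arrowlist a list rather than a set.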

    arrowlist = {"->x","->", "->>", "-\\", "\\-","//--","->o","o\\--","<->","<->o"}
    def cxn(line, arrowlist):
        try:
            result = tuple((x, line.find(x)) for x in arrowlist if x in line)[0]
            print("found an arrow {} at position {} with length {}".format(result[0], result[1], len(result[0])))

        # Remember in general it's not a great idea to use an exception as
        # broad as Exception, this is just for example purposes.
        except Exception:
            return 0

If you want the match that occurs first within the supplied string (line), you can do this:

arrowlist = {"->x","->", "->>", "-\\", "\\-","//--","->o","o\\--","<->","<->o"}

def cxn(line, arrowlist):
    try:
        # key first sorts on the position in string then shortest length
        # to account for multiple arrow matches (i.e. -> and ->x)
        result = sorted([(x, line.find(x)) for x in arrowlist if x in line], key=lambda r: (r[1],len(r[0])))[0]
        # if you would like to match the "most complete" (i.e. longest-length) word first use:
        # result = sorted([(x, line.find(x)) for x in arrowlist if x in line], key=lambda r: (r[1], -len(r[0])))[0]
        print("found an arrow {} at position {} with length {}".format(result[0], result[1], len(result[0])))

    except Exception:
        return 0

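Alternatively, if you have access to the standard library, you can use operator.itemgetter to get almost the same effect, with slightly better efficiency because it makes fewer function calls: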

from operator import itemgetter

arrowlist = {"->x","->", "->>", "-\\", "\\-","//--","->o","o\\--","<->","<->o"}

def cxn(line, arrowlist):
    try:
        # key first sorts on the position in string then alphanumerically
        # on the arrow match (i.e. -> and ->x matched in same position
        # will return -> because when sorted alphanumerically it is first)
        result = sorted([(x, line.find(x)) for x in arrowlist if x in line], key=(itemgetter(1,0)))[0]
        print("found an arrow {} at position {} with length {}".format(result[0], result[1], len(result[0])))

    except Exception:
        return 0

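***Note: the arrow list I used differs slightly from the one in your example, only because the list you provided seems to confuse the default code formatting (probably an unclosed-quote problem). Remember that you can prefix a string with 'r', like this: r"Text that can use special symbols like the escape \and\ be read in as a 'raw' string literal". Even a raw string literal cannot end in a single backslash, so the closing quote must not be preceded by one. See this question for more information on raw string literals.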
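The sorted(...)[0] pattern above can also be written with min, which avoids sorting every hit and, with default=None, avoids the try/except. The sketch below assumes that returning a tuple instead of printing is acceptable and that the longest arrow should win a tie on position. The name first_arrow_min is hypothetical.

```python
def first_arrow_min(line, arrows):
    """Return (arrow, position, length) of the earliest arrow in line, or None.

    When several arrows start at the same position, the longest one is chosen.
    """
    # Tuples compare element by element: position first, then negated length.
    hits = ((line.find(a), -len(a), a) for a in arrows if a in line)
    best = min(hits, default=None)
    if best is None:
        return None
    pos, neg_len, arrow = best
    return arrow, pos, -neg_len

arrowlist = {"->x", "->", "->>", "-\\", "\\-", "//--", "->o", "o\\--", "<->", "<->o"}
print(first_arrow_min("a ->x b", arrowlist))   # ('->x', 2, 3)
print(first_arrow_min("nothing", arrowlist))   # None
```

The result does not depend on the iteration order of the set, because min compares the hits themselves.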

Answer 3

You could do something like this:

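for count, item in enumerate(arrowlist, start=1):
    if item in line:
        # count is the item's 1-based place in arrowlist;
        # line.find(item) is where the item first occurs in line
        print("found an arrow {} (item {}) at position {}".format(item, count, line.find(item)))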

Answer 4

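I wanted to post the answer I came up with by combining the feedback. As you can see, it is verbose and quite inefficient, but it returns the correct arrow string at the correct index position.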

arrowlist = ["xxx->x", "->", "->>", "xxx->x","x->o", "xxx->"]
doc =""" @startuml
    n1 xxx->xx n2 : should not find
    n1 ->> n2 : must get the third arrow
    n2  xxx-> n3 : last item
    n3   -> n4 : second item
    n4    ->> n1 : third item"""

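def checkForArrow(arrows, line):
    # Split once; an arrow only counts when it is a whole space-delimited word.
    words = line.split(' ')
    for a in arrows:
        for word in words:
            if word == a:
                # (index in the arrow list, matched word, index in the line)
                return (arrows.index(a), word, line.index(word))

for line in doc.splitlines():
    line = line.strip()
    if line != "":
        print(checkForArrow(arrowlist, line))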

This returns the following results, as (index of the item in the arrow list, string found, index position of the text in the line):

None
None
(2, '->>', 3)
(5, 'xxx->', 4)
(1, '->', 5)
(2, '->>', 6)
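A regex can express the same whole-word rule in a single pass over the line. The sketch below is an alternative, not the code above, and it differs in two ways that are assumptions on my part: it returns the leftmost arrow in the line rather than the first arrow in list order, and it treats any whitespace (not only spaces) as a word boundary. The name find_arrow_token is hypothetical.

```python
import re

def find_arrow_token(arrows, line):
    """Return (index in arrows, arrow, start in line) for the leftmost
    arrow that stands alone as a whitespace-delimited word, or None."""
    ordered = sorted(set(arrows), key=len, reverse=True)
    alternatives = "|".join(re.escape(a) for a in ordered)
    # (?<!\S) and (?!\S) require whitespace or a line edge on both sides.
    m = re.search(r"(?<!\S)(?:" + alternatives + r")(?!\S)", line)
    if m is None:
        return None
    return arrows.index(m.group()), m.group(), m.start()

arrowlist = ["xxx->x", "->", "->>", "xxx->x", "x->o", "xxx->"]
for line in ["n1 xxx->xx n2 : should not find", "n2  xxx-> n3 : last item"]:
    print(find_arrow_token(arrowlist, line))
```

For these two lines it prints None and (5, 'xxx->', 4), the same as the results listed above. Because the start index comes from the match object, it always points at the matched word itself.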

Python: find the index position of the first occurrence of any of a list of strings in a string
