How to find the position of a letter in a string based on a condition in Python

Time: 2017-04-14 20:18:16

Tags: python string list

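I want to find the index of a letter in a string, subject to a condition: I want the index of the letter g only if all the parentheses before it are complete (every opening parenthesis has been closed).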

This is what I have:

sen = 'abcd(fgji(l)jkpg((jgsdti))khgy)ghyig(a)gh'

This is what I did:

import re

lst = [(i.end()) for i in re.finditer('g', sen)]
# lst
# [7, 16, 20, 29, 32, 36, 40]
count_open = 0
count_close = 0
for i in lst:
    sent = sen[0:i]
    for w in sent:
        if w == '(':
            count_open += 1
        if w == ')':
            count_close += 1
        if count_open == count_close and count_open != 0:
            c = i-1
            break

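It gives me c as 39, which is the last index, but the correct answer should be 35, since the parentheses before the second-to-last g are complete.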

5 answers:

Answer 0: (score: 3)

You can skip the regex and just use a stack to keep track of whether your parens are balanced as you iterate over the characters:

In [4]: def find_balanced_gs(sen):
   ...:     stack = []
   ...:     for i, c in enumerate(sen):
   ...:         if c == "(":
   ...:             stack.append(c)
   ...:         elif c == ")":
   ...:             stack.pop()
   ...:         elif c == 'g':
   ...:             if len(stack) == 0:
   ...:                 yield i
   ...:

In [5]: list(find_balanced_gs(sen))
Out[5]: [31, 35, 39]

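Using a stack here is the "classic" way of checking for balanced parens. It's been a while since I implemented it from scratch, so there may be some edge cases I haven't considered. But this should be a good start. I've written it as a generator, but you could make it a normal function that returns a list of indices, the first index, or the last index.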
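One edge case worth spelling out: in the generator above, a ')' with no matching '(' makes stack.pop() raise IndexError. Below is a minimal sketch of a list-returning variant that reports that case explicitly. The name balanced_indices, the letter parameter, and the choice to raise ValueError are assumptions of this sketch, not part of the answer.

```python
def balanced_indices(sen, letter="g"):
    """Return the indices of `letter` that lie outside all parentheses."""
    stack = []   # index of every '(' that is still open
    found = []
    for i, c in enumerate(sen):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                # Assumption: a stray ')' is treated as an input error.
                raise ValueError(f"unmatched ')' at index {i}")
            stack.pop()
        elif c == letter and not stack:
            found.append(i)
    return found


sen = 'abcd(fgji(l)jkpg((jgsdti))khgy)ghyig(a)gh'
indices = balanced_indices(sen)
print(indices)      # [31, 35, 39]
print(indices[0])   # 31, the first match
print(indices[-1])  # 39, the last match
```

Returning a list makes the "first index" and "last index" cases a matter of indexing the result, at the cost of always scanning the whole string.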

Answer 1: (score: 1)

Keeping your idea, only a few things were off; see the comments:

import re

sen='abcd(fgji(l)jkpg((jgsdti))khgy)ghyig(a)gh'


lst=[ (i.end()) for i in re.finditer('g', sen)]
#lst
#[7, 16, 20, 29, 32, 36, 40]

for i in lst:
    # You have to reset the count for every i
    count_open= 0
    count_close=0
    sent=sen[0:i]
    for w in sent:
        if w=='(':
            count_open+=1
        if w==')':
            count_close+=1    
    # And iterate over all of sent before comparing the counts
    if count_open == count_close and count_open != 0:
        c=i-1
        break
print(c)
# 31 - actually the right answer, not 35

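However, this is not very efficient, because you iterate over the same parts of the string many times. You can make it more efficient by iterating over the string only once: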

sen='abcd(fgji(l)jkpg((jgsdti))khgy)ghyig(a)gh'

def find(letter, string):
    count_open = 0
    count_close = 0
    for (index, char) in enumerate(string):
        if char == '(':
            count_open += 1
        elif char == ')':
            count_close += 1
        elif char == letter and count_close == count_open and count_open > 0:
            return index
    else:
        raise ValueError('letter not found')

find('g', sen)
# 31
find('a', sen)
# ...
# ValueError: letter not found

Answer 2: (score: 1)

@Thierry Lathuille's answer is very good. Here I am just suggesting a few minor variations, without claiming they are better:

import re

out = []    # collect all valid 'g'
ocount = 0  # only store the difference between open and closed
for m in re.finditer(r'[()g]', sen):    # use re to preselect
    L = m.group()
    ocount += {'(':1, ')':-1, 'g':0}[L] # save a bit of typing
    assert ocount >= 0                  # enforce some grammar if you like
    if L == 'g' and ocount == 0:
        out.append(m.start())

out
# [31, 35, 39]

Answer 3: (score: 1)

Here is a simpler take on the code in the OP (it also takes the condition count_open != 0 into account):

def get_idx(f, sen):
    idx = []
    count_open= 0
    count_close=0

    for i, w in enumerate(sen):
        if w == '(':
            count_open += 1
        if w == ')':
            count_close += 1    
        if count_open == count_close and count_open != 0:
            if w == f:
                idx.append(i)

    return idx

get_idx('g', sen)

Output:

[31, 35, 39]
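The answers differ in one detail. Answers 0 and 2 accept any g at nesting depth zero, while answers 1 and 3 also require that at least one parenthesis has been opened (count_open != 0). On the question's string both readings give the same result, because no g comes before the first parenthesis. A rough sketch on a made-up string shows where they diverge; the function name and the require_paren flag are hypothetical and exist only for this comparison.

```python
def indices_outside_parens(sen, letter, require_paren):
    depth = 0          # number of '(' currently open
    seen_open = False  # becomes True once any '(' has been read
    out = []
    for i, ch in enumerate(sen):
        if ch == '(':
            depth += 1
            seen_open = True
        elif ch == ')':
            depth -= 1
        elif ch == letter and depth == 0:
            # With require_paren, skip letters that precede every '('.
            if seen_open or not require_paren:
                out.append(i)
    return out


sample = 'g(a)g'  # made-up input, not from the question
print(indices_outside_parens(sample, 'g', require_paren=False))  # [0, 4]
print(indices_outside_parens(sample, 'g', require_paren=True))   # [4]
```

Which reading is wanted depends on whether a g that appears before any parenthesis should count as having "all parentheses before it complete".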

Answer 4: (score: -1)

You can use .index() to find the index of a substring in a string, or of an element in a list.

Call stringvar.index(string); it returns the offset (index) of the first occurrence of string.
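For illustration: str.index returns only the first match and raises ValueError when there is none, so by itself it ignores the parenthesis condition from the question. Its optional start argument can be used to walk through every occurrence. The helper name all_indices below is an assumption made for this sketch.

```python
def all_indices(text, sub):
    """Collect every index at which `sub` starts in `text`."""
    result = []
    start = 0
    while True:
        try:
            pos = text.index(sub, start)  # search from `start` onwards
        except ValueError:
            return result                 # no further occurrence
        result.append(pos)
        start = pos + 1


sen = 'abcd(fgji(l)jkpg((jgsdti))khgy)ghyig(a)gh'
print(sen.index('g'))         # 6, the first g only
print(all_indices(sen, 'g'))  # [6, 15, 19, 28, 31, 35, 39]
```

These are the same positions as the lst values in the question minus one, since the question used match.end() rather than match.start().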