Question

I know that 'in' can find a substring in another string, as described in "How to determine whether a substring is in a different string".

But I don't know how to find the exact substring in the following example:

text = '"Peter,just say hello world." Mary said "En..."'

I want to determine whether 'Peter' is in the text but not inside the "XXXX" (double-quoted) content. If I use

if 'Peter' in text:
    print('yes')
else:
    print('no')

the result is 'yes', which is wrong, because 'Peter' is inside the "XXXXX" content.

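Besides solving this problem, I also want to get the remaining "XXXX" content. For example, 'Mary' is in the text but not inside the "XXXX" content. I also want to get "Peter,just say hello world.".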

Answer 1

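For a special requirement like yours, I think processing the text character by character is a good approach, and it is also good practice for your string-handling skills. For this problem, you can use a stack to store the double quotes, so that you can tell whether a given character is inside double quotes.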
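
A minimal sketch of that idea, under a few assumptions: the function name `split_by_quotes` is made up for illustration, quotes do not nest, and text after an unmatched opening quote is treated as unquoted. The stack holds the opening `"` while the scan is inside a quotation, so a non-empty stack means the current character is quoted.

```python
def split_by_quotes(text):
    """Scan text one character at a time and separate quoted from unquoted parts."""
    stack = []      # holds the opening '"' while we are inside a quotation
    quoted = []     # contents found between a pair of double quotes
    unquoted = []   # contents found outside any pair of double quotes
    buf = []        # characters collected for the current segment

    for ch in text:
        if ch == '"':
            segment = ''.join(buf)
            buf = []
            if stack:
                # Closing quote: the collected segment was inside quotes
                stack.pop()
                quoted.append(segment)
            else:
                # Opening quote: the collected segment (if any) was outside quotes
                stack.append(ch)
                if segment:
                    unquoted.append(segment)
        else:
            buf.append(ch)

    # Whatever is left after the last quote (or after an unmatched
    # opening quote) is classified as unquoted
    if buf:
        unquoted.append(''.join(buf))
    return quoted, unquoted


text = '"Peter,just say hello world." Mary said "En..."'
quoted, unquoted = split_by_quotes(text)
peter_outside_quotes = any('Peter' in part for part in unquoted)
mary_outside_quotes = any('Mary' in part for part in unquoted)
```

With the example text, `peter_outside_quotes` is `False` and `mary_outside_quotes` is `True`, and `quoted` contains the two quoted sentences.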

Answer 2

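As with many string-processing problems, regular expressions are your friend. One way to handle this is to start at the front of the string and work through it piece by piece.

Check the beginning of the string to see whether it starts with unquoted or quoted text. If it is unquoted, pull off all the unquoted text until you hit a quote. If it is quoted, pull off everything until you hit the closing quote. Keep processing until all of the text has been consumed and categorized as quoted or unquoted.

You will then have two separate lists of strings, one for quoted text and one for unquoted text, and you can run the substring check against either list.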

import re

text = '"Peter,just say hello world." Mary said "En..."'

unquoted_text = []
quoted_text = []

# re.DOTALL lets '.' match newlines, so multi-line input cannot stall the loop
while text:
    # Pull unquoted text off the front
    m = re.match(r'^([^"]+)(.*)$', text, re.DOTALL)
    if m:
        unquoted_text.append(m.group(1))
        text = m.group(2)

    # Pull quoted text off the front
    m = re.match(r'^"([^"]*)"(.*)$', text, re.DOTALL)
    if m:
        quoted_text.append(m.group(1))
        text = m.group(2)

    # Just in case there is a single unmatched double quote (bad!)
    # Categorize as unquoted
    m = re.match(r'^"([^"]*)$', text)
    if m:
        unquoted_text.append(m.group(1))
        text = ''

print('UNQUOTED')
print(unquoted_text)  # [' Mary said ']

print('QUOTED')
print(quoted_text)    # ['Peter,just say hello world.', 'En...']

# True: 'Peter' only appears inside the quoted text
is_peter_in_quotes = any('Peter' in t for t in quoted_text)
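
If you only need the two lists and the containment check, a shorter variant is possible. This is a sketch of a different technique from the loop above (`re.findall` plus `re.split` rather than consuming the string from the front), and the helper names `quoted_parts`, `unquoted_parts`, and `in_unquoted` are made up for illustration. It assumes quotes are balanced; an unmatched quote and the text after it simply stay in the unquoted pieces.

```python
import re

# One double-quoted span; the group captures the content without the quotes
QUOTED = re.compile(r'"([^"]*)"')


def quoted_parts(text):
    """Return the contents of every double-quoted span."""
    return QUOTED.findall(text)


def unquoted_parts(text):
    """Return the non-empty pieces of text that lie outside quoted spans."""
    # Split on whole quoted spans (no capture group, so they are dropped)
    return [piece for piece in re.split(r'"[^"]*"', text) if piece]


def in_unquoted(word, text):
    """True if word occurs in the text outside of double quotes."""
    return any(word in piece for piece in unquoted_parts(text))


text = '"Peter,just say hello world." Mary said "En..."'
peter_unquoted = in_unquoted('Peter', text)
mary_unquoted = in_unquoted('Mary', text)
```

The unquoted pieces are checked one at a time rather than joined, so words on either side of a removed quotation cannot accidentally merge into a false match.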

How to find a substring in a target string more accurately in Python?