Question

I have the following string data:

data = "*****''[[dogs and cats]]''/n"

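I want to use a regular expression in Python to extract parts of the string. All of the data is enclosed in the double quotes "". What wildcards should I use so that I get the following: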

print data.groups(1)
print data.groups(2)
print data.groups(3)

'dogs'
'and'
'cats'

Edit: so far I have something along the lines of

  test = re.search("\\S*****''[[(.+) (.+) (.+)\\S]]''", "*****''[[dogs and cats]]''\n") 
  print test.group(1)

Answer 1

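It's hard to know exactly what you're looking for, but I'll assume you want a regular expression that parses one or more space-separated words surrounded by some non-alphanumeric characters.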

import re

data = "*****''[[dogs and cats]]''/n"

# this pulls out the 'dogs and cats' substring
interior = re.match(r'\W*([\w ]*)\W*', data).group(1)

words = interior.split()

print(words)
# => ['dogs', 'and', 'cats']

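However, this makes a lot of assumptions about your requirements. Depending on what you need, a regular expression may not be the best tool.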
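If you really do want three separate capture groups, as in the question, here is a minimal sketch. It assumes there are always exactly three words, separated by single spaces, directly inside the `[[ ]]`; anything else will simply not match. Note that `*` and `[` are regex metacharacters, so they have to be escaped (or left out of the pattern) to be matched literally.

```python
import re

data = "*****''[[dogs and cats]]''\n"

# Assumption: exactly three space-separated words inside [[ ]].
# \[ and \] match literal brackets; \S+ matches a run of non-space characters.
match = re.search(r"\[\[(\S+) (\S+) (\S+)\]\]", data)

if match:
    print(match.group(1))
    print(match.group(2))
    print(match.group(3))
```

The last `\S+` first grabs the closing brackets as well, then backtracks until `\]\]` can match, so the third group ends up holding only the word.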

Answer 2

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." - Jamie Zawinski

data = "*****''[[dogs and cats]]''/n"
start = data.find('[')+2  # skip past the opening '[['
end = data.find(']')      # index of the first closing ']'
answer = data[start:end].split()

print(answer[0])
print(answer[1])
print(answer[2])
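The snippet above assumes the brackets are always present; `str.find` returns -1 when they are not, which would silently produce a wrong slice. A rough sketch of a more defensive version follows. The helper name `extract_words` is made up for illustration, and returning an empty list on failure is just one possible choice.

```python
def extract_words(text):
    """Return the words between the first '[[' and the next ']]', or []."""
    start = text.find('[[')
    if start == -1:
        return []  # no opening delimiter
    start += 2     # skip past '[['
    end = text.find(']]', start)
    if end == -1:
        return []  # no closing delimiter
    return text[start:end].split()


data = "*****''[[dogs and cats]]''/n"
print(extract_words(data))
print(extract_words("no brackets here"))
```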

Answer 3

As others have said, this is very simple with an additional split step:

import re

data = "***rubbish**''[[dogs and cats]]''**more rubbish***"
words = re.findall(r'\[\[(.+?)\]\]', data)[0].split() # ['dogs', 'and', 'cats']

It can also be done with a single expression, but it looks rather messy:

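rr = r'''
    (\w+)              # a word...
    (?=                # ...that is followed by
        (?:
            (?!\[\[)   # characters that do not start a new [[
            .
        )*?
        \]\]           # and then the closing ]]
    )
'''
# re.VERBOSE replaces the inline (?x), which must be at the very start of a pattern
words = re.findall(rr, data, re.VERBOSE) # ['dogs', 'and', 'cats']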
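One thing to keep in mind: the `[0]` in the split-based version only looks at the first `[[...]]` group. If the input could contain several groups (an assumption, since the question shows only one), a sketch along these lines keeps them separate. The function name `words_per_group` and the sample string are made up for illustration.

```python
import re


def words_per_group(text):
    """Return one list of words for every [[...]] group found in text."""
    # (.+?) is non-greedy, so each match stops at the nearest closing ]]
    return [inner.split() for inner in re.findall(r'\[\[(.+?)\]\]', text)]


sample = "**''[[dogs and cats]]''** junk [[fish]] **"
print(words_per_group(sample))
```

When no group is present, `re.findall` returns an empty list, so the function returns `[]` instead of raising the `IndexError` that `[0]` would.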
