Question

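I'm currently looping through some files (that part works fine) and trying to figure out how to get the index of something and check whether it matches a word in a list I provide.

For example:

One of the files contains the following: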

# get the scripts from the build context and make sure they are executable
COPY .shared/scripts/ /tmp/scripts/
RUN chmod +x -R /tmp/scripts/

# install extensions
RUN /tmp/scripts/install_php_extensions.sh

Here is what I want to match against:

MYLIST['APPLE'] = 'Granny-Smith'
SOMETHINGELSE['BUILDING'] = 'Tall'
ANOTHERTHING['SPELLING'] = 'bad'
ADDITIONALLY['BERRY'] = 'Rasp'

If I use this regex, it finds the correct indexes (but it finds all of them):

keywords = ['apple', 'berry', 'grape']

But I'm trying to extend that regex so that it only finds the indexes that appear in the keywords list.

What else do I need to add to the regex to do this?

Answer 1

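If you only have a few words, a regex alone will do, but if you have many words it makes more sense to combine a regex with an ordinary membership check: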

import re

data = [
    "MYLIST['APPLE'] = 'Granny-Smith'",
    "SOMETHINGELSE['BUILDING'] = 'Tall'",
    "ANOTHERTHING['SPELLING'] = 'bad'",
    "ADDITIONALLY['BERRY'] = 'Rasp'"
]

REGEX = re.compile(r"\['(?P<word>.*?)'\]")
words = ['apple', 'berry', 'grape']

for line in data:
    found = REGEX.search(line)
    if found:
        word = found.group('word').lower()
        if word in words:
            print('FOUND: ', word)

This prints:

FOUND:  apple
FOUND:  berry

This approach is also better because the regex stays simple and readable, which makes the code easier to debug and modify.
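A possible variant of the same approach, assuming a single line may contain more than one bracketed index (an assumption, not something stated in the question): `finditer` yields every match on a line instead of only the first, and keeping the keywords in a set makes each membership check a constant-time lookup. The helper name `find_keywords` and the `COMBO` sample line are hypothetical, added only for illustration.

```python
import re

# Same pattern as above: capture the text between [' and ']
INDEX_RE = re.compile(r"\['(?P<word>.*?)'\]")

def find_keywords(lines, keywords):
    """Return (line_number, word) pairs for every bracketed index that is a keyword.

    Hypothetical helper for illustration; the comparison is case-insensitive.
    """
    wanted = {k.lower() for k in keywords}  # set for fast membership tests
    hits = []
    for lineno, line in enumerate(lines, start=1):
        # finditer walks every bracketed index on the line, not just the first one
        for match in INDEX_RE.finditer(line):
            word = match.group('word').lower()
            if word in wanted:
                hits.append((lineno, word))
    return hits

data = [
    "MYLIST['APPLE'] = 'Granny-Smith'",
    "SOMETHINGELSE['BUILDING'] = 'Tall'",
    "ANOTHERTHING['SPELLING'] = 'bad'",
    "ADDITIONALLY['BERRY'] = 'Rasp'",
    "COMBO['GRAPE']['APPLE'] = 'mixed'",  # hypothetical line with two indexes
]

for lineno, word in find_keywords(data, ['apple', 'berry', 'grape']):
    print('FOUND:', word, 'on line', lineno)
```

Because `.*?` is lazy, each match stops at the first `']`, so `['GRAPE']['APPLE']` is reported as two separate indexes rather than one long one.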

Answer 2

If you only want to use a regex, you can use:

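import re
keywords = ['apple', 'berry', 'grape']
# raw string avoids invalid-escape warnings; the quotes let it match ['APPLE']-style indexes
regex = r"\['({})'\]".format("|".join(map(re.escape, keywords)))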

I'll leave the upper/lower case handling to you.

I got the idea from how to do re.compile() with a list in python, so credit goes there.
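For the case handling left open above, one possible regex-only sketch follows. It assumes the index is always wrapped in single quotes, as in the sample lines, and that matching should ignore case. `re.escape` guards against keywords that contain regex metacharacters, and `re.IGNORECASE` lets `apple` in the list match `APPLE` in the file.

```python
import re

keywords = ['apple', 'berry', 'grape']

# Build \['(apple|berry|grape)'\]; re.escape protects any special characters
pattern = re.compile(
    r"\['({})'\]".format("|".join(map(re.escape, keywords))),
    re.IGNORECASE,  # so 'APPLE' in the file matches 'apple' in the list
)

data = [
    "MYLIST['APPLE'] = 'Granny-Smith'",
    "SOMETHINGELSE['BUILDING'] = 'Tall'",
    "ANOTHERTHING['SPELLING'] = 'bad'",
    "ADDITIONALLY['BERRY'] = 'Rasp'",
]

matches = []
for line in data:
    found = pattern.search(line)
    if found:
        # group(1) keeps the capitalisation used in the file
        matches.append(found.group(1))

print(matches)  # ['APPLE', 'BERRY']
```

Because the alternation sits between the quotes, a keyword only matches a whole index: `['APPLES']` does not match `apple`, so no extra word boundaries are needed.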

Regex for finding matches from a list in Python