Question

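I'm fairly new to coding, so I apologize in advance if the answer is obvious.

I'm looking for a Python or SQL solution that can identify all keywords from a given list of more than 170k strings within the individual fields of a table. Using re.findall is not a requirement, but it is the closest thing I know of to the solution I'm after.

For example, if my keyword list contains bite, ankles, flesh, and wound, and the target column in my table contains the following fields, in order: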

I'll bite your ankles.

Only a flesh wound.

Flesh ankles bite only.

I would like to create a new column in the table that contains the following fields, in order:

bite ankles

flesh wound

flesh ankles bite

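To complicate matters, I have access to only a limited set of libraries in the environment I have to use, so the more basic the better. Thanks in advance for any help, ideally a shell that I can plug my table and list into.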

Answer 1

For Python, you could start with something like this:

>>> # make a set of the keywords
... keywords = {"bite", "ankles", "flesh", "wound"}
>>> # get the input as list of strings
... strings = ["I’ll bite your ankles", "Only a flesh wound", "Flesh ankles bite only"]
>>> [" ".join(filter(lambda x: x.lower() in keywords, s.split(" "))) for s in strings]
['bite ankles', 'flesh wound', 'Flesh ankles bite']
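
Note that splitting on spaces leaves punctuation attached to words, so "ankles." would not match "ankles". A minimal sketch of one way around that, assuming only the standard-library re module is available and that keywords are single words; the names WORD_RE and extract_keywords are hypothetical and chosen just for illustration:

```python
import re

# Hypothetical pattern: a "word" is a run of letters, digits, or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")

def extract_keywords(text, keywords):
    """Return the words of `text` found in `keywords`, in their original order."""
    # Set membership keeps each lookup cheap even for a very large keyword list.
    return " ".join(w for w in WORD_RE.findall(text) if w.lower() in keywords)

keywords = {"bite", "ankles", "flesh", "wound"}
rows = ["I'll bite your ankles.", "Only a flesh wound!", "Flesh, ankles: bite only"]

# One result per input row, suitable for writing back as a new column.
new_column = [extract_keywords(row, keywords) for row in rows]
print(new_column)
```

Because the keywords are stored in a set, the cost per row depends on the number of words in the row rather than on the size of the keyword list. Keywords that contain spaces would need a different approach.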

Best alternative to re.findall that can use a list instead of a regex
