I'm using the following code to replace the strings in words with words[0] in the given sentences:
import re
sentences = ['industrial text minings', 'i love advanced data minings and text mining']
words = ["data mining", "advanced data mining", "data minings", "text mining"]
start_terms = sorted(words, key=lambda x: len(x), reverse=True)
start_re = "|".join(re.escape(item) for item in start_terms)
results = []
for sentence in sentences:
    for terms in words:
        if terms in sentence:
            result = re.sub(start_re, words[0], sentence)
            results.append(result)
            break
print(results)
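My expected output is:
['industrial text minings', 'i love data mining and data mining']
However, what I get is:
['industrial data minings', 'i love data mining and data mining']
In the first sentence, 'text minings' is not in words. However, words contains 'text mining', so the condition "text mining" in "industrial text minings" evaluates to True. After the replacement, 'text mining' becomes 'data mining' while the trailing 's' stays where it was, giving 'industrial data minings'. I want to avoid this.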
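The following is a minimal reproduction of the two steps described above, using only the first sentence. The variable names found and replaced are illustrative.

```python
import re

sentence = 'industrial text minings'

# The `in` operator is a plain substring test, so the "s" after
# "text mining" does not prevent a match.
found = "text mining" in sentence

# re.sub with the same unanchored pattern replaces only the matched
# characters; the trailing "s" is left where it was.
replaced = re.sub(re.escape("text mining"), "data mining", sentence)

print(found)     # True
print(replaced)  # industrial data minings
```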
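So I'm wondering whether there is a way to use an if condition in the substitution to check whether the next character is a space: if it is, do the replacement; otherwise, don't.
I'm also happy with other solutions that solve my problem.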
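The check described above can be written into the pattern itself with lookarounds, so no separate if condition is needed. The sketch below is one possible approach and is not taken from the answers. The negative lookahead (?!\S) succeeds only if the next character is whitespace or the string ends. The mirrored lookbehind (?<!\S) applies the same rule to the character before the match. The sketch reuses the question's sentences and words; names such as terms and pattern are illustrative.

```python
import re

sentences = ['industrial text minings', 'i love advanced data minings and text mining']
words = ["data mining", "advanced data mining", "data minings", "text mining"]

# Try longer terms first so that overlapping alternatives prefer the longest phrase.
terms = sorted(words, key=len, reverse=True)
alternation = "|".join(re.escape(term) for term in terms)

# (?<!\S): the match must start at the beginning of the string or after whitespace.
# (?!\S):  the match must end at the end of the string or before whitespace,
#          i.e. "replace only if the next character is a space (or nothing)".
pattern = re.compile(r"(?<!\S)(?:" + alternation + r")(?!\S)")

# sub() returns a sentence unchanged when nothing matches.
results = [pattern.sub(words[0], sentence) for sentence in sentences]
print(results)
# ['industrial text minings', 'i love advanced data mining and data mining']
```

With this data the second sentence keeps "advanced". The phrase "advanced data minings" as a whole is not in words; only its suffix "data minings" is. This check is also stricter than a word boundary: a term followed by punctuation, as in "text mining.", is not replaced.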
Answer 0 (score: 2)
I made a few changes to your code:
# Using Python 3.6.1
import re

sentences = ['industrial text minings and data minings and data', 'i love advanced data mining and text mining as data mining has become a trend']
words = ["data mining", "advanced data mining", "data minings", "text mining", "data", 'text']

# Sort by length
start_terms = sorted(words, key=len, reverse=True)

results = []

# Loop through sentences
for sentence in sentences:
    # Loop through sorted words to replace
    result = sentence
    for term in start_terms:
        # Use exact word matching
        exact_regex = r'\b' + re.escape(term) + r'\b'
        # Replace matches with blank space (to avoid priority conflicts)
        result = re.sub(exact_regex, " ", result)
    # Replace inserted blank spaces with "data mining"
    blank_regex = r'^\s(?=\s)|(?<=\s)\s$|(?<=\s)\s(?=\s)'
    result = re.sub(blank_regex, words[0], result)
    results.append(result)

# Print sentences
print(results)
Output:
['industrial data mining minings and data mining and data mining', 'i love data mining and data mining as data mining has become a trend']
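The regexes may be a bit confusing, so here is a quick breakdown:
\bword\b matches the exact phrase/word, since \b is a word boundary (more about it here).
^\s(?=\s) matches a space at the beginning that is followed by another space.
(?<=\s)\s$ matches a space at the end that is preceded by another space.
(?<=\s)\s(?=\s) matches a space that has a space on both sides.
For more information on positive lookbehind (?<=...) and positive lookahead (?=...), see this Regex tutorial.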
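The following quick check of the patterns in this breakdown uses small hand-made strings rather than the answer's data. The code first replaces each matched term with a single space. A term between two words therefore leaves three spaces in a row, a term at the start leaves two leading spaces, and a term at the end leaves two trailing spaces. blank_regex picks exactly one space out of each such run. The placeholder "X" stands in for the real replacement text.

```python
import re

# \b...\b: "text mining" is matched only where it is a whole phrase.
whole = re.findall(r'\btext mining\b', 'text minings and text mining')

blank_regex = r'^\s(?=\s)|(?<=\s)\s$|(?<=\s)\s(?=\s)'

# Term in the middle: "a" + " " + " " + " " + "b"; only the middle space matches.
middle = re.sub(blank_regex, 'X', 'a   b')
# Term at the start: two leading spaces; ^\s(?=\s) matches the first one.
start = re.sub(blank_regex, 'X', '  b')
# Term at the end: two trailing spaces; (?<=\s)\s$ matches the last one.
end = re.sub(blank_regex, 'X', 'a  ')

print(whole)   # ['text mining']
print(middle)  # a X b
print(start)   # X b
print(end)     # a X
```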
Answer 1 (score: 1)
You can wrap the whole regex in word boundaries \b:
start_re = "\\b(?:" + "|".join(re.escape(item) for item in start_terms) + ")\\b"
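Your regex then becomes (shown without the backslashes that re.escape adds before the spaces):
\b(?:advanced data mining|data minings|data mining|text mining)\b
(?:) denotes a non-capturing group. The group is needed so that the \b boundaries apply to every alternative, not just to the first and last ones.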
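Below is a runnable sketch of this answer applied to the question's data. One assumption: the question's loop is replaced by a list comprehension that calls re.sub on every sentence, since a sentence with no match comes back unchanged. The second pattern, ungrouped_re, is a hypothetical counterexample built only to show what happens without the group.

```python
import re

sentences = ['industrial text minings', 'i love advanced data minings and text mining']
words = ["data mining", "advanced data mining", "data minings", "text mining"]

start_terms = sorted(words, key=len, reverse=True)
alternation = "|".join(re.escape(item) for item in start_terms)

# Word boundaries around the whole group apply to every alternative.
start_re = r"\b(?:" + alternation + r")\b"
results = [re.sub(start_re, words[0], sentence) for sentence in sentences]
print(results)
# ['industrial text minings', 'i love advanced data mining and data mining']

# Without the group, \b binds only to the first and last alternatives, so
# "advanced data mining" still matches inside "advanced data minings".
ungrouped_re = r"\b" + alternation + r"\b"
ungrouped_result = re.sub(ungrouped_re, words[0], sentences[1])
print(ungrouped_result)
# i love data minings and data mining
```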