I am trying to replace occurrences such as "word one" with "word_one", i.e. replace the whitespace between two words with '_'.
This is my code:
import re

labels_ls = ['word <= 0.01', 'word_two <= 0.23', 'word three <= 0.01']
regex_whitespace = r'\w+\s+\w+\b'
new_regex = r'\w+\_+\w+\b'
pattern = re.compile(regex_whitespace)  # this I just added after reviewing other related questions
# Loop through labels_ls to find any whitespace-separated n-gram labels (e.g. 'gilt maximal')
for i in labels_ls:
    if re.match(regex_whitespace, i):
        # replace the whitespace with a '_' to form 'gilt_maximal'
        new_string = re.sub(pattern, new_regex, i)
        print('new string: ', new_string)
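I tested my regex at https://pythex.org and it works as required. However, when I run this code I get the following error:
re.error: bad escape \w at position 0
I have looked at all the related answered questions: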
how to fix - error: bad escape \u at position 0
and
Regex: Replace one pattern with another
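I tried removing the r prefix from the regex, as mentioned in the questions above, but it still does not work.
I also tried using compile(), but that does not solve the problem either.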
labels_ls = ['internal_punctuation <= 0.042', 'darf <= 0.717', 'formal_global_yes <= 0.5', 'wert <= 0.272', 'signal <= 0.5', 'Flesch_Index <= 0.813', 'zulass <= 0.379', 'polarity <= 0.713', 'Nb_of_auxiliary <= 0.071', 'gini = 0.0', 'polarity <= 0.375', 'gini = 0.0', 'Nb_of_verbs <= 0.094', 'weakwords_nb <= 0.143', 'passive_global_yes <= 0.5', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'Nb_of_verbs <= 0.094', 'passive_global_yes <= 0.5', 'WPS <= 0.062', 'measurement_values_no <= 0.5', 'gini = 0.0', 'SPW <= 0.575', 'weird_words <= 0.042', 'weakwords_nb <= 0.036', 'SPW <= 0.272', 'gini = 0.0', 'words_nb <= 0.033', 'gini = 0.5', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'Flesch_Index <= 0.774', 'SPW <= 0.331', 'gini = 0.0', 'gini = 0.0', 'Comp_conj <= 0.375', 'SPW <= 0.111', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'Sub_Conj <= 0.25', 'weird_words <= 0.208', 'zsdf <= 0.5', 'signal <= 0.297', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'words_nb <= 0.164', 'Aux_Start_no <= 0.5', 'gini = 0.0', 'Nb_of_Umsetzbarkeit_conj <= 0.167', 'werden <= 0.125', 'darf <= 0.297', 'polarity <= 0.925', 'SPW <= 0.376', 'WPS <= 0.11', 'numerical_values <= 0.091', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'WPS <= 0.11', 'gini = 0.0', 'gini = 0.0', 'polarity <= 0.25', 'gini = 0.0', 'Flesch_Index <= 0.663', 'words_nb <= 0.033', 'SPW <= 0.475', 'gini = 0.0', 'gini = 0.0', 'Comp_conj <= 0.125', 'gini = 0.56', 'gini = 0.0', 'Flesch_Index <= 0.75', 'gini = 0.444', 'gini = 0.0', 'Aux_Start_yes <= 0.5', 'darf <= 0.241', 'Nb_of_verbs <= 0.156', 'gini = 0.0', 'SPW <= 0.246', 'polarity <= 0.675', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'Sub_Conj <= 0.25', 'numerical_values <= 0.227', 'funktion <= 0.348', 'internal_punctuation <= 0.458', 'polarity <= 0.375', 'gini = 0.0', 'Nb_of_verbs <= 0.031', 
'gini = 0.0', 'Flesch_Index <= 0.409', 'gini = 0.0', 'numerical_values <= 0.136', 'WPS <= 0.065', 'darf <= 0.359', 'Nb_of_Umsetzbarkeit_conj <= 0.167', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'formal_global_no <= 0.5', 'WPS <= 0.164', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gini = 0.0', 'gilt randbeding <= 0.181', 'fahrzeug <= 0.352', 'gini = 0.0', 'zulass <= 0.082', 'gini = 0.0', 'gini = 0.0', 'fur <= 0.194', 'weakwords_nb <= 0.321', 'gini = 0.444', 'gini = 0.0', 'gini = 0.0', 'Nb_of_Umsetzbarkeit_conj <= 0.167', 'Nb_of_verbs <= 0.344', 'gini = 0.0', 'gini = 0.0', 'words_nb <= 0.178', 'gini = 0.0', 'words_nb <= 0.224', 'gini = 0.0', 'gini = 0.0']
Answer 0 (score: 5)
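You need to update the pattern to
regex_whitespace = r'(\w+)\s+(\w+)\b'
and then perform the substitution with
new_string = re.sub(pattern, r'\1_\2', i)
The point is that you need to capture the word characters matched by the first regex into capturing groups, and then use backreferences in the replacement to insert the captured group values.
The new_regex pattern is redundant, because you cannot use a regex pattern as a replacement: a replacement pattern can only contain backreferences and escape sequences (a literal backslash must be escaped there).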
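Putting the two changes together, a minimal sketch could look like the following. The helper name underscore_ngrams and the variables new_labels and error_message are made up for illustration and do not appear in the question. The last part assumes Python 3.7 or later, where an unknown letter escape such as \w in a replacement string raises re.error.

```python
import re

# Capture the word characters on both sides of the whitespace run.
pattern = re.compile(r'(\w+)\s+(\w+)\b')


def underscore_ngrams(label):
    """Hypothetical helper: join two whitespace-separated words with '_'."""
    # \1 and \2 re-insert the text captured by the two groups.
    return pattern.sub(r'\1_\2', label)


labels_ls = ['word <= 0.01', 'word_two <= 0.23', 'word three <= 0.01']
new_labels = [underscore_ngrams(label) for label in labels_ls]
print(new_labels)
# ['word <= 0.01', 'word_two <= 0.23', 'word_three <= 0.01']

# Passing a regex pattern as the replacement reproduces the error from the
# question, because \w is not a valid escape in a replacement template.
error_message = None
try:
    re.sub(pattern, r'\w+\_+\w+\b', 'word three <= 0.01')
except re.error as exc:
    error_message = str(exc)
print('re.error:', error_message)
```

Because re.sub leaves a string unchanged when the pattern does not match, the re.match check from the question is not needed here. Note that this pattern joins only one pair of words per match, so a label made of three or more whitespace-separated words would need a different approach.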