Removing hyphens selectively
I want to keep the hyphens in the words in 'list to keep', but replace every other '-' in the Series with a space.
Answer 0: (score: 2)
You can try this:
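# Split the first string in s into a Series with one word per row.
S = s.str.split(expand=True).T[0]
# Keep listed words as-is; strip '-' from the rest (pass ' ' instead of '' to insert a space).
' '.join(np.where(S.isin(list_to_keep), S, S.str.replace('-', '')))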
Output:
'do not-remove this-hyphen but removeall of thesehyphens'
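How it works: the split turns the first string in s into one word per row, np.where leaves every word found in list_to_keep unchanged and removes the hyphens from the others, and ' '.join puts the sentence back together.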
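The snippet in this answer reads only the first row of s and deletes the hyphen rather than inserting a space. A minimal sketch of a row-wise variant that follows the wording of the question is below. The helper name replace_hyphens and the second sample sentence are made up for illustration, and the contents of list_to_keep are assumed.

```python
import pandas as pd

# Sample data; the second row is a hypothetical addition.
s = pd.Series([
    'do not-remove this-hyphen but remove-all of these-hyphens',
    'keep this-hyphen but split every-other one',
])
list_to_keep = ['not-remove', 'this-hyphen']  # assumed contents


def replace_hyphens(text, keep=frozenset(list_to_keep)):
    """Replace '-' with a space in every word that is not in keep."""
    return ' '.join(
        word if word in keep else word.replace('-', ' ')
        for word in text.split()
    )


# Apply the helper to each row of the Series.
result = s.apply(replace_hyphens)
print(result.tolist())
```

Using Series.apply keeps the result aligned with the original index, so it can be assigned straight back to a DataFrame column.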
Answer 1: (score: 0)
A simple, naive solution is:
s = 'do not-remove this-hyphen but remove-all of these-hyphens'
words_to_keep = {'not-remove', 'this-hyphen'}
new_s = []
for word in s.split():
    if word not in words_to_keep:
        word = word.replace('-', ' ')
    new_s.append(word)
print(' '.join(new_s))  # do not-remove this-hyphen but remove all of these hyphens
Another way, using map:
def unhyphen_word(word):
    return word.replace('-', ' ') if word not in words_to_keep else word
print(' '.join(map(unhyphen_word, s.split())))
Or a list comprehension:
print(' '.join([unhyphen_word(word) for word in s.split()]))
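One limitation of splitting on whitespace is that a word followed by punctuation, such as 'not-remove,', no longer matches an entry in words_to_keep. A possible sketch that matches the hyphenated words with re.sub instead is shown below. The punctuated sample sentence and the function name unhyphen_match are assumptions added for illustration.

```python
import re

# Hypothetical input with punctuation attached to the hyphenated words.
s_punct = 'do not-remove, this-hyphen; but remove-all of these-hyphens.'
words_to_keep = {'not-remove', 'this-hyphen'}


def unhyphen_match(match):
    """Return the matched word unchanged if it is in words_to_keep."""
    word = match.group(0)
    return word if word in words_to_keep else word.replace('-', ' ')


# \w+(?:-\w+)+ matches runs of word characters joined by hyphens,
# so trailing punctuation is never part of the matched word.
cleaned = re.sub(r'\w+(?:-\w+)+', unhyphen_match, s_punct)
print(cleaned)
```

Because only the hyphenated words are rewritten, the original spacing and punctuation are left untouched.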
Answer 2: (score: 0)
Edit: Hmm... it works, but something seems off...
I'm no regex master, but one way is to do it in two steps:
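1. Add an extra - to the words to keep, so that not-remove becomes not--remove.
2. Remove every - that is directly followed by a word character, which leaves a single hyphen in the words to keep.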
Proof of concept:
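import pandas as pd
s = pd.Series(['do not-remove this-hyphen but remove-all of these-hyphens'])
words_to_keep = {'not-remove', 'this-hyphen'}
# Step 1 pattern: the hyphen of each word to keep (the word must be followed by . , ; or a space).
p1 = '|'.join([r'(?!({}))-(?=({})[.,; ])'.format(*i.split('-'))
               for i in words_to_keep])
# Step 2 pattern: any hyphen directly followed by a word character.
p2 = r'(?!\w+)-(?=\w+)'
# regex=True is needed on pandas 2.0+, where str.replace defaults to literal matching.
s.str.replace(p1, '--', regex=True).str.replace(p2, '', regex=True)[0]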
Returns:
'do not-remove this-hyphen but removeall of thesehyphens'
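On the feeling that something is off: the (?!...) lookaheads placed in front of the hyphen never reject anything, because the next character is always the hyphen itself, so the first half of each kept word is not actually checked. One possible single-pass alternative is sketched below with the plain re module. It is not a repair of the patterns above, and the names keep_alternatives and drop_hyphen are hypothetical.

```python
import re

s_text = 'do not-remove this-hyphen but remove-all of these-hyphens'
words_to_keep = {'not-remove', 'this-hyphen'}

# Alternation of the escaped words to keep, longest first so that a longer
# entry is tried before a shorter one that is its prefix.
keep_alternatives = '|'.join(
    re.escape(w) for w in sorted(words_to_keep, key=len, reverse=True)
)

# First branch: a whole word to keep, not glued to other word characters
# or hyphens. Second branch: any remaining hyphen.
pattern = re.compile(r'(?<![\w-])(?:{})(?![\w-])|-'.format(keep_alternatives))


def drop_hyphen(match):
    """Delete a bare hyphen; return a kept word unchanged."""
    text = match.group(0)
    return '' if text == '-' else text


single_pass = pattern.sub(drop_hyphen, s_text)
print(single_pass)
```

Because a kept word is consumed as one match, its inner hyphen is never seen by the second branch, and a kept word at the very end of the string is handled without requiring trailing punctuation. Returning ' ' instead of '' from drop_hyphen would insert a space, as the question asks.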