I have the following class methods:
class Trigger():

    def getRidOfTrashPerSentence(self, line, stopwords):
        countWord = 0
        words = line.split()
        for word in words:
            if countWord == 0:
                if word in stopwords:
                    sep = word
                    lineNew = line.split(sep, 1)[0]
                    countWord = countWord + 1
                    return(lineNew)

    stopwords = ['regards', 'Regards']

    def getRidOfTrash(self, aTranscript):
        result = [self.getRidOfTrashPerSentence(line, self.stopwords) for line in aTranscript]
        return(result)
What I want to achieve is to remove everything after the stop words ['regards', 'Regards'].
So when I pass in a block like this:
aTranScript = [ "That's fine, regards Henk", "Allright great"]
I am looking for output like this:
aTranScript = [ "That's fine, regards", "Allright great"]
However, when I do this:
newFile = Trigger()
newContent = newFile.getRidOfTrash(aTranScript)
I only get "That's fine".
Any ideas on how to get both strings?
Answer 0 (score: 2)
Here is a simple solution:
yourString.split(', regards')[0]
This code will return: 'Hello, fine'
If you like, you can concatenate ', regards' back onto the end:
yourString.split(', regards')[0] + ', regards'
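One caveat with the concatenation above: it appends ', regards' even when the string never contained it. A minimal sketch that sidesteps this with `str.partition`, applied to the list from the question; the helper name `cut_after` and the variable names are hypothetical, not from the answer:

```python
def cut_after(text, stopword):
    # partition returns (head, sep, tail); sep is '' when stopword is absent,
    # in which case head is the whole original string.
    head, sep, _tail = text.partition(stopword)
    # head + sep keeps the stop word itself and drops everything after it.
    return head + sep

a_transcript = ["That's fine, regards Henk", "Allright great"]
result = [cut_after(line, "regards") for line in a_transcript]
print(result)
```

Lines without the stop word pass through unchanged, so nothing is appended to "Allright great". Note that this sketch is case-sensitive and handles a single stop word only.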
Answer 1 (score: 1)
A regular expression makes the replacement easier. As a bonus, it is case-insensitive, so you don't have to write both 'regards' and 'Regards' in the list:
import re

stop_words = ['regards', 'cheers']

def remove_text_after_stopwords(text, stop_words):
    # Match the first stop word and everything after it, up to the end of the string.
    pattern = "(%s).*$" % '|'.join(stop_words)
    remove_trash = re.compile(pattern, re.IGNORECASE)
    # Keep only the matched stop word (group 1); the raw string keeps '\g' literal.
    return remove_trash.sub(r'\g<1>', text)

print(remove_text_after_stopwords("That's fine, regards, Henk", stop_words))
# That's fine, regards
print(remove_text_after_stopwords("Good, cheers! Paul", stop_words))
# Good, cheers
print(remove_text_after_stopwords("No stop word here", stop_words))
# No stop word here
如果你有一个字符串列表,你可以使用列表推导将这个方法应用于每个字符串。
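As a sketch of that last step, the function is re-declared here so the block runs by itself. The `re.escape` call and the `\b` word boundaries are additions that are not in the answer: they keep a stop word from matching inside a longer word such as "disregards". The sample list is made up for illustration.

```python
import re

def remove_text_after_stopwords(text, stop_words):
    # re.escape guards against stop words that contain regex metacharacters.
    alternatives = "|".join(re.escape(word) for word in stop_words)
    # \b ... \b restricts the match to whole words only.
    pattern = re.compile(r"\b(%s)\b.*$" % alternatives, re.IGNORECASE)
    # Replace the stop word plus the rest of the string with the stop word alone.
    return pattern.sub(r"\g<1>", text)

transcript = ["That's fine, Regards Henk", "Allright great", "He disregards this"]
cleaned = [remove_text_after_stopwords(line, ["regards", "cheers"]) for line in transcript]
print(cleaned)
```

Only the first string is shortened; the other two contain no whole-word stop word and are returned as they are.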
Answer 2 (score: 0)
You can scan the words in the line and drop any word whose preceding word is a stop word:
class Trigger():
    stopwords = ['regards', 'Regards']

    def getRidOfTrashPerSentence(self, line):
        words = line.split()
        new_words = [words[0]]
        for i in range(1, len(words)):
            # Skip a word when the word before it is a stop word.
            if words[i-1] not in self.stopwords:
                new_words.append(words[i])
        return " ".join(new_words)  # reconstruct line

    def getRidOfTrash(self, aTranscript):
        result = [self.getRidOfTrashPerSentence(line) for line in aTranscript]
        return result

aTranScript = ["That's fine, regards Henk", "Allright great"]
newFile = Trigger()
newContent = newFile.getRidOfTrash(aTranScript)
print(newContent)
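Note that the loop above drops only the single word that follows a stop word, so "That's fine, regards Henk Jansen" would keep "Jansen". A minimal sketch of a variation that cuts the line at the first stop word instead; the function name `truncate_after_stopword`, the lowercase comparison, and the punctuation set are assumptions, not part of the answer:

```python
def truncate_after_stopword(line, stopwords):
    words = line.split()
    for i, word in enumerate(words):
        # Compare case-insensitively and ignore punctuation attached to the word.
        if word.strip(".,!?;:").lower() in stopwords:
            # Keep everything up to and including the stop word.
            return " ".join(words[:i + 1])
    # No stop word found: return the line unchanged.
    return line

stopwords = {"regards"}
transcript = ["That's fine, Regards, Henk Jansen", "Allright great"]
truncated = [truncate_after_stopword(line, stopwords) for line in transcript]
print(truncated)
```

Punctuation attached to the stop word itself (the comma after "Regards" here) is kept, since the words are rejoined as they appeared in the line.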