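I want to try 3 things in my code:
How can I remove punctuation without using the 'join' function? I'm new to Python and haven't yet managed to remove stop words in a similar way either...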
import string
s = raw_input("Search: ") #user input
stopWords = [ "a", "i", "it", "am", "at", "on", "in", "to", "too", "very", \
              "of", "from", "here", "even", "the", "but", "and", "is", "my", \
              "them", "then", "this", "that", "than", "though", "so", "are" ]
PunctuationToRemove = [".", ",", ":", ";", "!" ,"?", "&"]
while s != "":
    s1 = ""
    #Deleting punctuations and applying lowercase
    for c in s: #for characters in user's input
        if c not in PunctuationToRemove + " ": #characters that don't include punctuations and blanks
            s1 = s + c #store the above result to s1
    s1 = string.lower(s) #then change s1 to lowercase
    print s1
Answer 0 (score: 0)
To get rid of all the stop words you can do:
[word for word in myString.split(" ") if word not in stopWords]
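As a minimal sketch of how that comprehension might be used (assuming Python 3, a shortened version of the question's stopWords list, and a made-up sample value for myString), note that the words probably need to be lowercased first, otherwise a capitalised word such as "This" is not filtered out:

```python
# Stop words from the question (shortened here for brevity).
stopWords = ["a", "i", "am", "the", "is", "my", "this", "and"]

# Hypothetical sample input; in the question it comes from the user.
myString = "This is my first search and I am happy"

# Lowercase first so that "This" and "I" match the lowercase stop words.
words = myString.lower().split(" ")

# Keep only the words that are not stop words.
filtered = [word for word in words if word not in stopWords]

print(filtered)
```

The result is a list of words rather than a string, so it still has to be glued back together if a single string is needed.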
Answer 1 (score: 0)
I would suggest getting rid of all the punctuation first. This can be done with a for loop:
for forbiddenChar in PunctuationToRemove:
    s = s.replace(forbiddenChar,"") #Replace forbidden chars with empty string
Then you can split the input string s into words using s.split(' '). After that, you can use a for loop to add all the words (in lowercase) to a new string s1:
words = s.split(' ')
s1 = ""
for word in words:
    word = word.lower() #Lowercase first so that "The" matches the stop word "the"
    if word not in stopWords:
        s1 = s1 + word + " "
s1 = s1.rstrip(" ") #Strip trailing space
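Put together, the two steps above could look like the following sketch. It is written for Python 3 and avoids join entirely, as the question asks. The function name clean and the sample input are made up for illustration, and the stop word list is shortened.

```python
PunctuationToRemove = [".", ",", ":", ";", "!", "?", "&"]
stopWords = ["a", "i", "am", "the", "is", "my", "to", "and"]  # shortened list


def clean(s):
    """Remove punctuation and stop words from s and lowercase the rest."""
    # Step 1: replace every forbidden character with an empty string.
    for forbiddenChar in PunctuationToRemove:
        s = s.replace(forbiddenChar, "")

    # Step 2: build the result word by word, without using join.
    s1 = ""
    for word in s.split():  # split() with no argument ignores repeated spaces
        word = word.lower()
        if word not in stopWords:
            s1 = s1 + word + " "
    return s1.rstrip(" ")  # strip the trailing space


print(clean("Hello, I am going to the Market!"))
```

Using s.split() instead of s.split(' ') is a small deviation from the answer: it keeps runs of spaces from producing empty "words" in the result.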
Answer 2 (score: 0)
How about this:
s = 'I am student! Hello world&.~*~'
PunctuationToRemove = [".", ",", ":", ";", "!" ,"?", "&"]
stopWords = set([ "a", "i", "it", "am", "at", "on", "in", "to", "too", "very", \
                  "of", "from", "here", "even", "the", "but", "and", "is", "my", \
                  "them", "then", "this", "that", "than", "though", "so", "are" ])
# Remove specific punctuation (Python 2 form of str.translate: no table, only characters to delete)
s_removed_punctuations = s.translate(None, ''.join(PunctuationToRemove))
# Convert input to lowercase
s_lower = s_removed_punctuations.lower()
# Remove stop words
s_result = ' '.join(w for w in s_lower.split() if w not in stopWords).strip()
print(s_result)
#student hello world~*~
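The two-argument s.translate(None, ...) call above only works on Python 2. A rough Python 3 equivalent, assuming the same sample string and a shortened stop word set, can use str.maketrans with a third argument listing the characters to delete:

```python
s = 'I am student! Hello world&.~*~'
PunctuationToRemove = [".", ",", ":", ";", "!", "?", "&"]
stopWords = {"a", "i", "am", "the", "is", "my"}  # shortened set

# Build a translation table that deletes each listed character.
table = str.maketrans("", "", "".join(PunctuationToRemove))

# Delete the punctuation, then lowercase what is left.
s_lower = s.translate(table).lower()

# Drop the stop words and glue the remaining words back together.
s_result = " ".join(w for w in s_lower.split() if w not in stopWords)
print(s_result)
```

Characters that are not in PunctuationToRemove, such as ~ and *, are left untouched, just as in the original answer.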