I am trying to read the words from a stopwords.txt file and append each one, as a string, to a Python list.
stopwords.txt
a
about
above
after
again
against
all
am
an
and
any
are
aren't
as
at
be
because
been
before
being
My code:
stopword = open("stopwords.txt", "r")
stopwords = []
for word in stopword:
    stopwords.append(word)
Output of the stopwords list:
['a\n',
'about\n',
'above\n',
'after\n',
'again\n',
'against\n',
'all\n',
'am\n',
'an\n',
'and\n',
'any\n',
'are\n',
"aren't\n",
'as\n',
'at\n',
'be\n',
'because\n',
'been\n',
'before\n',
'being\n']
Desired output:
['a',
'about',
'above',
'after',
'again',
'against',
'all',
'am',
'an',
'and',
'any',
'are',
"aren't",
'as',
'at',
'be',
'because',
'been',
'before',
'being']
Is there a way to transform stopwords so that the '\n' characters are removed, or any other way to get the desired output?
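Answer 0 (score: 1)
Instead of
stopwords.append(word)
do
stopwords.append(word.strip())
The str.strip() method removes any kind of whitespace (spaces, tabs, newlines, etc.) from the beginning and end of a string. You can pass an argument to strip a specific set of characters instead, or use lstrip() or rstrip() to strip only the front or only the back of the string, but in this case a plain strip() is enough.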
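To make the difference between these methods concrete, here is a small sketch. The sample string is made up for illustration and is not taken from the file above.

```python
# Hypothetical sample line, padded with spaces in front and a newline at the end.
line = "  about\n"

both = line.strip()              # whitespace removed from both ends
left = line.lstrip()             # leading whitespace removed only
right = line.rstrip()            # trailing whitespace removed only
newline_only = line.strip("\n")  # only newline characters at the ends removed

print(repr(both))          # 'about'
print(repr(left))          # 'about\n'
print(repr(right))         # '  about'
print(repr(newline_only))  # '  about'
```

Note that the argument is treated as a set of characters to remove from the ends, not as a substring to match.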
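Answer 1 (score: 1)
You can use the .strip() method. It removes every leading and trailing occurrence of the characters passed as its argument (characters in the middle of the string are left alone):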
stopword = open("stopwords.txt", "r")
stopwords = []
for word in stopword:
    stopwords.append(word.strip("\n"))
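As a possible variation on the loop above, the same idea can be written with a `with` block (so the file is closed automatically) and a list comprehension. This is only a sketch: the `load_stopwords` helper, the UTF-8 encoding, the skipping of blank lines, and the temporary sample file are all assumptions made for the demonstration, not part of the original question.

```python
import os
import tempfile


def load_stopwords(path):
    """Return the non-empty lines of the file at `path`, stripped of whitespace.

    Hypothetical helper; assumes one word per line and UTF-8 text.
    """
    with open(path, "r", encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Demo: write a tiny sample file to a temporary directory, then load it.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "stopwords.txt")
    with open(sample_path, "w", encoding="utf-8") as out:
        out.write("a\nabout\naren't\n")
    stopwords = load_stopwords(sample_path)

print(stopwords)  # ['a', 'about', "aren't"]
```

With the real file, `load_stopwords("stopwords.txt")` would be the equivalent call.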