Question

This is my code, and it doesn't work:

from pythainlp.corpus import thai_stopwords

stopwords = thai_stopwords()

def remove_stopwords(x):
    # Keep only the tokens that are not in the stopword collection
    list_token = []
    for i in x:
        if i not in stopwords:
            list_token.append(i)
    return list_token

df['tokens'] = df['tokens'].apply(remove_stopwords)

I have also tried:

df['tokens'] = df['tokens'].apply(lambda x: [item for item in x if item not in stopwords])

Answer 1

This assumes that your stopwords is a list and that each value in df['tokens'] is a list of words or tokens.
The simple approach:

clear_tokens = []
for i in df.index:
   clear_tokens.append([item for item in df.tokens[i] if item not in stopwords])

df['tokens'] = clear_tokens

If each row of df.tokens is a sentence instead, then:

clear_tokens = []
for i in df.index:
   tokenlist = df.tokens[i].split()
   clear_tokens.append(' '.join([item for item in tokenlist if item not in stopwords]))

df['tokens'] = clear_tokens
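
The two cases above could also be covered by one helper. The following is only a minimal sketch: STOPWORDS is a small hypothetical stand-in for the result of thai_stopwords(), and the rows list stands in for the values of df['tokens'].

```python
# Hypothetical stand-in for thai_stopwords(); any set of strings works here.
STOPWORDS = frozenset({"the", "is", "a"})

def remove_stopwords(value, stopwords=STOPWORDS):
    """Drop stopwords from a token list or from a whitespace-separated sentence."""
    if isinstance(value, str):
        # Sentence case: split on whitespace, filter, then join back into a string.
        return " ".join(tok for tok in value.split() if tok not in stopwords)
    # Token-list case: build a new list without the stopwords.
    return [tok for tok in value if tok not in stopwords]

# Stand-in rows: one token list and one sentence.
rows = [["the", "cat", "is", "black"], "the cat is a pet"]
cleaned = [remove_stopwords(row) for row in rows]
print(cleaned)
```

With a DataFrame, the same function can be passed to apply, as in df['tokens'] = df['tokens'].apply(remove_stopwords). Note that str.split() only separates on whitespace, so unsegmented Thai sentences would probably need a real tokenizer (for example word_tokenize from pythainlp.tokenize) before filtering.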

How to remove stopwords in a dataframe (Python)
