Performing stemming on a CSV file column with Pandas

Time: 2017-12-06 14:17:20

Tags: python pandas csv

I am new to Python and Pandas, and I want to perform stemming on a column named 'Body' of a CSV file using Pandas. My code is as follows:

import pandas as pd
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

answers= pd.read_csv('F:/mtech/Project/answer_sample.csv')
porter_stemmer = PorterStemmer()
#print(answers.head())
#print(answers.loc[0:,"Body"])
df= pd.read_csv('F:/mtech/Project/answer_sample.csv','utf-8')
df['Body'] = df['Body'].str.lower().str.split()
stop = stopwords.words('english')
df['Body']= df['Body'].apply(lambda x: [item for item in x if item not in stop])

df['Body_Tokenized']= df['Body'].apply(lambda x : filter(None,x.split(' ')))

df['Body_Stemmed']= df['Body_Tokenized'].apply(lambda x : [porter_stemmer.stem(y) for y in x])

df.to_csv('F:/mtech/Project/answer_swr_stem.csv')
print("Done..")

I am able to perform the stopword removal, but when stemming I get the following error:

Traceback (most recent call last):
  File "F:\mtech\DATASET\answer_pd.py", line 10, in <module>
    df= pd.read_csv('F:/mtech/Project/answer_sample.csv','utf-8')
  File "C:\Users\Ayushi Misra\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\Ayushi Misra\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 401, in _read
    data = parser.read()
  File "C:\Users\Ayushi Misra\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 939, in read
    ret = self._engine.read(nrows)
  File "C:\Users\Ayushi Misra\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 1997, in read
    alldata = self._rows_to_cols(content)
  File "C:\Users\Ayushi Misra\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 2551, in _rows_to_cols
    raise ValueError(msg)
ValueError: Expected 1 fields in line 2853, saw 2. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

Need help!
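
For reference, the second positional argument of pandas.read_csv is sep, so passing 'utf-8' there makes pandas treat it as a multi-character delimiter, which is what the hint at the end of the traceback points to; the usual approach is to pass the encoding as a keyword argument. Below is a minimal sketch of the same pipeline under that assumption (file paths and the 'Body' column are taken from the question, and the NLTK stopwords corpus is assumed to be downloaded). Note that after .str.split() the column already holds lists of tokens, so the extra x.split(' ') step in the question is not needed:

import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

porter_stemmer = PorterStemmer()
stop = set(stopwords.words('english'))

# Pass the encoding by keyword; the second positional argument is sep.
df = pd.read_csv('F:/mtech/Project/answer_sample.csv', encoding='utf-8')

# Guard against empty cells, then lower-case and split into lists of tokens.
df['Body_Tokenized'] = df['Body'].fillna('').str.lower().str.split()

# Remove stopwords; the column already holds lists, so no further split is needed.
df['Body_Tokenized'] = df['Body_Tokenized'].apply(
    lambda tokens: [t for t in tokens if t not in stop])

# Stem each remaining token.
df['Body_Stemmed'] = df['Body_Tokenized'].apply(
    lambda tokens: [porter_stemmer.stem(t) for t in tokens])

df.to_csv('F:/mtech/Project/answer_swr_stem.csv', index=False)
print("Done..")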

0 Answers:

There are no answers yet.