I'm new to Python and pandas, and I want to perform stemming on the 'Body' column of a CSV file using pandas. My code is as follows:
import pandas as pd
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
answers= pd.read_csv('F:/mtech/Project/answer_sample.csv')
porter_stemmer = PorterStemmer()
#print(answers.head())
#print(answers.loc[0:,"Body"])
df= pd.read_csv('F:/mtech/Project/answer_sample.csv','utf-8')
df['Body'] = df['Body'].str.lower().str.split()
stop = stopwords.words('english')
df['Body']= df['Body'].apply(lambda x: [item for item in x if item not in stop])
df['Body_Tokenized']= df['Body'].apply(lambda x : filter(None,x.split(' ')))
df['Body_Stemmed']= df['Body_Tokenized'].apply(lambda x : [porter_stemmer.stem(y) for y in x])
df.to_csv('F:/mtech/Project/answer_swr_stem.csv')
print("Done..")
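Separately from the `read_csv` call, the `Body_Tokenized` line would also fail. After `df['Body'].str.lower().str.split()`, every cell is already a list of words, and a list has no `.split` method. The sketch below keeps the tokens as lists through the whole lower-case, stop-word filter and stem sequence. It assumes a small hard-coded stop-word set (`STOP`, a hypothetical stand-in for `stopwords.words('english')`) and an in-memory DataFrame in place of the CSV. The helper name `add_stem_columns` is also just for illustration.

```python
import pandas as pd
from nltk.stem import PorterStemmer

# Hypothetical stand-in for stopwords.words('english'), so the sketch runs
# without downloading the NLTK stopwords corpus.
STOP = {"the", "are", "a", "is"}

porter_stemmer = PorterStemmer()


def add_stem_columns(df, stop=STOP):
    """Lower-case and split 'Body', drop stop words, then stem each token."""
    # Missing cells would otherwise be float NaN and break the list comprehensions.
    body = df["Body"].fillna("")
    # .str.split() already returns a list per cell; do not split it again.
    tokens = body.str.lower().str.split()
    df["Body_Tokenized"] = tokens.apply(lambda words: [w for w in words if w not in stop])
    df["Body_Stemmed"] = df["Body_Tokenized"].apply(
        lambda words: [porter_stemmer.stem(w) for w in words]
    )
    return df


df = pd.DataFrame({"Body": ["The cats are running", None]})
df = add_stem_columns(df)
print(df["Body_Stemmed"].tolist())  # [['cat', 'run'], []]
```

With the real data, `set(stopwords.words('english'))` can replace `STOP` once the corpus has been fetched with `nltk.download('stopwords')`. Using a set keeps the membership test cheap.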
I was able to remove the stop words, but when stemming I get the following error:
Traceback (most recent call last):
File "F:\mtech\DATASET\answer_pd.py", line 10, in <module>
df= pd.read_csv('F:/mtech/Project/answer_sample.csv','utf-8')
File "C:\Users\Ayushi Misra\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\Ayushi Misra\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 401, in _read
data = parser.read()
File "C:\Users\Ayushi Misra\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 939, in read
ret = self._engine.read(nrows)
File "C:\Users\Ayushi Misra\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 1997, in read
alldata = self._rows_to_cols(content)
File "C:\Users\Ayushi Misra\Anaconda2\lib\site-packages\pandas\io\parsers.py", line 2551, in _rows_to_cols
raise ValueError(msg)
ValueError: Expected 1 fields in line 2853, saw 2. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
Any help would be appreciated!