尽管使用正确的列名称仍然存在键错误

时间:2018-01-04 07:49:53

标签: python pandas

列的名称是'description'。错误指向正则表达式行,我试图用空格替换描述列中的任何非字母字符。

感谢您的帮助!

df = pd.read_csv('winemag-data_first150k.csv')
dataset = df[['description', 'points']]
train = dataset.sample(frac = 0.1, random_state = 200)
test = dataset.drop(train.index)
train.head()
wordlist = []
for elem in range(1,15093):
    taste = re.sub('[^a-zA-Z]', ' ', train["description"][elem])
    taste = taste.lower()
    taste = taste.split()
    PorStem = PorterStemmer()
    judge = [PorStem.stem(word) for word in taste if word not in set(stopwords.words('english'))]    
    judge = ' '.join(judge)
    wordlist.append(judge)

错误 -

KeyError                                  Traceback (most recent call last)
<ipython-input-28-4a7dc36c2440> in <module>()
  1 wordlist = []
  2 for elem in range(1,15093):
----> 3     taste = re.sub('[^a-zA-Z]', ' ', train["description"][elem])
  4     taste = taste.lower()
  5     taste = taste.split()

~\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
599         key = com._apply_if_callable(key, self)
600         try:
--> 601             result = self.index.get_value(self, key)
602 
603             if not is_scalar(result):

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
2475         try:
2476             return self._engine.get_value(s, k,
-> 2477                                           tz=getattr(series.dtype, 
 'tz', None))
2478         except KeyError as e1:
2479             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()  

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in 
pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in 
pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 1

1 个答案:

答案 0 :(得分:1)

似乎你需要:

#select column description
description = dataset.sample(frac = 0.1, random_state = 200)['description']
#use pandas str text function
description = description.str.replace('[^a-zA-Z]', ' ').str.lower().str.split()
PorStem = PorterStemmer()
#apply function
f = lambda x: ' '.join([PorStem.stem(word) for word in x if word not in set(stopwords.words('english'))])
#convert output to lists
wordlist = description.apply(f).values.tolist()