列的名称是'description'。错误指向正则表达式行,我试图用空格替换描述列中的任何非字母字符。
感谢您的帮助!
df = pd.read_csv('winemag-data_first150k.csv')
dataset = df[['description', 'points']]
train = dataset.sample(frac = 0.1, random_state = 200)
test = dataset.drop(train.index)
train.head()
wordlist = []
for elem in range(1,15093):
taste = re.sub('[^a-zA-Z]', ' ', train["description"][elem])
taste = taste.lower()
taste = taste.split()
PorStem = PorterStemmer()
judge = [PorStem.stem(word) for word in taste if word not in set(stopwords.words('english'))]
judge = ' '.join(judge)
wordlist.append(judge)
错误 -
KeyError Traceback (most recent call last)
<ipython-input-28-4a7dc36c2440> in <module>()
1 wordlist = []
2 for elem in range(1,15093):
----> 3 taste = re.sub('[^a-zA-Z]', ' ', train["description"][elem])
4 taste = taste.lower()
5 taste = taste.split()
~\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
599 key = com._apply_if_callable(key, self)
600 try:
--> 601 result = self.index.get_value(self, key)
602
603 if not is_scalar(result):
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
2475 try:
2476 return self._engine.get_value(s, k,
-> 2477 tz=getattr(series.dtype,
'tz', None))
2478 except KeyError as e1:
2479 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in
pandas._libs.hashtable.Int64HashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in
pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 1
答案 0 :(得分:1)
似乎你需要:
#select column description
description = dataset.sample(frac = 0.1, random_state = 200)['description']
#use pandas str text function
description = description.str.replace('[^a-zA-Z]', ' ').str.lower().str.split()
PorStem = PorterStemmer()
#apply function
f = lambda x: ' '.join([PorStem.stem(word) for word in x if word not in set(stopwords.words('english'))])
#convert output to lists
wordlist = description.apply(f).values.tolist()