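I want to count how many times each word occurs in every row of a column ('Comment') and store the result in a new column ('word') of a DataFrame named headlamp. I am trying the following code, but I get an error.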
for i in range(0,len(headlamp)):
    headlamp['word'].apply(lambda text: Counter(" ".join(headlamp['Comment'][i].astype(str)).split(" ")).items())
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-16-a0c20291b4f5> in <module>()
1 for i in range(0,len(headlamp)):
----> 2 headlamp['word'].apply(lambda text: Counter("".join(headlamp['Comment'][i].astype(str)).split(" ")).items())
C:\Users\Rafael\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
1995 return self._getitem_multilevel(key)
1996 else:
-> 1997 return self._getitem_column(key)
1998
1999 def _getitem_column(self, key):
C:\Users\Rafael\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\frame.pyc in _getitem_column(self, key)
2002 # get column
2003 if self.columns.is_unique:
-> 2004 return self._get_item_cache(key)
2005
2006 # duplicate columns & possible reduce dimensionality
C:\Users\Rafael\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\generic.pyc in _get_item_cache(self, item)
1348 res = cache.get(item)
1349 if res is None:
-> 1350 values = self._data.get(item)
1351 res = self._box_item_values(item, values)
1352 cache[item] = res
C:\Users\Rafael\Anaconda2\envs\gl-env\lib\site-packages\pandas\core\internals.pyc in get(self, item, fastpath)
3288
3289 if not isnull(item):
-> 3290 loc = self.items.get_loc(item)
3291 else:
3292 indexer = np.arange(len(self.items))[isnull(self.items)]
C:\Users\Rafael\Anaconda2\envs\gl-env\lib\site-packages\pandas\indexes\base.pyc in get_loc(self, key, method, tolerance)
1945 return self._engine.get_loc(key)
1946 except KeyError:
-> 1947 return self._engine.get_loc(self._maybe_cast_indexer(key))
1948
1949 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4154)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4018)()
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12368)()
pandas\hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12322)()
KeyError: 'word'
Any help would be greatly appreciated.
Answer 0 (score: 1)
You can try this, which stores the number of words in each comment:
headlamp['word'] = headlamp['Comment'].apply(lambda x: len(x.split()))
Example:
import pandas as pd
headlamp = pd.DataFrame({'Comment': ['hello world','world','foo','foo and bar']})
print(headlamp)
       Comment
0  hello world
1        world
2          foo
3  foo and bar
headlamp['word'] = headlamp['Comment'].apply(lambda x: len(x.split()))
print(headlamp)
       Comment  word
0  hello world     2
1        world     1
2          foo     1
3  foo and bar     3
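If the Comment column can contain missing values, calling x.split() inside the lambda raises an error for those rows. A minimal sketch of a vectorized variant using the .str accessor follows. It assumes that a missing comment should count as zero words, and the sample data with a None entry is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the None entry stands in for a missing comment.
headlamp = pd.DataFrame({'Comment': ['hello world', 'world', None, 'foo and bar']})

# .str.split() splits each comment on whitespace and leaves missing rows missing;
# .str.len() then gives the number of tokens per row (NaN for missing rows).
# Assumption: a missing comment is treated as having 0 words.
headlamp['word'] = (
    headlamp['Comment'].str.split().str.len().fillna(0).astype(int)
)
print(headlamp)
```

Whether zero is the right value for a missing comment depends on the analysis; dropping those rows first is an equally reasonable choice.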
Answer 1 (score: 0)
You can get what you want with the most_common() method of collections.Counter.
Feel free to use this code:
import pandas as pd
from collections import Counter
df = pd.DataFrame({'Comment': ['This has has words words words that are written twice twice', 'This is a comment without repetitions', 'This comment, has ponctuations!']}, index = [0, 1, 2])
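# the new column must exist before values are assigned to it row by row
df['Words'] = ""
# count word frequencies for each comment
i = 0
for row in df['Comment']:
    # .loc avoids chained assignment (df['Words'][i] = ...), which may not write back to df
    df.loc[i, 'Words'] = str(Counter(row.split()).most_common())
    i += 1
print(df)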
Output:
Comment \
0 This has has words words words that are writte...
1 This is a comment without repetitions
2 This comment, has ponctuations!
Words
0 [('words', 3), ('twice', 2), ('has', 2), ('tha...
1 [('a', 1), ('comment', 1), ('This', 1), ('is',...
2 [('This', 1), ('comment,', 1), ('has', 1), ('p...
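The loop above stores the counts as strings, and tokens such as 'comment,' keep their punctuation. A possible alternative, sketched here, uses apply and keeps a Counter object in each cell. The helper name count_words, the lowercasing, and the punctuation stripping are assumptions added for illustration and are not part of the original answer.

```python
import string
from collections import Counter

import pandas as pd

df = pd.DataFrame({'Comment': [
    'This has has words words words that are written twice twice',
    'This is a comment without repetitions',
    'This comment, has ponctuations!',
]})

def count_words(text):
    """Hypothetical helper: count lowercased words, ignoring surrounding punctuation."""
    tokens = (tok.strip(string.punctuation).lower() for tok in str(text).split())
    # Skip tokens that were punctuation only and are now empty.
    return Counter(tok for tok in tokens if tok)

# Each cell of 'Words' holds a Counter, so counts can be looked up directly.
df['Words'] = df['Comment'].apply(count_words)
print(df.loc[0, 'Words']['words'])  # prints 3
```

Keeping Counter objects instead of strings means most_common() can still be called on any cell later, for example df.loc[0, 'Words'].most_common(2).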