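I have a table that I built with a pivot table, in order to eliminate missing values and values that are too short, such as city names. This is my code: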
import numpy as np
import pandas as pd
company = pd.read_sql('SELECT user_id, address FROM company', con=db_connection)
table = pd.pivot_table(company, index=['address'], aggfunc=np.sum)
table.reset_index()
Then I get this:
address user_id
3 Jl. Raya Kranggan No. 7, Ruko Kav V No. 1 Jat... 65132
4 #ALAMAT atau LOKASI\r\nKota bengkulu perhubung... 15570
5 '--!>'</script/><Svg/Onload=confirm`alamat bis... 48721
6 (Rumah Bpk.RA'IS) Jl.Puskesmas RT.004/11 No.29... 20786
The columns look fine when I check them:
table.columns
Index(['user_id', 'address'], dtype='object')
But then I cannot access the column:
table['address']
This is what happens when I access the column:
KeyError Traceback (most recent call last)
C:\Users\asus\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2392 try:
-> 2393 return self._engine.get_loc(key)
2394 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)()
KeyError: 'address'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-46-eef3b78ea5fd> in <module>()
----> 1 table['address'] #.astype(str)
C:\Users\asus\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2060 return self._getitem_multilevel(key)
2061 else:
-> 2062 return self._getitem_column(key)
2063
2064 def _getitem_column(self, key):
C:\Users\asus\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
2067 # get column
2068 if self.columns.is_unique:
-> 2069 return self._get_item_cache(key)
2070
2071 # duplicate columns & possible reduce dimensionality
C:\Users\asus\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
1532 res = cache.get(item)
1533 if res is None:
-> 1534 values = self._data.get(item)
1535 res = self._box_item_values(item, values)
1536 cache[item] = res
C:\Users\asus\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
3588
3589 if not isnull(item):
-> 3590 loc = self.items.get_loc(item)
3591 else:
3592 indexer = np.arange(len(self.items))[isnull(self.items)]
C:\Users\asus\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2393 return self._engine.get_loc(key)
2394 except KeyError:
-> 2395 return self._engine.get_loc(self._maybe_cast_indexer(key))
2396
2397 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)()
KeyError: 'address'
Any other solution is fine too, as long as I can map the addresses with a keyword mapping.
Answer 0 (score: 2)
I think you need to assign the output of reset_index back to a variable, because address is the name of the index, not a column:
table = pd.pivot_table(company, index='address',aggfunc=np.sum).reset_index()
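To illustrate why the assignment matters, here is a minimal sketch. The small in-memory `company` frame is a hypothetical stand-in for the SQL table in the question, and `aggfunc="sum"` is used in place of `np.sum` only to avoid deprecation warnings on newer pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for the `company` table read from SQL in the question.
company = pd.DataFrame({
    "user_id": [1, 2, 3],
    "address": ["Jl. A No. 1", "Jl. A No. 1", "Jl. B No. 2"],
})

table = pd.pivot_table(company, index="address", aggfunc="sum")

# reset_index() returns a new DataFrame; `table` itself is left unchanged,
# so `address` is still the index here and table['address'] would raise KeyError.
table.reset_index()
print("address" in table.columns)

# Assigning the result back turns `address` into a regular column.
table = table.reset_index()
print("address" in table.columns)
print(table["address"].tolist())
```

In a notebook, the un-assigned `table.reset_index()` call still displays a frame with an `address` column, which may be why the output in the question looked correct.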
Another solution, if you want to specify the column to aggregate with sum:
table = company.groupby('address', as_index=False)['user_id'].sum()
Or:
table = company.groupby('address')['user_id'].sum().reset_index()
For all columns:
table = company.groupby('address', as_index=False).sum()
table = company.groupby('address').sum().reset_index()
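As a rough check that the groupby variants above lead to the same result, the following sketch runs all four on a small hypothetical `company` frame (the data is made up for illustration; the real table comes from SQL). With only `user_id` and `address` present, selecting `['user_id']` and summing all columns should coincide.

```python
import pandas as pd

# Hypothetical stand-in for the `company` table from the question.
company = pd.DataFrame({
    "user_id": [1, 2, 3],
    "address": ["Jl. A No. 1", "Jl. A No. 1", "Jl. B No. 2"],
})

# Sum of one named column, keeping `address` as a column.
a = company.groupby("address", as_index=False)["user_id"].sum()
b = company.groupby("address")["user_id"].sum().reset_index()

# Sum of every remaining column, keeping `address` as a column.
c = company.groupby("address", as_index=False).sum()
d = company.groupby("address").sum().reset_index()

# Compare the four results against each other.
print(a.equals(b), a.equals(c), a.equals(d))
print(list(a.columns))
```

If the table had further numeric columns, the last two variants would sum those as well, while the first two would still return only `user_id`.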
Answer 1 (score: 2)
I don't think pivot is a suitable choice here.
You can use:
company.groupby('address').sum()
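Note that this call, as written, also leaves address as the index of the result. A minimal sketch of reading the addresses from it, again assuming a small hypothetical `company` frame in place of the SQL table:

```python
import pandas as pd

# Hypothetical stand-in for the `company` table from the question.
company = pd.DataFrame({
    "user_id": [1, 2, 3],
    "address": ["Jl. A No. 1", "Jl. A No. 1", "Jl. B No. 2"],
})

table = company.groupby("address").sum()

# The grouping key becomes the index, not a column.
print(table.index.name)
print("address" in table.columns)

# The addresses can be read straight from the index...
addresses = table.index.tolist()

# ...or moved into a regular column first.
table = table.reset_index()
print(table["address"].tolist() == addresses)
```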