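How can I efficiently attach multiple KPI values to each customer using pandas?
Joining the pivoted df with the customers df causes some problems, because country is the index of the pivoted DataFrame while nationality is not in the index.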
countryKPI = pd.DataFrame({'country':['Austria','Germany', 'Germany', 'Austria'],
'indicator':['z','x','z','x'],
'value':[7,8,9,7]})
customers = pd.DataFrame({'customer':['first','second'],
'nationality':['Germany','Austria'],
'value':[7,8]})
Answer 0 (score: 2)
I think you can use concat:
df_pivoted = countryKPI.pivot_table(index='country',
columns='indicator',
values='value',
fill_value=0)
print (df_pivoted)
indicator x z
country
Austria 7 7
Germany 8 9
print (pd.concat([customers.set_index('nationality'), df_pivoted], axis=1))
customer value x z
Austria second 8 7 7
Germany first 7 8 9
print (pd.concat([customers.set_index('nationality'), df_pivoted], axis=1)
.reset_index()
.rename(columns={'index':'nationality'})
[['customer','nationality','value','x','z']])
customer nationality value x z
0 second Austria 8 7 7
1 first Germany 7 8 9
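Since the difficulty is that country is the index of the pivoted frame while nationality is an ordinary column in customers, a minimal sketch of one alternative is DataFrame.join with its on parameter, which matches a column of the caller against the index of the other frame. This is not part of the original answer; the variable name result is only illustrative, and the data is the sample data from the question.

```python
import pandas as pd

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})
customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Germany', 'Austria'],
                          'value': [7, 8]})

# One row per country, one column per indicator
df_pivoted = countryKPI.pivot_table(index='country',
                                    columns='indicator',
                                    values='value',
                                    fill_value=0)

# join(on=...) matches the 'nationality' column of customers
# against the index of df_pivoted (a left join by default)
result = customers.join(df_pivoted, on='nationality')
print(result)
```

Because the join is a left join, the row order and the default integer index of customers are kept, so no set_index / reset_index round trip is needed.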
Edit after comments:
The problem is that the dtype of the columns customers.nationality and countryKPI.country is category, and if some categories are missing from one of them, concat raises an error:
ValueError: incompatible categories in categorical concat
The solution is to build the common set of categories with union and then apply it to both columns with set_categories:
import pandas as pd
import numpy as np
countryKPI = pd.DataFrame({'country':['Austria','Germany', 'Germany', 'Austria'],
'indicator':['z','x','z','x'],
'value':[7,8,9,7]})
customers = pd.DataFrame({'customer':['first','second'],
'nationality':['Slovakia','Austria'],
'value':[7,8]})
customers.nationality = customers.nationality.astype('category')
countryKPI.country = countryKPI.country.astype('category')
print (countryKPI.country.cat.categories)
Index(['Austria', 'Germany'], dtype='object')
print (customers.nationality.cat.categories)
Index(['Austria', 'Slovakia'], dtype='object')
all_categories = countryKPI.country.cat.categories.union(customers.nationality.cat.categories)
print (all_categories)
Index(['Austria', 'Germany', 'Slovakia'], dtype='object')
customers.nationality = customers.nationality.cat.set_categories(all_categories)
countryKPI.country = countryKPI.country.cat.set_categories(all_categories)
df_pivoted = countryKPI.pivot_table(index='country',
columns='indicator',
values='value',
fill_value=0)
print (df_pivoted)
indicator x z
country
Austria 7 7
Germany 8 9
Slovakia 0 0
print (pd.concat([customers.set_index('nationality'), df_pivoted], axis=1)
.reset_index()
.rename(columns={'index':'nationality'})
[['customer','nationality','value','x','z']])
customer nationality value x z
0 second Austria 8.0 7 7
1 NaN Germany NaN 8 9
2 first Slovakia 7.0 0 0
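The category alignment used in this edit can also be sketched with pandas.api.types.union_categoricals, which combines the categories of several categorical columns in one call. This is an alternative to the union / set_categories steps, not what the answer itself does; the names combined and shared_dtype are illustrative only.

```python
import pandas as pd
from pandas.api.types import union_categoricals

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})
customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Slovakia', 'Austria'],
                          'value': [7, 8]})

# Reproduce the situation from the comments: both key columns are categorical
customers['nationality'] = customers['nationality'].astype('category')
countryKPI['country'] = countryKPI['country'].astype('category')

# Combine the categories of both columns into one sorted set
combined = union_categoricals([customers['nationality'], countryKPI['country']],
                              sort_categories=True)
shared_dtype = pd.CategoricalDtype(categories=combined.categories)

# Give both key columns exactly the same categorical dtype
customers['nationality'] = customers['nationality'].astype(shared_dtype)
countryKPI['country'] = countryKPI['country'].astype(shared_dtype)

print(list(shared_dtype.categories))
```

Once both columns share the same dtype, the categories are no longer incompatible, which is the same end state the set_categories calls produce.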
If you need better performance, use groupby instead of pivot_table:
df_pivoted1 = (countryKPI.groupby(['country','indicator'])
.mean()
.squeeze()
.unstack()
.fillna(0))
print (df_pivoted1)
indicator x z
country
Austria 7.0 7.0
Germany 8.0 9.0
Slovakia 0.0 0.0
Timings:
In [177]: %timeit countryKPI.pivot_table(index='country', columns='indicator', values='value', fill_value=0)
100 loops, best of 3: 6.24 ms per loop
In [178]: %timeit countryKPI.groupby(['country','indicator']).mean().squeeze().unstack().fillna(0)
100 loops, best of 3: 4.28 ms per loop
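The %timeit magic only works inside IPython. A rough sketch of the same comparison with the standard-library timeit module follows, assuming plain string key columns; the function names via_pivot_table and via_groupby are made up for this example, and the absolute numbers will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})


def via_pivot_table():
    # Reshape with pivot_table (mean is the default aggregation)
    return countryKPI.pivot_table(index='country', columns='indicator',
                                  values='value', fill_value=0)


def via_groupby():
    # Same reshape with groupby + unstack
    return (countryKPI.groupby(['country', 'indicator'])['value']
                      .mean()
                      .unstack()
                      .fillna(0))


# Total seconds for 100 calls of each variant
t_pivot = timeit.timeit(via_pivot_table, number=100)
t_groupby = timeit.timeit(via_groupby, number=100)
print('pivot_table: %.4f s, groupby: %.4f s' % (t_pivot, t_groupby))
```

On a frame this small the difference is mostly call overhead, so it is worth repeating the measurement on data of realistic size before choosing one variant.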
Answer 1 (score: 1)
You can work around the mismatch in categories by using merge:
df = pd.pivot_table(data=countryKPI, index=['country'], columns=['indicator'])
df.index.name = 'nationality'
customers.merge(df['value'].reset_index(), on='nationality', how='outer')
Data:
countryKPI = pd.DataFrame({'country':['Austria','Germany', 'Germany', 'Austria'],
'indicator':['z','x','z','x'],
'value':[7,8,9,7]})
customers = pd.DataFrame({'customer':['first','second'],
'nationality':['Slovakia','Austria'],
'value':[7,8]})
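To see which nationalities actually have KPI values after the merge, one option is the indicator parameter of merge, which adds a _merge column marking each row as left_only, right_only or both. The following is a sketch on the data above, not part of the original answer; the names kpi and merged are illustrative.

```python
import pandas as pd

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})
customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Slovakia', 'Austria'],
                          'value': [7, 8]})

# Passing values='value' keeps the pivoted columns flat (x, z)
kpi = (pd.pivot_table(data=countryKPI, index='country',
                      columns='indicator', values='value')
         .rename_axis('nationality')
         .reset_index())

# indicator=True adds a '_merge' column describing where each row came from
merged = customers.merge(kpi, on='nationality', how='outer', indicator=True)
print(merged)
```

Here Slovakia comes out as left_only (a customer without KPIs), Germany as right_only (KPIs without a customer) and Austria as both.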
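The problem appears to be that you have a CategoricalIndex in your DF resulting from the pivot operation, and reset_index complains with that error when you call it on that frame.
Simply reverse-engineer it: check the dtypes of the countryKPI and customers DataFrames, and wherever category is mentioned, convert those columns to their string representation via astype(str).
Reproducing the error and countering it: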
Assume the DataFrames are the ones given above, with these columns cast to category:
countryKPI['indicator'] = countryKPI['indicator'].astype('category')
countryKPI['country'] = countryKPI['country'].astype('category')
customers['nationality'] = customers['nationality'].astype('category')
countryKPI.dtypes
country category
indicator category
value int64
dtype: object
customers.dtypes
customer object
nationality category
value int64
dtype: object
After the pivot operation:
df = pd.pivot_table(data=countryKPI, index=['country'], columns=['indicator'])
df.index
CategoricalIndex(['Austria', 'Germany'], categories=['Austria', 'Germany'], ordered=False,
name='country', dtype='category')
# ^^ See the categorical index
When you perform reset_index on that:
df.reset_index()
TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category
To counter that error, simply cast the categorical columns to str type:
countryKPI['indicator'] = countryKPI['indicator'].astype('str')
countryKPI['country'] = countryKPI['country'].astype('str')
customers['nationality'] = customers['nationality'].astype('str')
Now the reset_index part works, and so does the merge.
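Putting the steps of this answer together, a minimal end-to-end sketch might look like the following. It assumes a left merge so that every customer is kept, which is a choice made for this example rather than something the answer prescribes; the names kpi and result are illustrative.

```python
import pandas as pd

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})
customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Slovakia', 'Austria'],
                          'value': [7, 8]})

# Reproduce the problematic setup: key columns stored as category
countryKPI['country'] = countryKPI['country'].astype('category')
customers['nationality'] = customers['nationality'].astype('category')

# Counter it: cast the categorical key columns back to plain strings
countryKPI['country'] = countryKPI['country'].astype(str)
customers['nationality'] = customers['nationality'].astype(str)

# The pivot now yields an ordinary index, so reset_index works
kpi = (countryKPI.pivot_table(index='country', columns='indicator', values='value')
                 .rename_axis('nationality')
                 .reset_index())

# Left merge (an assumption of this sketch): keep every customer,
# with NaN where a country has no KPI values
result = customers.merge(kpi, on='nationality', how='left')
print(result)
```

Casting to str trades the memory savings of the category dtype for simpler joins, which is usually acceptable for small lookup tables like this one.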