Question

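How can I efficiently use pandas to attach multiple KPI values to each customer?

Joining the pivoted df with the customers df causes some problems, because country is the index of the pivoted data frame while nationality is an ordinary column of customers, not its index.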

countryKPI = pd.DataFrame({'country':['Austria','Germany', 'Germany', 'Austria'],
                           'indicator':['z','x','z','x'],
                           'value':[7,8,9,7]})
customers = pd.DataFrame({'customer':['first','second'],
                           'nationality':['Germany','Austria'],
                           'value':[7,8]})

The desired result is shown in pink: [image not included]

Answer 1

I think you can use concat:

df_pivoted = countryKPI.pivot_table(index='country', 
                              columns='indicator', 
                              values='value', 
                              fill_value=0)
print (df_pivoted)    
indicator  x  z
country        
Austria    7  7
Germany    8  9

print (pd.concat([customers.set_index('nationality'), df_pivoted], axis=1))
        customer  value  x  z
Austria   second      8  7  7
Germany    first      7  8  9                       


print (pd.concat([customers.set_index('nationality'), df_pivoted], axis=1)
         .reset_index()
         .rename(columns={'index':'nationality'})
         [['customer','nationality','value','x','z']])

  customer nationality  value  x  z
0   second     Austria      8  7  7
1    first     Germany      7  8  9
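Since the question is about a key that is a column on one side and the index on the other, DataFrame.join with its on parameter may be worth a look as an alternative to concat. The following is a minimal sketch rather than part of the original answer; it reuses the question's data and assumes nationality values are plain strings.

```python
import pandas as pd

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})
customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Germany', 'Austria'],
                          'value': [7, 8]})

df_pivoted = countryKPI.pivot_table(index='country',
                                    columns='indicator',
                                    values='value',
                                    fill_value=0)

# join() matches the 'nationality' column of customers
# against the index (country) of df_pivoted
result = customers.join(df_pivoted, on='nationality')
print(result)
```

This keeps the original row order and column layout of customers, so no reset_index or rename step is needed afterwards.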

Edit based on a comment:

The problem is that the columns customers.nationality and countryKPI.country have dtype category, and if some categories are missing from one of them, an error is raised:

ValueError: incompatible categories in categorical concat

The solution is to combine the categories of both columns with union and then apply them with set_categories:

import pandas as pd
import numpy as np

countryKPI = pd.DataFrame({'country':['Austria','Germany', 'Germany', 'Austria'],
                           'indicator':['z','x','z','x'],
                           'value':[7,8,9,7]})
customers = pd.DataFrame({'customer':['first','second'],
                           'nationality':['Slovakia','Austria'],
                           'value':[7,8]})
customers.nationality = customers.nationality.astype('category')
countryKPI.country = countryKPI.country.astype('category')

print (countryKPI.country.cat.categories)
Index(['Austria', 'Germany'], dtype='object')

print (customers.nationality.cat.categories)
Index(['Austria', 'Slovakia'], dtype='object')

all_categories = countryKPI.country.cat.categories.union(customers.nationality.cat.categories)
print (all_categories)
Index(['Austria', 'Germany', 'Slovakia'], dtype='object')

customers.nationality = customers.nationality.cat.set_categories(all_categories)
countryKPI.country = countryKPI.country.cat.set_categories(all_categories)

df_pivoted = countryKPI.pivot_table(index='country', 
                              columns='indicator', 
                              values='value', 
                              fill_value=0)
print (df_pivoted)    
indicator  x  z
country        
Austria    7  7
Germany    8  9
Slovakia   0  0        

print (pd.concat([customers.set_index('nationality'), df_pivoted], axis=1)
         .reset_index()
         .rename(columns={'index':'nationality'})
         [['customer','nationality','value','x','z']])

  customer nationality  value  x  z
0   second     Austria    8.0  7  7
1      NaN     Germany    NaN  8  9
2    first    Slovakia    7.0  0  0
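The union and set_categories steps can be wrapped in a small function so they are applied to both columns in one call. This is only a sketch: align_categories is a hypothetical helper name, not a pandas function, and the two Series stand in for countryKPI.country and customers.nationality.

```python
import pandas as pd

def align_categories(left, right):
    """Return copies of two categorical Series sharing the union of their categories.

    Hypothetical helper for illustration; not part of pandas.
    """
    all_categories = left.cat.categories.union(right.cat.categories)
    return (left.cat.set_categories(all_categories),
            right.cat.set_categories(all_categories))

country = pd.Series(['Austria', 'Germany', 'Germany', 'Austria'], dtype='category')
nationality = pd.Series(['Slovakia', 'Austria'], dtype='category')

country, nationality = align_categories(country, nationality)
print(list(country.cat.categories))
print(list(nationality.cat.categories))
```

The values themselves are unchanged; only the set of allowed categories is widened on both sides.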

If you need better performance, use groupby instead of pivot_table:

df_pivoted1 = (countryKPI.groupby(['country','indicator'])
                         .mean()
                         .squeeze()
                         .unstack()
                         .fillna(0))
print (df_pivoted1)
indicator    x    z
country            
Austria    7.0  7.0
Germany    8.0  9.0
Slovakia   0.0  0.0

Timing:

In [177]: %timeit countryKPI.pivot_table(index='country', columns='indicator', values='value', fill_value=0)
100 loops, best of 3: 6.24 ms per loop

In [178]: %timeit countryKPI.groupby(['country','indicator']).mean().squeeze().unstack().fillna(0)
100 loops, best of 3: 4.28 ms per loop
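Before swapping one expression for the other, it may be worth confirming that both build the same table. The sketch below is an assumption-laden check, not part of the original answer: it uses plain string columns rather than categoricals, and it selects the value column explicitly instead of calling squeeze.

```python
import pandas as pd

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})

via_pivot = countryKPI.pivot_table(index='country',
                                   columns='indicator',
                                   values='value',
                                   fill_value=0)

via_groupby = (countryKPI.groupby(['country', 'indicator'])['value']
                         .mean()
                         .unstack()
                         .fillna(0))

# Compare values only: the two results may differ in integer vs. float dtype
same = (via_pivot.to_numpy() == via_groupby.to_numpy()).all()
print(same)
```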


Answer 2

You can counter the mismatch in the categories by using merge:

df = pd.pivot_table(data=countryKPI, index=['country'], columns=['indicator'])
df.index.name = 'nationality'    
customers.merge(df['value'].reset_index(), on='nationality', how='outer')

Data:

countryKPI = pd.DataFrame({'country':['Austria','Germany', 'Germany', 'Austria'],
                           'indicator':['z','x','z','x'],
                           'value':[7,8,9,7]})
customers = pd.DataFrame({'customer':['first','second'],
                           'nationality':['Slovakia','Austria'],
                           'value':[7,8]})
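A variation on the merge above avoids renaming the index and calling reset_index by passing left_on and right_index. This is a sketch under two assumptions that differ from the answer: the key columns are plain strings, and how='left' is used so only existing customers are kept (the answer's how='outer' also returns a row for Germany).

```python
import pandas as pd

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})
customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Slovakia', 'Austria'],
                          'value': [7, 8]})

df_pivoted = countryKPI.pivot_table(index='country',
                                    columns='indicator',
                                    values='value')

# Match the 'nationality' column on the left against the index on the right;
# customers without KPI data (Slovakia) get NaN in the KPI columns
result = customers.merge(df_pivoted,
                         left_on='nationality',
                         right_index=True,
                         how='left')
print(result)
```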

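The problem seems to be that the DF resulting from the pivot operation has a CategoricalIndex, and when you perform reset_index on it, pandas complains with that error.

Simply work backwards: check the dtypes of the countryKPI and customers data frames, and wherever category is mentioned, convert those columns to their string representation via astype(str).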

Reproducing the error and countering it:

Assume the DFs are the ones mentioned above:

countryKPI['indicator'] = countryKPI['indicator'].astype('category')
countryKPI['country'] = countryKPI['country'].astype('category')
customers['nationality'] = customers['nationality'].astype('category')

countryKPI.dtypes
country      category
indicator    category
value           int64
dtype: object

customers.dtypes
customer         object
nationality    category
value             int64
dtype: object

After the pivot operation:

df = pd.pivot_table(data=countryKPI, index=['country'], columns=['indicator'])
df.index
CategoricalIndex(['Austria', 'Germany'], categories=['Austria', 'Germany'], ordered=False, name='country', dtype='category')
# ^^ See the categorical index

When you perform reset_index on that:

df.reset_index()

TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category

To counter that error, simply cast the categorical columns to str type:

countryKPI['indicator'] = countryKPI['indicator'].astype('str')
countryKPI['country'] = countryKPI['country'].astype('str')
customers['nationality'] = customers['nationality'].astype('str')

Now the reset_index part works, and so does the merge.
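The cast-to-str fix can be applied to every categorical column at once instead of naming each one. The sketch below assumes the data from this answer; categoricals_to_str is a hypothetical helper name introduced for illustration, built on DataFrame.select_dtypes.

```python
import pandas as pd

def categoricals_to_str(frame):
    """Return a copy of frame with every category column cast to str.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = frame.copy()
    for col in out.select_dtypes(include='category').columns:
        out[col] = out[col].astype(str)
    return out

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})
customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Slovakia', 'Austria'],
                          'value': [7, 8]})

# Reproduce the categorical setup described above
countryKPI['country'] = countryKPI['country'].astype('category')
countryKPI['indicator'] = countryKPI['indicator'].astype('category')
customers['nationality'] = customers['nationality'].astype('category')

# Cast the categorical columns back to strings before pivoting and merging
countryKPI = categoricals_to_str(countryKPI)
customers = categoricals_to_str(customers)

df = pd.pivot_table(data=countryKPI, index=['country'], columns=['indicator'])
df.index.name = 'nationality'
merged = customers.merge(df['value'].reset_index(), on='nationality', how='outer')
print(merged)
```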

Pandas: append multiple columns for a single one
