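Pandas: append multiple columns for a single column

Time: 2016-09-22 08:46:58

Tags: python pandas

How can I efficiently append multiple KPI values for each customer using pandas?

Joining the pivoted df with the customers df causes some problems, because country is the index of the pivoted data frame while nationality is not in the index.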

countryKPI = pd.DataFrame({'country':['Austria','Germany', 'Germany', 'Austria'],
                           'indicator':['z','x','z','x'],
                           'value':[7,8,9,7]})
customers = pd.DataFrame({'customer':['first','second'],
                           'nationality':['Germany','Austria'],
                           'value':[7,8]})

The desired result was shown in pink in an image that is not reproduced here.

2 answers:

Answer 0 (score: 2)

I think you can use concat:

df_pivoted = countryKPI.pivot_table(index='country', 
                              columns='indicator', 
                              values='value', 
                              fill_value=0)
print (df_pivoted)    
indicator  x  z
country        
Austria    7  7
Germany    8  9

print (pd.concat([customers.set_index('nationality'), df_pivoted], axis=1))
        customer  value  x  z
Austria   second      8  7  7
Germany    first      7  8  9                       


print (pd.concat([customers.set_index('nationality'), df_pivoted], axis=1)
         .reset_index()
         .rename(columns={'index':'nationality'})
         [['customer','nationality','value','x','z']])

  customer nationality  value  x  z
0   second     Austria      8  7  7
1    first     Germany      7  8  9
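As a possible alternative to the concat above (not part of the original answer), here is a minimal sketch that uses DataFrame.join with on='nationality', so that the customers frame keeps its own row order and no index has to be reset or renamed. The variable name result is only illustrative.

```python
import pandas as pd

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})
customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Germany', 'Austria'],
                          'value': [7, 8]})

# One row per country, one column per indicator.
df_pivoted = countryKPI.pivot_table(index='country', columns='indicator',
                                    values='value', fill_value=0)

# Left join: look up each customer's nationality in the pivoted index.
result = customers.join(df_pivoted, on='nationality')
print(result)
```

Because join defaults to a left join, customers without a matching country would get NaN in the KPI columns instead of being dropped.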

Edit by comment:

The problem is that the dtypes of customers.nationality and countryKPI.country are category, and if some categories are missing, an error is raised:

ValueError: incompatible categories in categorical concat

The solution is to build the union of the categories with union and then apply it to both columns with set_categories:

import pandas as pd
import numpy as np

countryKPI = pd.DataFrame({'country':['Austria','Germany', 'Germany', 'Austria'],
                           'indicator':['z','x','z','x'],
                           'value':[7,8,9,7]})
customers = pd.DataFrame({'customer':['first','second'],
                           'nationality':['Slovakia','Austria'],
                           'value':[7,8]})
customers.nationality = customers.nationality.astype('category')
countryKPI.country = countryKPI.country.astype('category')

print (countryKPI.country.cat.categories)
Index(['Austria', 'Germany'], dtype='object')

print (customers.nationality.cat.categories)
Index(['Austria', 'Slovakia'], dtype='object')

all_categories =countryKPI.country.cat.categories.union(customers.nationality.cat.categories)
print (all_categories)
Index(['Austria', 'Germany', 'Slovakia'], dtype='object')

customers.nationality = customers.nationality.cat.set_categories(all_categories)
countryKPI.country = countryKPI.country.cat.set_categories(all_categories)

df_pivoted = countryKPI.pivot_table(index='country', 
                              columns='indicator', 
                              values='value', 
                              fill_value=0)
print (df_pivoted)    
indicator  x  z
country        
Austria    7  7
Germany    8  9
Slovakia   0  0        

print (pd.concat([customers.set_index('nationality'), df_pivoted], axis=1)
         .reset_index()
         .rename(columns={'index':'nationality'})
         [['customer','nationality','value','x','z']])

  customer nationality  value  x  z
0   second     Austria    8.0  7  7
1      NaN     Germany    NaN  8  9
2    first    Slovakia    7.0  0  0

If you need better performance, use groupby instead of pivot_table:

df_pivoted1 = (countryKPI.groupby(['country','indicator'])
                         .mean()
                         .squeeze()
                         .unstack()
                         .fillna(0))
print (df_pivoted1)
indicator    x    z
country            
Austria    7.0  7.0
Germany    8.0  9.0
Slovakia   0.0  0.0

Timings:

In [177]: %timeit countryKPI.pivot_table(index='country', columns='indicator', values='value', fill_value=0)
100 loops, best of 3: 6.24 ms per loop

In [178]: %timeit countryKPI.groupby(['country','indicator']).mean().squeeze().unstack().fillna(0)
100 loops, best of 3: 4.28 ms per loop
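The category alignment described in this answer's edit (union of the categories, then set_categories) could be wrapped in a small helper. This is only a sketch: align_categories is a hypothetical name, not a pandas function, and it assumes both inputs are already of category dtype.

```python
import pandas as pd

def align_categories(left, right):
    """Return both categorical Series re-declared over the union of their categories.

    Hypothetical helper for illustration; assumes both inputs have category dtype.
    """
    all_categories = left.cat.categories.union(right.cat.categories)
    return (left.cat.set_categories(all_categories),
            right.cat.set_categories(all_categories))

nationality = pd.Series(['Slovakia', 'Austria'], dtype='category')
country = pd.Series(['Austria', 'Germany', 'Germany', 'Austria'], dtype='category')

nationality, country = align_categories(nationality, country)
print(list(nationality.cat.categories))
print(list(country.cat.categories))
```

After the call both Series share one categorical dtype, which is the condition the concat in this answer relies on; the values themselves are unchanged.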

Answer 1 (score: 1)

You can work around the mismatch in categories by using merge:

df = pd.pivot_table(data=countryKPI, index=['country'], columns=['indicator'])
df.index.name = 'nationality'    
customers.merge(df['value'].reset_index(), on='nationality', how='outer')

Data:

countryKPI = pd.DataFrame({'country':['Austria','Germany', 'Germany', 'Austria'],
                           'indicator':['z','x','z','x'],
                           'value':[7,8,9,7]})
customers = pd.DataFrame({'customer':['first','second'],
                           'nationality':['Slovakia','Austria'],
                           'value':[7,8]})
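To see what the outer merge does with this data, a minimal sketch follows. It uses plain string columns, and the names kpi, outer and left are only illustrative. The indicator=True argument of merge adds a _merge column that records where each row came from; a left merge is included for comparison, for the case where only existing customers should be kept.

```python
import pandas as pd

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]})
customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Slovakia', 'Austria'],
                          'value': [7, 8]})

# Pivot to one row per country, then expose the index as a 'nationality' column.
kpi = countryKPI.pivot_table(index='country', columns='indicator', values='value')
kpi.index.name = 'nationality'
kpi = kpi.reset_index()

# Outer merge keeps unmatched rows from both sides; _merge records each row's origin.
outer = customers.merge(kpi, on='nationality', how='outer', indicator=True)
# Left merge keeps only the rows of customers.
left = customers.merge(kpi, on='nationality', how='left')

print(outer)
print(left)
```

With this data, Germany appears only in the outer result (it has KPIs but no customer), while Slovakia has a customer but no KPIs, so its x and z are NaN in both results.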

The problem seems to be that the DF has a CategoricalIndex resulting from the pivot operation, and reset_index complains with that error when you perform it.

Simply reverse-engineer it: check the dtypes of both the countryKPI and customers DataFrames, and wherever category is mentioned, convert those columns to their string representation via astype(str).

Reproducing the error and countering it:

Assume the DF is the one mentioned above:

countryKPI['indicator'] = countryKPI['indicator'].astype('category')
countryKPI['country'] = countryKPI['country'].astype('category')
customers['nationality'] = customers['nationality'].astype('category')

countryKPI.dtypes
country      category
indicator    category
value           int64
dtype: object

customers.dtypes
customer         object
nationality    category
value             int64
dtype: object

After the pivot operation:

df = pd.pivot_table(data=countryKPI, index=['country'], columns=['indicator'])
df.index
CategoricalIndex(['Austria', 'Germany'], categories=['Austria', 'Germany'], ordered=False, name='country', dtype='category')
# ^^ See the categorical index

When you perform reset_index on that:

df.reset_index()

TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category

To counter that error, simply cast the categorical columns to str type:

countryKPI['indicator'] = countryKPI['indicator'].astype('str')
countryKPI['country'] = countryKPI['country'].astype('str')
customers['nationality'] = customers['nationality'].astype('str')

Now the reset_index part works, and so does the merge.
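The cast can also be applied to every categorical column at once. The sketch below is an illustration under stated assumptions: categoricals_to_str is a hypothetical helper, not part of pandas. Whether reset_index still raises on a CategoricalIndex may depend on the pandas version, so the sketch does not reproduce the error and only shows the cast followed by the merge from this answer.

```python
import pandas as pd

def categoricals_to_str(df):
    """Return a copy of df with every category-dtype column cast to str.

    Hypothetical helper for illustration only.
    """
    cat_cols = df.select_dtypes(include='category').columns
    return df.astype({col: str for col in cat_cols})

countryKPI = pd.DataFrame({'country': ['Austria', 'Germany', 'Germany', 'Austria'],
                           'indicator': ['z', 'x', 'z', 'x'],
                           'value': [7, 8, 9, 7]}
                          ).astype({'country': 'category', 'indicator': 'category'})
customers = pd.DataFrame({'customer': ['first', 'second'],
                          'nationality': ['Slovakia', 'Austria'],
                          'value': [7, 8]}
                         ).astype({'nationality': 'category'})

countryKPI = categoricals_to_str(countryKPI)
customers = categoricals_to_str(customers)

# Same steps as in the answer, now on non-categorical columns.
df = pd.pivot_table(data=countryKPI, index=['country'], columns=['indicator'])
df.index.name = 'nationality'
merged = customers.merge(df['value'].reset_index(), on='nationality', how='outer')
print(merged)
```

Casting to str gives up the memory savings of the category dtype, which is usually acceptable for small lookup tables like these.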