Pandas: transposing/manipulating a dataframe

Time: 2020-01-22 14:51:33

Tags: python pandas dataframe

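Suppose I have a dataframe like the one below. It describes the composition of paints: an arbitrarily named paint (the NAME column) is described as a mix of specific colors (the subtype column) at given percentages (the weight column), and each specific color can also be grouped under a generalized parent (the type column). The date is included for completeness but is not used here.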

|weight|NAME       |type  |subtype    |date      |
--------------------------------------------------
|93.35 |candyapple |red   |maroon     |2018-06-30|
|6.65  |candyapple |red   |crimson    |2018-06-30|
|93.41 |grannysmith|green |limegreen  |2010-03-31|
|1.78  |grannysmith|green |deepgreen  |2019-12-31|
|0.72  |grannysmith|yellow|goldyellow |2019-12-31|
|2.96  |grannysmith|brown |lightbrown |2014-10-31|
|33.33 |awfulbrown |red   |maroon     |2020-10-31|
|33.33 |awfulbrown |yellow|plainyellow|2010-06-30|
|33.33 |awfulbrown |green |deepgreen  |2020-02-29|
--------------------------------------------------

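So the full composition of candyapple is 93.35% maroon and 6.65% crimson, both of which are subtypes of red. grannysmith can be expressed by the subtypes listed above, but we could also describe it as 95.19% green (the sum of its green subtypes), 0.72% yellow and 2.96% brown. Across paint configurations the names used for subtype and type are shared, but not every configuration lists every subtype. If a subtype is not listed, it is assumed to be 0.00%. For example, candyapple does not list any limegreen, so we can assume it is 0.00% limegreen.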
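For anyone who wants to reproduce the question, the table above can be rebuilt as a DataFrame roughly as follows. This is a minimal sketch: the variable names `rows`, `df` and `green_share` are illustrative assumptions, and the values are copied from the table as shown.

```python
import pandas as pd

# Rows copied from the table in the question: (weight, NAME, type, subtype, date).
rows = [
    (93.35, "candyapple", "red", "maroon", "2018-06-30"),
    (6.65, "candyapple", "red", "crimson", "2018-06-30"),
    (93.41, "grannysmith", "green", "limegreen", "2010-03-31"),
    (1.78, "grannysmith", "green", "deepgreen", "2019-12-31"),
    (0.72, "grannysmith", "yellow", "goldyellow", "2019-12-31"),
    (2.96, "grannysmith", "brown", "lightbrown", "2014-10-31"),
    (33.33, "awfulbrown", "red", "maroon", "2020-10-31"),
    (33.33, "awfulbrown", "yellow", "plainyellow", "2010-06-30"),
    (33.33, "awfulbrown", "green", "deepgreen", "2020-02-29"),
]
df = pd.DataFrame(rows, columns=["weight", "NAME", "type", "subtype", "date"])

# The green share of grannysmith is the sum of its green subtypes
# (limegreen + deepgreen).
is_gs_green = (df["NAME"] == "grannysmith") & (df["type"] == "green")
green_share = df.loc[is_gs_green, "weight"].sum()
print(round(green_share, 2))
```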

  1. Using pandas and Python, how can I manipulate this dataframe to fit the structure below?

|NAME       |maroon|crimson|limegreen|deepgreen|goldyellow|lightbrown|maroon|plainyellow|deepgreen|
---------------------------------------------------------------------------------------------------
|candyapple |93.35 |6.65   |0.00     |0.00     |0.00      |0.00      |0.00  |0.00       |0.00     |
|grannysmith|0.00  |0.00   |93.41    |1.78     |0.72      |2.96      |0.00  |0.00       |0.00     |
|awfulbrown |33.33 |0.00   |0.00     |33.33    |0.00      |0.00      |0.00  |33.33      |0.00     |
---------------------------------------------------------------------------------------------------

1a. Using pandas, how can I transpose so that the values of subtype become the column headers and all the values for a given NAME line up in a single row?

1b. After transposing, how can I fill any blanks in the table with 0.00? (For example, candyapple is 0.00% limegreen.)

  2. Furthermore, how could I use pandas to create a similar frame, but using type instead of subtype? The weight of a type is the sum of the weights of its subtypes.

|NAME       |red   |green |yellow|brown |
----------------------------------------
|candyapple |100.00|0.00  |0.00  |0.00  |
|grannysmith|0.00  |95.19 |0.72  |2.96  |
|awfulbrown |33.33 |33.33 |33.33 |0.00  |
----------------------------------------

2a. Having transposed as in (1), but this time using type, how can I use pandas/Python to sum the values so that the weight of a given type is the sum of the weights of its subtypes?

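  3. (Added) Can we combine the two into a single frame?

3a. Is there a way in pandas to create, from the original dataset, a combined DF holding both the summed weights per type and the individual weights per subtype?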

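2 Answers:

Answer 0: (score: 3)

In the first case, pivot is enough, since no aggregation is needed (the arguments are passed by keyword, which recent pandas versions require):

df.pivot(index='NAME', columns='subtype', values='weight').fillna(0)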

subtype      crimson  deepgreen  goldyellow  lightbrown  limegreen  maroon  \
NAME                                                                         
awfulbrown      0.00      33.33        0.00        0.00       0.00   33.33   
candyapple      6.65       0.00        0.00        0.00       0.00   93.35   
grannysmith     0.00       1.78        0.72        2.96      93.41    0.00   

subtype      plainyellow  
NAME                      
awfulbrown         33.33  
candyapple          0.00  
grannysmith         0.00  
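To see why the absence of aggregation matters, here is a small sketch using only the grannysmith rows from the question. The names `by_subtype` and `pivot_failed` are illustrative assumptions. Each (NAME, subtype) pair occurs once, so `pivot` can reshape it directly; (NAME, type) pairs repeat, and `pivot` refuses to reshape duplicated entries.

```python
import pandas as pd

# A subset of the question's data: grannysmith only (date column omitted).
df = pd.DataFrame({
    "weight": [93.41, 1.78, 0.72, 2.96],
    "NAME": ["grannysmith"] * 4,
    "type": ["green", "green", "yellow", "brown"],
    "subtype": ["limegreen", "deepgreen", "goldyellow", "lightbrown"],
})

# Every (NAME, subtype) pair is unique, so a plain pivot works.
by_subtype = df.pivot(index="NAME", columns="subtype", values="weight").fillna(0)

# (grannysmith, green) appears twice, so pivot cannot decide which value
# to keep and raises ValueError instead of silently aggregating.
try:
    df.pivot(index="NAME", columns="type", values="weight")
    pivot_failed = False
except ValueError:
    pivot_failed = True

print(by_subtype)
print("pivot on type failed:", pivot_failed)
```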

For the second case, you can use pivot_table and aggregate with sum:

df.pivot_table(index='NAME', columns='type', values='weight', aggfunc='sum', fill_value=0)

type         brown  green     red  yellow
NAME                                     
awfulbrown    0.00  33.33   33.33   33.33
candyapple    0.00   0.00  100.00    0.00
grannysmith   2.96  95.19    0.00    0.72
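The same per-type totals can also be obtained with a different technique, `groupby` followed by `unstack`, which makes the "sum, then spread into columns" steps explicit. This is an alternative sketch rather than the answerer's method; `by_type` is an illustrative name and the data is copied from the question (date column omitted).

```python
import pandas as pd

df = pd.DataFrame({
    "weight": [93.35, 6.65, 93.41, 1.78, 0.72, 2.96, 33.33, 33.33, 33.33],
    "NAME": ["candyapple"] * 2 + ["grannysmith"] * 4 + ["awfulbrown"] * 3,
    "type": ["red", "red", "green", "green", "yellow", "brown",
             "red", "yellow", "green"],
    "subtype": ["maroon", "crimson", "limegreen", "deepgreen", "goldyellow",
                "lightbrown", "maroon", "plainyellow", "deepgreen"],
})

# Step 1: sum the weights within each (NAME, type) group.
# Step 2: move the 'type' level into columns; missing combinations become 0.
by_type = df.groupby(["NAME", "type"])["weight"].sum().unstack(fill_value=0)

print(by_type.round(2))
```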

Answer 1: (score: 2)

Use pd.crosstab:

subtypes = pd.crosstab(df.NAME, df.subtype, values=df.weight, aggfunc='sum')

types = pd.crosstab(df.NAME, df.type, values=df.weight, aggfunc='sum')

final = pd.concat([types, subtypes], axis=1)

1

print(subtypes)

subtype      crimson  deepgreen  goldyellow  lightbrown  limegreen  maroon  \
NAME                                                                         
awfulbrown       NaN      33.33         NaN         NaN        NaN   33.33   
candyapple      6.65        NaN         NaN         NaN        NaN   93.35   
grannysmith      NaN       1.78        0.72        2.96      93.41     NaN   

subtype      plainyellow  
NAME                      
awfulbrown         33.33  
candyapple           NaN  
grannysmith          NaN  

2

print(types)
type         brown   green   red     yellow
NAME                                       
awfulbrown      NaN   33.33   33.33   33.33
candyapple      NaN     NaN  100.00     NaN
grannysmith    2.96   95.19     NaN    0.72

3

print(final.fillna(0))

             brown   green   red     yellow  crimson  deepgreen  goldyellow  \
NAME                                                                          
awfulbrown     0.00   33.33   33.33   33.33     0.00      33.33        0.00   
candyapple     0.00    0.00  100.00    0.00     6.65       0.00        0.00   
grannysmith    2.96   95.19    0.00    0.72     0.00       1.78        0.72   

             lightbrown  limegreen  maroon  plainyellow  
NAME                                                     
awfulbrown         0.00       0.00   33.33        33.33  
candyapple         0.00       0.00   93.35         0.00  
grannysmith        2.96      93.41    0.00         0.00
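In the combined frame above, type and subtype columns sit side by side with nothing marking which is which. One possible refinement, sketched here as an assumption rather than as part of the answer, is to pass `keys` to `pd.concat` so the columns become a two-level MultiIndex. The labels "type" and "subtype" used as keys are an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "weight": [93.35, 6.65, 93.41, 1.78, 0.72, 2.96, 33.33, 33.33, 33.33],
    "NAME": ["candyapple"] * 2 + ["grannysmith"] * 4 + ["awfulbrown"] * 3,
    "type": ["red", "red", "green", "green", "yellow", "brown",
             "red", "yellow", "green"],
    "subtype": ["maroon", "crimson", "limegreen", "deepgreen", "goldyellow",
                "lightbrown", "maroon", "plainyellow", "deepgreen"],
})

types = pd.crosstab(df["NAME"], df["type"], values=df["weight"], aggfunc="sum")
subtypes = pd.crosstab(df["NAME"], df["subtype"], values=df["weight"], aggfunc="sum")

# keys= adds an outer column level, so each block stays identifiable.
final = pd.concat([types, subtypes], axis=1, keys=["type", "subtype"]).fillna(0)

# Select one block, or one cell, through the outer level.
print(final["type"].round(2))
print(final.loc["grannysmith", ("subtype", "limegreen")])
```

With this layout, `final["type"]` returns only the per-type totals and `final["subtype"]` only the individual subtype weights.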