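Suppose I have a DataFrame like the one below. It describes the composition of paints: an arbitrarily named paint (the `NAME` column) is described by mixing specific colours (the `subtype` column) in given percentages (the `weight` column), and each specific colour also belongs to a broader parent colour (the `type` column). The `date` column is included for completeness but is not used here.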
|weight|NAME |type |subtype |date |
--------------------------------------------------
|93.35 |candyapple |red |maroon |2018-06-30|
|6.65 |candyapple |red |crimson |2018-06-30|
|93.41 |grannysmith|green |limegreen |2010-03-31|
|1.78 |grannysmith|green |deepgreen |2019-12-31|
|0.72 |grannysmith|yellow|goldyellow |2019-12-31|
|2.96 |grannysmith|brown |lightbrown |2014-10-31|
|33.33 |awfulbrown |red |maroon |2020-10-31|
|33.33 |awfulbrown |yellow|plainyellow|2010-06-30|
|33.33 |awfulbrown |green |deepgreen |2020-02-29|
--------------------------------------------------
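For anyone who wants to reproduce this, the table above can be built as a DataFrame roughly as follows. The variable names `rows`, `df` and `totals` are my own choices for illustration, not part of the original data.

```python
import pandas as pd

# Rows copied from the table above: (weight, NAME, type, subtype, date).
rows = [
    (93.35, "candyapple", "red", "maroon", "2018-06-30"),
    (6.65, "candyapple", "red", "crimson", "2018-06-30"),
    (93.41, "grannysmith", "green", "limegreen", "2010-03-31"),
    (1.78, "grannysmith", "green", "deepgreen", "2019-12-31"),
    (0.72, "grannysmith", "yellow", "goldyellow", "2019-12-31"),
    (2.96, "grannysmith", "brown", "lightbrown", "2014-10-31"),
    (33.33, "awfulbrown", "red", "maroon", "2020-10-31"),
    (33.33, "awfulbrown", "yellow", "plainyellow", "2010-06-30"),
    (33.33, "awfulbrown", "green", "deepgreen", "2020-02-29"),
]
df = pd.DataFrame(rows, columns=["weight", "NAME", "type", "subtype", "date"])
df["date"] = pd.to_datetime(df["date"])

# Total listed weight per paint, as a quick look at the data.
totals = df.groupby("NAME")["weight"].sum()
print(totals.round(2))
```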
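So the complete composition of `candyapple` is `93.35% maroon` and `6.65% crimson`, both of which are subtypes of red. `grannysmith` can be expressed by the subtypes above, but we could also describe it as `95.19% green` (the sum of its green subtypes), `2.96% brown` and `0.72% yellow`. Across paint configurations the names used for subtypes and types are universal, but not every configuration lists every subtype. If a subtype is not listed, it is assumed to be 0.00%. For example, `candyapple` does not list any `green` subtype, so we can assume it is `0.00% limegreen`.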
1a. Using pandas, how do I transpose this so that the values of `subtype` become column headers and all values for a given `NAME` sit in a single row?
|NAME |maroon|crimson|limegreen|deepgreen|goldyellow|lightbrown|plainyellow|
--------------------------------------------------------------------------------
|candyapple |93.35 |6.65 |0.00 |0.00 |0.00 |0.00 |0.00 |
|grannysmith|0.00 |0.00 |93.41 |1.78 |0.72 |2.96 |0.00 |
|awfulbrown |33.33 |0.00 |0.00 |33.33 |0.00 |0.00 |33.33 |
--------------------------------------------------------------------------------
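1b. After transposing, how do I fill any blanks in the table with `0.00`? (For example, `candyapple` is `0.00% limegreen`.)
2. What if I want to use `type` instead of `subtype`? The weight of a type is the sum of the weights of its subtypes.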
2a. Having transposed as in (1), but this time using `type`, how can I sum the values with pandas / Python so that the weight for a given `type` is the sum of its `subtype` weights?
|NAME |red |green |yellow |brown |
----------------------------------------------
|candyapple |100.00|0.00 |0.00 |0.00 |
|grannysmith|0.00 |95.19 |0.72 |2.96 |
|awfulbrown |33.33 |33.33 |33.33 |0.00 |
----------------------------------------------
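3a. Is there a way in pandas to create, from the original dataset, a combined DataFrame containing both the `type` sums and the individual `subtype` weights shown above?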
Answer 0 (score: 3)
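In the first case `pivot` is enough, because no aggregation is needed (each `NAME` / `subtype` pair occurs only once). Recent pandas versions require keyword arguments here:
df.pivot(index='NAME', columns='subtype', values='weight').fillna(0)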
subtype crimson deepgreen goldyellow lightbrown limegreen maroon \
NAME
awfulbrown 0.00 33.33 0.00 0.00 0.00 33.33
candyapple 6.65 0.00 0.00 0.00 0.00 93.35
grannysmith 0.00 1.78 0.72 2.96 93.41 0.00
subtype plainyellow
NAME
awfulbrown 33.33
candyapple 0.00
grannysmith 0.00
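To illustrate why `pivot` works for `subtype` but not for `type`, here is a small self-contained sketch. It rebuilds the question's data without the `date` column; the names `wide` and `pivot_on_type_failed` are mine and only serve the illustration.

```python
import pandas as pd

# The question's data, without the unused date column.
df = pd.DataFrame({
    "weight": [93.35, 6.65, 93.41, 1.78, 0.72, 2.96, 33.33, 33.33, 33.33],
    "NAME": ["candyapple"] * 2 + ["grannysmith"] * 4 + ["awfulbrown"] * 3,
    "type": ["red", "red", "green", "green", "yellow", "brown",
             "red", "yellow", "green"],
    "subtype": ["maroon", "crimson", "limegreen", "deepgreen", "goldyellow",
                "lightbrown", "maroon", "plainyellow", "deepgreen"],
})

# Each (NAME, subtype) pair occurs once, so pivot can reshape without aggregating.
# Missing combinations come out as NaN and are filled with 0.
wide = df.pivot(index="NAME", columns="subtype", values="weight").fillna(0)
print(wide.round(2))

# (NAME, type) pairs repeat (grannysmith has two green rows), so pivot raises.
try:
    df.pivot(index="NAME", columns="type", values="weight")
    pivot_on_type_failed = False
except ValueError:
    pivot_on_type_failed = True
print("pivot on type failed:", pivot_on_type_failed)
```

The `ValueError` is the signal that an aggregating reshape is needed instead.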
For the second case you can use `pivot_table` and aggregate with `sum`:
df.pivot_table(index='NAME', columns='type', values='weight', aggfunc='sum', fill_value=0)
type brown green red yellow
NAME
awfulbrown 0.00 33.33 33.33 33.33
candyapple 0.00 0.00 100.00 0.00
grannysmith 2.96 95.19 0.00 0.72
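If you prefer to spell out the aggregation, the same table can, as far as I can tell, also be produced with `groupby` followed by `unstack`. This is an alternative sketch rather than what the call above does internally; `by_type` is a name I picked for illustration.

```python
import pandas as pd

# The question's data, without the unused date column.
df = pd.DataFrame({
    "weight": [93.35, 6.65, 93.41, 1.78, 0.72, 2.96, 33.33, 33.33, 33.33],
    "NAME": ["candyapple"] * 2 + ["grannysmith"] * 4 + ["awfulbrown"] * 3,
    "type": ["red", "red", "green", "green", "yellow", "brown",
             "red", "yellow", "green"],
    "subtype": ["maroon", "crimson", "limegreen", "deepgreen", "goldyellow",
                "lightbrown", "maroon", "plainyellow", "deepgreen"],
})

# Sum the weights per (NAME, type), then move the type level into the columns.
# fill_value=0 covers NAME/type combinations that never occur in the data.
by_type = (
    df.groupby(["NAME", "type"])["weight"]
      .sum()
      .unstack(fill_value=0)
)
print(by_type.round(2))
```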
Answer 1 (score: 2)
Use `pd.crosstab`:
subtypes = pd.crosstab(df.NAME, df.subtype, values=df.weight, aggfunc='sum')
types = pd.crosstab(df.NAME, df.type, values=df.weight, aggfunc='sum')
final = pd.concat([types, subtypes], axis=1)
print(subtypes)
subtype crimson deepgreen goldyellow lightbrown limegreen maroon \
NAME
awfulbrown NaN 33.33 NaN NaN NaN 33.33
candyapple 6.65 NaN NaN NaN NaN 93.35
grannysmith NaN 1.78 0.72 2.96 93.41 NaN
subtype plainyellow
NAME
awfulbrown 33.33
candyapple NaN
grannysmith NaN
print(types)
type brown green red yellow
NAME
awfulbrown NaN 33.33 33.33 33.33
candyapple NaN NaN 100.00 NaN
grannysmith 2.96 95.19 NaN 0.72
print(final.fillna(0))
brown green red yellow crimson deepgreen goldyellow \
NAME
awfulbrown 0.00 33.33 33.33 33.33 0.00 33.33 0.00
candyapple 0.00 0.00 100.00 0.00 6.65 0.00 0.00
grannysmith 2.96 95.19 0.00 0.72 0.00 1.78 0.72
lightbrown limegreen maroon plainyellow
NAME
awfulbrown 0.00 0.00 33.33 33.33
candyapple 0.00 0.00 93.35 0.00
grannysmith 2.96 93.41 0.00 0.00
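One optional refinement, offered as a sketch: passing `keys` to `pd.concat` labels the two column groups, which keeps type and subtype columns apart and makes it easy to check that both groups add up to the same total per paint. The names `combined`, `type_totals` and `subtype_totals` are my own.

```python
import pandas as pd

# The question's data, without the unused date column.
df = pd.DataFrame({
    "weight": [93.35, 6.65, 93.41, 1.78, 0.72, 2.96, 33.33, 33.33, 33.33],
    "NAME": ["candyapple"] * 2 + ["grannysmith"] * 4 + ["awfulbrown"] * 3,
    "type": ["red", "red", "green", "green", "yellow", "brown",
             "red", "yellow", "green"],
    "subtype": ["maroon", "crimson", "limegreen", "deepgreen", "goldyellow",
                "lightbrown", "maroon", "plainyellow", "deepgreen"],
})

types = pd.crosstab(df["NAME"], df["type"], values=df["weight"], aggfunc="sum")
subtypes = pd.crosstab(df["NAME"], df["subtype"], values=df["weight"], aggfunc="sum")

# keys= adds an outer column level ("type" / "subtype") above the colour names.
combined = pd.concat([types, subtypes], axis=1, keys=["type", "subtype"]).fillna(0)
print(combined.round(2))

# Consistency check: per paint, type weights and subtype weights share one total.
type_totals = combined["type"].sum(axis=1)
subtype_totals = combined["subtype"].sum(axis=1)
print(((type_totals - subtype_totals).abs() < 1e-9).all())
```

With the outer level in place, `combined["type"]` and `combined["subtype"]` give back the two individual tables.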