I need to create new columns, one of them holding a dictionary, from existing columns. My df:
Period Category Sub-Category
FY18Q1 Footwear Shoes
FY18Q2 Footwear Sandal
FY18Q1 Footwear Shoes
FY18Q3 Footwear Boots
FY18Q1 Clothing Shirt
FY18Q2 Clothing Trouser
FY18Q1 Clothing Shirt
FY18Q3 Clothing Shirt
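For anyone who wants to reproduce this, the frame above can be built roughly as follows. This is only a sketch: it assumes pandas is imported as `pd`, and the column names are copied from the table as shown.

```python
import pandas as pd

# Sample data copied from the table above (8 rows, 3 columns).
df = pd.DataFrame({
    "Period": ["FY18Q1", "FY18Q2", "FY18Q1", "FY18Q3",
               "FY18Q1", "FY18Q2", "FY18Q1", "FY18Q3"],
    "Category": ["Footwear"] * 4 + ["Clothing"] * 4,
    "Sub-Category": ["Shoes", "Sandal", "Shoes", "Boots",
                     "Shirt", "Trouser", "Shirt", "Shirt"],
})
```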
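I want to create two new columns at the Category level: A. the count of each Sub-Category within the Category; B. the Sub-Category from the most recent Period.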
Period Category Sub-Category freq Latest_freq
FY18Q1 Footwear Shoes {Shoes:2,Sandal:1,Boots:1} Boots(FY18Q3)
FY18Q2 Footwear Sandal {Shoes:2,Sandal:1,Boots:1} Boots(FY18Q3)
FY18Q1 Footwear Shoes {Shoes:2,Sandal:1,Boots:1} Boots(FY18Q3)
FY18Q3 Footwear Boots {Shoes:2,Sandal:1,Boots:1} Boots(FY18Q3)
FY18Q1 Clothing Shirt {Shirt:3,Trouser:1} Shirt(FY18Q3)
FY18Q2 Clothing Trouser {Shirt:3,Trouser:1} Shirt(FY18Q3)
FY18Q1 Clothing Shirt {Shirt:3,Trouser:1} Shirt(FY18Q3)
FY18Q3 Clothing Shirt {Shirt:3,Trouser:1} Shirt(FY18Q3)
Answer 0 (score: 3)
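Use named aggregation to compute both new values per group. The first uses a lambda function with Series.value_counts and to_dict. For the second, first build a helper column with DataFrame.assign that appends the Period in parentheses (), then aggregate it with GroupBy.last. As the last step, use DataFrame.join to attach the result to the original frame: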
# Build a helper column such as 'Boots(FY18Q3)', then aggregate once per Category.
df1 = (df.assign(new=df['Sub-Category'] + '(' + df['Period'] + ')')
         .groupby('Category')
         .agg(freq=('Sub-Category', lambda x: x.value_counts().to_dict()),
              lastest_freq=('new', 'last')))
print (df1)
freq lastest_freq
Category
Clothing {'Shirt': 3, 'Trouser': 1} Shirt(FY18Q3)
Footwear {'Shoes': 2, 'Boots': 1, 'Sandal': 1} Boots(FY18Q3)
df = df.join(df1, on='Category')
print (df)
Period Category Sub-Category freq \
0 FY18Q1 Footwear Shoes {'Shoes': 2, 'Boots': 1, 'Sandal': 1}
1 FY18Q2 Footwear Sandal {'Shoes': 2, 'Boots': 1, 'Sandal': 1}
2 FY18Q1 Footwear Shoes {'Shoes': 2, 'Boots': 1, 'Sandal': 1}
3 FY18Q3 Footwear Boots {'Shoes': 2, 'Boots': 1, 'Sandal': 1}
4 FY18Q1 Clothing Shirt {'Shirt': 3, 'Trouser': 1}
5 FY18Q2 Clothing Trouser {'Shirt': 3, 'Trouser': 1}
6 FY18Q1 Clothing Shirt {'Shirt': 3, 'Trouser': 1}
7 FY18Q3 Clothing Shirt {'Shirt': 3, 'Trouser': 1}
lastest_freq
0 Boots(FY18Q3)
1 Boots(FY18Q3)
2 Boots(FY18Q3)
3 Boots(FY18Q3)
4 Shirt(FY18Q3)
5 Shirt(FY18Q3)
6 Shirt(FY18Q3)
7 Shirt(FY18Q3)
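One caveat: GroupBy.last returns the last row of each group in the frame's current order. That matches the most recent Period here only because the FY18Q3 rows happen to come last in each Category. If the rows might arrive in any order, a possible variant is to sort by Period before grouping. The sketch below assumes the Period strings sort chronologically as text (true for labels like 'FY18Q1' < 'FY18Q3' within the same century) and that each Category has a single Sub-Category in its latest Period. The names `shuffled`, `summary`, `out` and `latest_freq` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Period": ["FY18Q1", "FY18Q2", "FY18Q1", "FY18Q3",
               "FY18Q1", "FY18Q2", "FY18Q1", "FY18Q3"],
    "Category": ["Footwear"] * 4 + ["Clothing"] * 4,
    "Sub-Category": ["Shoes", "Sandal", "Shoes", "Boots",
                     "Shirt", "Trouser", "Shirt", "Shirt"],
})

# Shuffle the rows to show that the result no longer depends on input order.
shuffled = df.sample(frac=1, random_state=0)

summary = (
    shuffled
    .assign(new=shuffled["Sub-Category"] + "(" + shuffled["Period"] + ")")
    # Assumption: Period labels sort chronologically as plain strings.
    .sort_values("Period", kind="stable")
    .groupby("Category")
    .agg(freq=("Sub-Category", lambda s: s.value_counts().to_dict()),
         latest_freq=("new", "last"))
)

# Restore the original row order and broadcast the per-Category values.
out = shuffled.sort_index().join(summary, on="Category")
print(out)
```

If several different sub-categories share the latest Period within one Category, which of them "last" picks is not well defined, so a tie-breaking rule would be needed in that case.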