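I have a dataset of transactions. Each transaction can have one or more distinct values, which I call "dimensions". A value cannot appear twice within the same transaction. I want to create a DataFrame with the "dimensions" on both the rows and the columns, and count how many times one dimension is used together with another within a transaction.
Here is what I tried: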
import pandas as pd

dim_set = [ (1, 'Customer group$Large'),
(1, 'DEPARTMENT$Sales'),
(2, 'Customer group$Medium'),
(2, 'DEPARTMENT$Sales'),
(3, 'DEPARTMENT$Sales'),
(4, 'Customer group$Small'),
(4, 'DEPARTMENT$Sales')
]
df = pd.DataFrame(dim_set, columns=['combination_id', 'dimension'])
df
df_st_1 = df.pivot_table(index = 'dimension', columns = 'dimension',values = 'combination_id', aggfunc = 'count')
df_st_1
The expected result should look like this:
dim_set = [ ('Customer group$Large', 1, 1, 0, 0),
('DEPARTMENT$Sales', 1, 4, 1, 1),
('Customer group$Medium', 0, 1, 1, 0),
('Customer group$Small', 0, 1, 0, 1)
]
df = pd.DataFrame(dim_set, columns=['dimension','Customer group$Large', 'DEPARTMENT$Sales', 'Customer group$Medium', 'Customer group$Small'])
df
Answer 0 (score: 1)
Use DataFrame.merge together with crosstab, then do some final cleanup with DataFrame.reset_index and DataFrame.rename_axis:
df1 = df.merge(df, on='combination_id', suffixes=('','_'))
df1 = (pd.crosstab(df1['dimension'], df1['dimension_'])
         .reset_index()
         .rename_axis(None)
         .rename_axis(None, axis=1))
print(df1)
               dimension  Customer group$Large  Customer group$Medium  \
0   Customer group$Large                     1                      0
1  Customer group$Medium                     0                      1
2   Customer group$Small                     0                      0
3       DEPARTMENT$Sales                     1                      1

   Customer group$Small  DEPARTMENT$Sales
0                     0                 1
1                     0                 1
2                     1                 1
3                     1                 4
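To see why the self-merge works, it may help to look at the intermediate result: merging df with itself on combination_id pairs every dimension with every dimension (including itself) inside the same transaction, and crosstab then just counts those pairs. The sketch below rebuilds the question's data and cross-checks the counts against a different technique that is not part of the answer above: building a transaction-by-dimension indicator matrix and multiplying its transpose by itself. The names pairs, by_merge, indicator and by_dot are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
dim_set = [(1, 'Customer group$Large'),
           (1, 'DEPARTMENT$Sales'),
           (2, 'Customer group$Medium'),
           (2, 'DEPARTMENT$Sales'),
           (3, 'DEPARTMENT$Sales'),
           (4, 'Customer group$Small'),
           (4, 'DEPARTMENT$Sales')]
df = pd.DataFrame(dim_set, columns=['combination_id', 'dimension'])

# Self-merge: one row per (dimension, dimension_) pair that shares a
# combination_id. A transaction with k dimensions contributes k * k rows.
pairs = df.merge(df, on='combination_id', suffixes=('', '_'))
by_merge = pd.crosstab(pairs['dimension'], pairs['dimension_'])

# Alternative (assumption: not the answer's method): count occurrences per
# transaction and dimension, then take the product of the transposed matrix
# with itself. Entry (a, b) sums, over transactions, count(a) * count(b).
indicator = pd.crosstab(df['combination_id'], df['dimension'])
by_dot = indicator.T.dot(indicator)

print(len(pairs))  # 13 pair rows: 4 + 4 + 1 + 4
print((by_merge.values == by_dot.values).all())
```

Both approaches should give the same matrix here. The self-merge grows with the square of the number of dimensions per transaction, so the indicator-matrix variant may be worth trying if transactions carry many dimensions.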