Question

I have the following data frame:

Customer ProductID Count

John     1         25
John     6         50
Mary     2         15
Mary     3         35
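
To reproduce this, here is a minimal sketch that builds the frame above. The variable name df matches the code used later in the question; the integer dtypes for ProductID and Count are an assumption, since the question does not state them.

```python
import pandas as pd

# Rebuild the sample data from the question.
# Assumption: ProductID and Count are plain integers.
df = pd.DataFrame({
    "Customer": ["John", "John", "Mary", "Mary"],
    "ProductID": [1, 6, 2, 3],
    "Count": [25, 50, 15, 35],
})
print(df)
```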

I would like my output to look like this:

Customer ProductID Count

John     1         25
John     2         0
John     3         0
John     6         50
Mary     1         0
Mary     2         15
Mary     3         35
Mary     6         0

What I have done so far is identify the unique ProductID values in the data frame:

unique_ID = pd.unique(df['ProductID'])
print(unique_ID)  # array([1, 6, 2, 3])

Since ProductID 2 and 3 do not exist for customer John, I split the data frame by customer name:

df1 = df[df['Customer']=='John']
df2 = df[df['Customer']=='Mary']

print(df1)

Customer  ProductID  Count
John      1          25
John      6          50

print(df2)

Customer  ProductID  Count
Mary      2          15
Mary      3          35

I want to add ProductID 2 and 3 to John and ProductID 1 and 6 to Mary, with Count set to 0 for these ProductIDs, as shown in my desired output above.

Answer 1

I think you can use pivot. The Customer/ProductID combinations that are missing come out as NaN, which fillna replaces with 0, and then stack with reset_index brings the result back to the original shape of df:

print (df.pivot(index='Customer',columns='ProductID', values='Count')
         .fillna(0)
         .stack()
         .reset_index(name='Count'))

  Customer  ProductID  Count
0     John          1   25.0
1     John          2    0.0
2     John          3    0.0
3     John          6   50.0
4     Mary          1    0.0
5     Mary          2   15.0
6     Mary          3   35.0
7     Mary          6    0.0
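
The Count values in the pivot output come back as floats (25.0, 0.0, ...) because the intermediate NaNs force a float column. A minimal self-contained sketch of the same pivot approach that casts back to integers might look like this; the sample frame and the names wide and filled are illustrative, and the cast assumes every Count is a whole number.

```python
import pandas as pd

# Sample data from the question (integer dtypes are an assumption).
df = pd.DataFrame({
    "Customer": ["John", "John", "Mary", "Mary"],
    "ProductID": [1, 6, 2, 3],
    "Count": [25, 50, 15, 35],
})

# One row per customer, one column per ProductID; missing pairs become NaN.
wide = df.pivot(index="Customer", columns="ProductID", values="Count")

filled = (
    wide.fillna(0)          # missing combinations -> 0
        .astype(int)        # undo the float upcast caused by NaN
        .stack()            # back to long form, indexed by (Customer, ProductID)
        .reset_index(name="Count")
)
print(filled)
```

Note that pivot raises an error if the same Customer/ProductID pair appears more than once, so this sketch only fits data shaped like the example.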

Another solution: first get the unique values of the Customer column and the sorted unique values of the ProductID column, then create a MultiIndex from their product and reindex by it:

a = df.Customer.unique()
b = df.ProductID.sort_values().unique()
print (a)
['John' 'Mary']
print (b)
[1 2 3 6]

m = pd.MultiIndex.from_product([a,b])
print (m)
MultiIndex(levels=[['John', 'Mary'], [1, 2, 3, 6]],
           labels=[[0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 2, 3, 0, 1, 2, 3]])

df1 = df.set_index(['Customer','ProductID']).reindex(m, fill_value=0).reset_index()
df1.columns = ['Customer','ProductID','Count']
print (df1)
  Customer  ProductID  Count
0     John          1     25
1     John          2      0
2     John          3      0
3     John          6     50
4     Mary          1      0
5     Mary          2     15
6     Mary          3     35
7     Mary          6      0
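
As a self-contained variant of the reindex approach, the sketch below passes names= to MultiIndex.from_product so that reset_index restores the column names without a separate rename. The names= argument and the variable names full_index and completed are additions for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data from the question (integer dtypes are an assumption).
df = pd.DataFrame({
    "Customer": ["John", "John", "Mary", "Mary"],
    "ProductID": [1, 6, 2, 3],
    "Count": [25, 50, 15, 35],
})

customers = df["Customer"].unique()
products = df["ProductID"].sort_values().unique()

# Every (Customer, ProductID) combination, with level names set up front.
full_index = pd.MultiIndex.from_product(
    [customers, products], names=["Customer", "ProductID"]
)

completed = (
    df.set_index(["Customer", "ProductID"])
      .reindex(full_index, fill_value=0)   # new pairs get Count = 0
      .reset_index()
)
print(completed)
```

Because fill_value=0 is supplied during the reindex, no NaN is introduced and Count keeps its integer dtype.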

