继this question之后,是否可以进行类似的“拓宽”活动。在pandas中操作,其中每个'实体有多个源列?
如果我的数据现在如下:
Box,Code,Category
Green,1221,Active
Green,8391,Inactive
Red,3709,Inactive
Red,2911,Pending
Blue,9820,Active
Blue,4530,Active
我如何最有效地前往:
Box,Code0,Category0,Code1,Category1
Green,1221,Active,8391,Inactive
Red,3709,Inactive,2911,Pending
Blue,9820,Active,4530,Active
到目前为止,我能够将“#”工作的唯一解决方案是按照链接页面中的示例进行操作,并创建两个单独的DataFrame,一个按Box和Code分组,另一个按Box和Category分组,然后按Box将两者合并。
a = get_clip.groupby('Box')['Code'].apply(list)
b = get_clip.groupby('Box')['Category'].apply(list)
broadeneda = pd.DataFrame(a.values.tolist(), index = a.index).add_prefix('Code').reset_index()
broadenedb = pd.DataFrame(b.values.tolist(), index = b.index).add_prefix('Category').reset_index()
merged = pd.merge(broadeneda, broadenedb, on='Box', how = 'inner')
有没有办法实现这一目标而不分别扩大每一列并在最后合并?
答案 0 :(得分:3)
gourpby
+ cumcount
+ unstack
df1=df.assign(n=df.groupby('Box').cumcount()).set_index(['Box','n']).unstack(1)
df1.columns=df1.columns.map('{0[0]}{0[1]}'.format)
df1
Out[141]:
Code0 Code1 Category0 Category1
Box
Blue 9820 4530 Active Active
Green 1221 8391 Active Inactive
Red 3709 2911 Inactive Pending
答案 1 :(得分:2)
选项1
使用set_index
,pipe
和set_axis
df.set_index(['Box', df.groupby('Box').cumcount()]).unstack().pipe(
lambda d: d.set_axis(d.columns.map('{0[0]}{0[1]}'.format), 1, False)
)
Code0 Code1 Category0 Category1
Box
Blue 9820 4530 Active Active
Green 1221 8391 Active Inactive
Red 3709 2911 Inactive Pending
选项2
使用defaultdict
from collections import defaultdict
d = defaultdict(dict)
for a, *b in df.values:
i = len(d[a]) // len(b)
c = (f'Code{i}', f'Category{i}')
d[a].update(dict(zip(c, b)))
pd.DataFrame.from_dict(d, 'index').rename_axis('Box')
Code0 Category0 Code1 Category1
Box
Blue 9820 Active 4530 Active
Green 1221 Active 8391 Inactive
Red 3709 Inactive 2911 Pending
答案 2 :(得分:0)
这可以通过迭代子数据帧来完成:
cols = ["Box","Code0","Category0","Code1","Category1"]
newdf = pd.DataFrame(columns = cols) # create an empty dataframe to be filled
for box in pd.unique(df.Box): # for each color in Box
subdf = df[df.Box == box] # get a sub-dataframe
newrow = subdf.values[0].tolist() # get its values and then its full first row
newrow.extend(subdf.values[1].tolist()[1:3]) # add second and third entries of second row
newdf = pd.concat([newdf, pd.DataFrame(data=[newrow], columns=cols)], axis=0) # add to new dataframe
print(newdf)
输出:
Box Code0 Category0 Code1 Category1
0 Green 1221.0 Active 8391.0 Inactive
0 Red 3709.0 Inactive 2911.0 Pending
0 Blue 9820.0 Active 4530.0 Active
答案 3 :(得分:0)
似乎相同的颜色会连续出现而每种颜色都有相同的行。(两个重要的假设。)因此,我们可以将df分成奇数部分df[::2]
和偶数部分df[1::2]
,然后将它们合并在一起。
pd.merge(df[::2], df[1::2], on="Box")
Box Code_x Category_x Code_y Category_y
0 Green 1221 Active 8391 Inactive
1 Red 3709 Inactive 2911 Pending
2 Blue 9820 Active 4530 Active
可以通过重置列来轻松地重命名。