Question

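I have a DataFrame where col1 holds file names and col2 holds values. I want to spread the col2 values across columns so that each file name gets a single row, for example: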

Input:
col1  col2
file1 text_0
file1 text_1
file1 text_2
file2 text_0
file2 text_1
file2 text_2
file2 text_3
file3 text_0

Output:
col1  col2   col3   col4   col5
file1 text_0 text_1 text_2 
file2 text_0 text_1 text_2 text_3
file3 text_0
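
For anyone who wants to run the answers below, here is a minimal sketch that rebuilds the sample input. It assumes the data is already in memory as a pandas DataFrame named df, with the column names shown in the question.

```python
import pandas as pd

# Rebuild the sample input from the question: one row per (file, text) pair
df = pd.DataFrame({
    "col1": ["file1"] * 3 + ["file2"] * 4 + ["file3"],
    "col2": ["text_0", "text_1", "text_2",
             "text_0", "text_1", "text_2", "text_3",
             "text_0"],
})
print(df)
```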

Answer 1

The first idea is to use GroupBy.cumcount to number the repeated values of col1, which gives the new column labels, and then reshape with Series.unstack:

df = (df.set_index(['col1',df.groupby('col1').cumcount()])['col2']
        .unstack(fill_value='')
        .reset_index())
df.columns = [f'col{x}' for x in range(1, len(df.columns) + 1)]
print (df)
    col1    col2    col3    col4    col5
0  file1  text_0  text_1  text_2        
1  file2  text_0  text_1  text_2  text_3
2  file3  text_0
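
To see what cumcount contributes here, the sketch below prints the intermediate counter before reshaping. The sample frame is rebuilt so the snippet runs on its own, and the names pos and wide are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["file1"] * 3 + ["file2"] * 4 + ["file3"],
    "col2": ["text_0", "text_1", "text_2",
             "text_0", "text_1", "text_2", "text_3",
             "text_0"],
})

# Position of each row inside its col1 group; the count restarts at 0 for every file
pos = df.groupby("col1").cumcount()
print(pos.tolist())  # [0, 1, 2, 0, 1, 2, 3, 0]

# (col1, pos) is unique per row, so unstack can move the pos level into columns
wide = df.set_index(["col1", pos])["col2"].unstack(fill_value="")
print(wide)
```

Without the counter, col1 alone contains duplicates, so there would be nothing to tell unstack which column each value belongs in.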

Alternatively, build a Series of lists and pass it to the DataFrame constructor. Avoid apply(pd.Series) here because it is slow:

s = df.groupby('col1')['col2'].apply(list)
df = pd.DataFrame(s.tolist(), index=s.index).reset_index().fillna('')
df.columns = [f'col{x}' for x in range(1, len(df.columns) + 1)]
print (df)
    col1    col2    col3    col4    col5
0  file1  text_0  text_1  text_2        
1  file2  text_0  text_1  text_2  text_3
2  file3  text_0

Alternative:

s = df.groupby('col1')['col2'].apply(list)

L = [[k] + v for k, v in s.items()]
df = pd.DataFrame(L).fillna('').rename(columns=lambda x: f'col{x+1}')
print (df)
    col1    col2    col3    col4    col5
0  file1  text_0  text_1  text_2        
1  file2  text_0  text_1  text_2  text_3
2  file3  text_0

Answer 2

It looks like you have DataFrames, which means you are using pandas. Depending on what you actually need, have a look at DataFrame.transpose or DataFrame.pivot.
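
Neither call fits the two-column input directly: transpose only swaps rows and columns, and pivot needs a column that supplies the new column labels. A minimal sketch of how pivot could be applied, assuming a helper column (hypothetically named n here) that numbers the rows within each file:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["file1"] * 3 + ["file2"] * 4 + ["file3"],
    "col2": ["text_0", "text_1", "text_2",
             "text_0", "text_1", "text_2", "text_3",
             "text_0"],
})

# "n" is a hypothetical helper column: the running position within each col1 group
wide = (df.assign(n=df.groupby("col1").cumcount())
          .pivot(index="col1", columns="n", values="col2")
          .fillna(""))  # shorter groups are padded with empty strings
print(wide)
```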

Answer 3

Try this:

new_df = df.pivot(columns='col1').droplevel(0,axis=1).rename_axis(columns='col1').apply(lambda x: pd.Series(x.dropna().values)).fillna('')
new_df.index = new_df.reset_index(drop=True).index+2
new_df = new_df.T.add_prefix('col_')

Output:

        col_2   col_3   col_4   col_5
col1                                 
file1  text_0  text_1  text_2        
file2  text_0  text_1  text_2  text_3
file3  text_0

Or, to match the layout you have in the question:

new_df = df.pivot(columns='col1').droplevel(0,axis=1).apply(lambda x: pd.Series(x.dropna().values)).fillna('')
new_df.index = new_df.index+2
new_df = new_df.T.add_prefix('col_')
new_df = new_df.rename_axis(columns='col1', index=None)

Output:

col1    col_2   col_3   col_4   col_5
file1  text_0  text_1  text_2        
file2  text_0  text_1  text_2  text_3
file3  text_0

Answer 4

Since the OP doesn't want any pivot, here is a pivot-free solution:

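df = df.groupby('col1')['col2'].agg(list).apply(pd.Series).fillna('')
# Number the columns from 2 based on the actual width instead of hardcoding range(2, 6)
df.columns = range(2, len(df.columns) + 2)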
df = df.add_prefix('col_')
df = df.rename_axis(columns='col1', index=None)

Output:

col1    col_2   col_3   col_4   col_5
file1  text_0  text_1  text_2        
file2  text_0  text_1  text_2  text_3
file3  text_0

Answer 5

This should do the trick:

df2 = df.groupby("col1").agg(lambda x: {f"col{i+2}": val for i, val in enumerate(x)})
df2=df2["col2"].apply(pd.Series).reset_index()

Output:

    col1    col2    col3    col4    col5
0  file1  text_0  text_1  text_2     NaN
1  file2  text_0  text_1  text_2  text_3
2  file3  text_0     NaN     NaN     NaN
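
Unlike the other answers, this result keeps NaN for the missing cells. If empty strings are wanted instead, a fillna call would be one way to get them. The sketch below rebuilds the printed frame by hand so that it runs on its own.

```python
import pandas as pd

# The frame printed above, rebuilt by hand (None marks the missing cells)
df2 = pd.DataFrame({
    "col1": ["file1", "file2", "file3"],
    "col2": ["text_0", "text_0", "text_0"],
    "col3": ["text_1", "text_1", None],
    "col4": ["text_2", "text_2", None],
    "col5": [None, "text_3", None],
})

# Replace every missing cell with an empty string
df2 = df2.fillna("")
print(df2)
```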

How do I convert a column into multiple rows by column value in Python?
