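I have a pandas DataFrame of this form:
col1  col2            col3
1     [blue]          [in, out]
2     [green, green]  [in]
3     [green]         [in]
I need to convert it into a DataFrame that keeps the first column and spreads all the other values over rows in a single column:
col1  value
1     blue
1     in
1     out
2     green
2     green
2     in
3     green
3     in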
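For anyone who wants to reproduce this, the input can be built roughly as follows. This is only a sketch: it assumes the cells in col2 and col3 are real Python lists of strings (not string representations of lists), which is inferred from the printed frame above.

```python
import pandas as pd

# Assumed construction of the sample data:
# each cell in col2/col3 is a Python list of strings.
df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [['blue'], ['green', 'green'], ['green']],
    'col3': [['in', 'out'], ['in'], ['in']],
})
print(df)
```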
Answer 0 (score: 1)
Use DataFrame.stack together with Series.explode to flatten the lists, then finish with DataFrame.reset_index for some data cleanup:
df1 = (df.set_index('col1')
         .stack()
         .explode()
         .reset_index(level=1, drop=True)
         .reset_index(name='value'))
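To make the chain easier to follow, here it is split into its intermediate steps. This is a sketch on an assumed reconstruction of the sample data (cells are Python lists); the names stacked and exploded are only illustrative. Series.explode needs pandas 0.25 or newer.

```python
import pandas as pd

# Assumed sample data: each cell in col2/col3 is a Python list.
df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [['blue'], ['green', 'green'], ['green']],
    'col3': [['in', 'out'], ['in'], ['in']],
})

# Step 1: move col1 into the index, then stack col2/col3 into one Series.
# The result has a two-level index (col1, original column name) and list values.
stacked = df.set_index('col1').stack()

# Step 2: give every list element its own row; index labels are repeated.
exploded = stacked.explode()

# Step 3: drop the column-name level, then turn col1 back into a column.
df1 = exploded.reset_index(level=1, drop=True).reset_index(name='value')
print(df1)
```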
An alternative with DataFrame.melt and DataFrame.explode:
df1 = (df.melt('col1')
         .explode('value')
         .sort_values('col1')[['col1', 'value']]
         .reset_index(drop=True))
print(df1)
   col1  value
0     1   blue
1     1     in
2     1    out
3     2  green
4     2  green
5     2     in
6     3  green
7     3     in
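One caveat with the melt variant: sort_values uses kind='quicksort' by default, which is not documented as a stable sort, so the relative order of the values within each col1 group is not guaranteed in general. A possible variant that asks for a stable sort explicitly is sketched below; the sample data is an assumed reconstruction of the frame in the question.

```python
import pandas as pd

# Assumed sample data: each cell in col2/col3 is a Python list.
df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [['blue'], ['green', 'green'], ['green']],
    'col3': [['in', 'out'], ['in'], ['in']],
})

df1 = (df.melt('col1')                 # long format: col1, variable, value
         .explode('value')             # one row per list element
         # mergesort is stable, so col2 values stay ahead of col3 values
         .sort_values('col1', kind='mergesort')[['col1', 'value']]
         .reset_index(drop=True))
print(df1)
```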
Or a list comprehension solution:
L = [(k, x) for k, v in df.set_index('col1').to_dict('index').items()
            for k1, v1 in v.items()
            for x in v1]
df1 = pd.DataFrame(L, columns=['col1','value'])
print(df1)
   col1  value
0     1   blue
1     1     in
2     1    out
3     2  green
4     2  green
5     2     in
6     3  green
7     3     in
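Answer 1 (score: 0)
Another solution could consist of building col1 with the repeated key values, and concatenating the lists in df['col2'] and df['col3'] to create the value column. The code is as follows: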
df_final = pd.DataFrame(
    {
        'col1': [
            i for i, sublist in zip(df['col1'], (df['col2'] + df['col3']).values)
            for val in range(len(sublist))
        ],
        'value': sum((df['col2'] + df['col3']).values, [])
    }
)
print(df_final)
   col1  value
0     1   blue
1     1     in
2     1    out
3     2  green
4     2  green
5     2     in
6     3  green
7     3     in
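Note that sum(list_of_lists, []) creates a new list at every step, so it may become slow on frames with many rows. A possible variant, sketched under the same assumptions about the data (cells are Python lists), flattens with itertools.chain.from_iterable instead; the name combined is only illustrative.

```python
from itertools import chain

import pandas as pd

# Assumed sample data: each cell in col2/col3 is a Python list.
df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [['blue'], ['green', 'green'], ['green']],
    'col3': [['in', 'out'], ['in'], ['in']],
})

# Element-wise list concatenation: one combined list per row.
combined = df['col2'] + df['col3']

df_final = pd.DataFrame({
    # Repeat each col1 key once per element of its combined list.
    'col1': [key for key, sublist in zip(df['col1'], combined) for _ in sublist],
    # Flatten all combined lists in a single pass.
    'value': list(chain.from_iterable(combined)),
})
print(df_final)
```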
Answer 2 (score: 0)
d = []  # flattened values from col2 and col3
c = []  # matching col1 key for every value
for i in range(len(df)):
    # collect this row's values, col2 first, then col3
    values = list(df['col2'].iloc[i]) + list(df['col3'].iloc[i])
    d.extend(values)
    # repeat the key once per value (works for multi-digit keys too)
    c.extend([df['col1'].iloc[i]] * len(values))
df1 = pd.DataFrame({'col1': c, 'value': d})
df = df1