I am trying to convert multi-line cells in a pandas DataFrame into multiple rows of the table.
   column1               column2    column3
0        1                    Hi      hello
1        2  some\nTest\nTo\nWork         hi
2        3                  Hiya  somewhere
3        4                             test
4        5               Another      test2
5        6                            test3
Given the table above, I would like my output to look like this:
   column1  column2    column3
0        1       Hi      hello
1        2     some         hi
2        2     Test         hi
3        2       To         hi
4        2     Work         hi
5        3     Hiya  somewhere
6        4                test
7        5  Another      test2
8        6               test3
Answer 0 (score: 2)
Split on the newline character and unnest:
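from itertools import chain
import pandas as pd
# Split each cell into a list of pieces; if the cells hold a literal backslash-n instead of a newline, try r'\\n'.
# Assumes column2 has no missing values; apply .fillna('') first otherwise.
v = df.pop('column2').str.split('\n')
# Repeat every remaining row once per piece, then attach the flattened pieces.
df = (pd.DataFrame(df.values.repeat(v.str.len(), axis=0), columns=df.columns)
        .assign(column2=list(chain.from_iterable(v)))
        .sort_index(axis=1))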
print(df)
   column1  column2    column3
0        1       Hi      hello
1        2     some         hi
2        2     Test         hi
3        2       To         hi
4        2     Work         hi
5        3     Hiya  somewhere
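
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same repeat-and-flatten idea. The sample frame is rebuilt from the question on the assumption that the blank cells are missing values, and the helper name `split_cell_rows` is made up for illustration (it is not part of pandas). Missing cells are filled with empty strings first, because `str.len()` would otherwise return NaN for them and `repeat` would fail.

```python
from itertools import chain

import pandas as pd


def split_cell_rows(df, column, sep="\n"):
    """Return a copy of df with one row per sep-separated piece of `column`.

    Hypothetical helper for illustration, not part of pandas.
    """
    # Missing cells become '' so that every row yields at least one piece.
    pieces = df[column].fillna("").str.split(sep)
    others = df.drop(columns=column)
    # Repeat the remaining columns once per piece in the corresponding row.
    out = pd.DataFrame(
        others.values.repeat(pieces.str.len(), axis=0),
        columns=others.columns,
    )
    # Flatten the lists of pieces into one column, then restore column order.
    out[column] = list(chain.from_iterable(pieces))
    return out[list(df.columns)]


# Sample data rebuilt from the question (blank cells assumed to be missing).
df = pd.DataFrame({
    "column1": [1, 2, 3, 4, 5, 6],
    "column2": ["Hi", "some\nTest\nTo\nWork", "Hiya", None, "Another", None],
    "column3": ["hello", "hi", "somewhere", "test", "test2", "test3"],
})

result = split_cell_rows(df, "column2")
print(result)
```

Note that going through `.values` turns the repeated columns into object dtype, so `column1` may need an `astype(int)` afterwards if the numeric dtype matters.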
Answer 1 (score: 1)
Try this:
df.fillna('').set_index(['column1','column3']).stack().str.split('\n', expand=True).stack().unstack(-2).reset_index(-1, drop=True).reset_index()
Out[1516]:
   column1    column3  column2
0        1      hello       Hi
1        2         hi     some
2        2         hi     Test
3        2         hi       To
4        2         hi     Work
5        3  somewhere     Hiya
Answer 2 (score: 0)
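import numpy as np
import pandas as pd
# One list of pieces per row; assumes column2 has no missing values.
parts = df['column2'].str.split('\n')
# Repeat the other columns once per piece, then insert the flattened pieces.
df = pd.DataFrame(np.repeat(df[['column1', 'column3']].values, parts.str.len(), axis=0), columns=['column1', 'column3'])
df.insert(1, 'column2', [piece for pieces in parts for piece in pieces])
print(df)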
   column1  column2    column3
0        1       Hi      hello
1        2     some         hi
2        2     Test         hi
3        2       To         hi
4        2     Work         hi
5        3     Hiya  somewhere
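
On pandas 0.25 or newer, `DataFrame.explode` may be a shorter route to the same result. This is a different technique from the ones used in the answers, sketched here with the question's data and the assumption that the blank cells are missing values. Rows whose `column2` is missing are kept, with a missing value in that column.

```python
import pandas as pd

# Sample data rebuilt from the question (blank cells assumed to be missing).
df = pd.DataFrame({
    "column1": [1, 2, 3, 4, 5, 6],
    "column2": ["Hi", "some\nTest\nTo\nWork", "Hiya", None, "Another", None],
    "column3": ["hello", "hi", "somewhere", "test", "test2", "test3"],
})

exploded = (
    # Turn each cell into a list of pieces (missing cells stay missing).
    df.assign(column2=df["column2"].str.split("\n"))
      # Emit one row per list element, copying the other columns.
      .explode("column2")
      # explode repeats the original index labels, so renumber the rows.
      .reset_index(drop=True)
)
print(exploded)
```

Unlike the `.values`-based approaches, this keeps `column1` as an integer column, since the other columns are never routed through a mixed-type array.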