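I want to convert the DataFrame df1 into df2 using Python. I have a solution that uses loops, but I would like to know whether there is a simpler way to create df2.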
df1
Test1 Test2 2014 2015 2016 Present
1 x a 90 85 84 0
2 x a:b 88 79 72 1
3 y a:b:c 75 76 81 0
4 y b 60 62 66 0
5 y c 68 62 66 1
df2
Test1 Test2 2014 2015 2016 Present
1 x a 90 85 84 0
2 x a 88 79 72 1
3 x b 88 79 72 1
4 y a 75 76 81 0
5 y b 75 76 81 0
6 y c 75 76 81 0
7 y b 60 62 66 0
8 y c 68 62 66 1
Answer 0 (score: 1)
Here is one approach using numpy.repeat and itertools.chain:
import numpy as np
import pandas as pd
from itertools import chain
# df is the input frame (df1 in the question)
# split by delimiter and calculate length for each row
split = df['Test2'].str.split(':')
lens = split.map(len)
# repeat non-split columns
cols = ('Test1', '2014', '2015', '2016', 'Present')
d1 = {col: np.repeat(df[col], lens) for col in cols}
# chain split columns
d2 = {'Test2': list(chain.from_iterable(split))}
# combine in a single dataframe
res = pd.DataFrame({**d1, **d2})
print(res)
2014 2015 2016 Present Test1 Test2
1 90 85 84 0 x a
2 88 79 72 1 x a
2 88 79 72 1 x b
3 75 76 81 0 y a
3 75 76 81 0 y b
3 75 76 81 0 y c
4 60 62 66 0 y b
5 68 62 66 1 y c
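The same repeat-and-chain idea can be wrapped into a reusable function. This is only a sketch: the name split_rows and its sep parameter are made up for illustration, and it repeats the underlying NumPy arrays (via to_numpy()) and rebuilds the index explicitly, which differs slightly from the snippet above but keeps the columns in their original order.

```python
from itertools import chain

import numpy as np
import pandas as pd


def split_rows(df, column, sep=":"):
    """Hypothetical helper: one output row per `sep`-separated token in `column`."""
    parts = df[column].str.split(sep)      # lists of tokens, one list per row
    counts = parts.map(len).to_numpy()     # number of tokens in each row
    data = {}
    for col in df.columns:
        if col == column:
            # flatten the token lists into one long list
            data[col] = list(chain.from_iterable(parts))
        else:
            # repeat every other value once per token in its row
            data[col] = np.repeat(df[col].to_numpy(), counts)
    # repeat the index labels the same way, so rows stay traceable
    return pd.DataFrame(data, index=np.repeat(df.index.to_numpy(), counts))


# Sample data copied from df1 in the question (1..5 assumed to be the index).
df1 = pd.DataFrame(
    {
        "Test1": ["x", "x", "y", "y", "y"],
        "Test2": ["a", "a:b", "a:b:c", "b", "c"],
        "2014": [90, 88, 75, 60, 68],
        "2015": [85, 79, 76, 62, 62],
        "2016": [84, 72, 81, 66, 66],
        "Present": [0, 1, 0, 0, 1],
    },
    index=range(1, 6),
)

res = split_rows(df1, "Test2")
print(res)
```

If a fresh 1-based index like the one in df2 is wanted, res.reset_index(drop=True) followed by shifting the index by one should get there.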
Answer 1 (score: 0)
This will do what you want:
import pandas as pd
# Converting "Test2" strings into lists of values
df["Test2"] = df["Test2"].str.split(":")
# Creating a Series of "Test2" values, one entry per list element, indexed by the original row
test2 = df.apply(lambda x: pd.Series(x['Test2']), axis=1).stack().reset_index(level=1, drop=True)
test2.name = 'Test2'
# Joining both dataframes
df = df.drop('Test2', axis=1).join(test2)
print(df)
Test1 2014 2015 2016 Present Test2
1 x 90 85 84 0 a
2 x 88 79 72 1 a
2 x 88 79 72 1 b
3 y 75 76 81 0 a
3 y 75 76 81 0 b
3 y 75 76 81 0 c
4 y 60 62 66 0 b
5 y 68 62 66 1 c
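On pandas 0.25 or later, DataFrame.explode offers a shorter route to the same result. Note that this is a different technique from the apply/stack/join steps above, not a restatement of them. The sketch below assumes the sample data from the question, with the leading 1 to 5 treated as the index, and renumbers the rows from 1 to match df2.

```python
import pandas as pd

# Sample data copied from df1 in the question (1..5 assumed to be the index).
df1 = pd.DataFrame(
    {
        "Test1": ["x", "x", "y", "y", "y"],
        "Test2": ["a", "a:b", "a:b:c", "b", "c"],
        "2014": [90, 88, 75, 60, 68],
        "2015": [85, 79, 76, 62, 62],
        "2016": [84, 72, 81, 66, 66],
        "Present": [0, 1, 0, 0, 1],
    },
    index=range(1, 6),
)

res = (
    df1.assign(Test2=df1["Test2"].str.split(":"))  # strings -> lists of tokens
       .explode("Test2")                           # one row per list element
       .reset_index(drop=True)                     # discard the repeated labels
)
res.index = res.index + 1                          # renumber from 1, as in df2
print(res)
```

Because assign replaces Test2 in place, the column order of df1 is preserved, so no reordering is needed afterwards.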