Question

I have a DataFrame df:

df
       col1 act_id col2                                                                                                 
   --------------------
0  40;30;30   act1 A;B;C
1  25;50;25   act2 D;E;F
2     70;30   act3 G;H
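
For anyone who wants to reproduce the examples in this thread, the frame shown above can be built as follows. This is only a sketch that assumes every cell is a plain string, which is what the printed output suggests.

```python
import pandas as pd

# Reconstruct the example frame; all values are assumed to be strings.
df = pd.DataFrame({
    "col1": ["40;30;30", "25;50;25", "70;30"],
    "act_id": ["act1", "act2", "act3"],
    "col2": ["A;B;C", "D;E;F", "G;H"],
})
print(df)
```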

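I want to break up each record so that the values in col1 and col2 explode into multiple rows, with the first value of col1 after splitting on ';' lining up with the first value of col2 after splitting on ';', the second with the second, and so on. For example, row 0 should become three rows: (40, act1, A), (30, act1, B) and (30, act1, C).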

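Note: this is different from "Split (explode) pandas dataframe string entry to separate rows", because here the split does not involve just one column. Each row has to be exploded into multiple rows across two columns at the same time.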

Any help is appreciated. Thanks.

Answer 1

One approach:

df2 = df.set_index('act_id').apply(lambda x: pd.Series(x.col1.split(';'), x.col2.split(';')), axis=1).stack().dropna().reset_index()

# reset_index() yields act_id first, then the col2 labels, then the col1 values
df2.columns = ['act_id', 'col2', 'col1']

  act_id col2 col1
0   act1    A   40
1   act1    B   30
2   act1    C   30
3   act2    D   25
4   act2    E   50
5   act2    F   25
6   act3    G   70
7   act3    H   30
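
The one-liner above can be unpacked into a self-contained sketch. The sample frame is rebuilt from the question, and the names `paired` and `row` are illustrative only. Because the col2 pieces are used as the index of each temporary Series, this approach appears to rely on col2 values being unique within a row.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["40;30;30", "25;50;25", "70;30"],
    "act_id": ["act1", "act2", "act3"],
    "col2": ["A;B;C", "D;E;F", "G;H"],
})

paired = (
    df.set_index("act_id")
      # One Series per row: the col1 pieces as values, labelled by the col2 pieces.
      .apply(lambda row: pd.Series(row.col1.split(";"), index=row.col2.split(";")), axis=1)
      .stack()     # wide -> long: one (act_id, label) entry per value
      .dropna()    # drop padding left by rows with fewer pieces
      .reset_index()
)
paired.columns = ["act_id", "col2", "col1"]
paired = paired[["col1", "act_id", "col2"]]  # restore the original column order
print(paired)
```

Note that the exploded values stay strings; convert with `paired["col1"].astype(int)` if numbers are needed.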

Answer 2

The idea is to explode col1 and col2 separately, merge the two results on their index, and then join back to the original DataFrame.

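# Split on ";" and stack to one piece per row; the "index" column keeps the source row.
# dropna() guards against NaN padding from rows with fewer pieces.
df1 = df.col1.str.split(";").apply(pd.Series).stack().dropna().droplevel(1).reset_index()
df2 = df.col2.str.split(";").apply(pd.Series).stack().dropna().droplevel(1).reset_index()
# Pair the pieces positionally, then name the columns
df12 = pd.merge(df1, df2[0], left_index=True, right_index=True)
df12.columns = ["index", "col1", "col2"]

# Attach act_id from the original frame via the saved row number
pd.merge(df12, df["act_id"], left_on="index", right_index=True)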
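
To check this approach end to end, here is a minimal runnable sketch on the sample data from the question. The helper name `split_to_rows` is made up for illustration, and the `dropna()` call is a defensive addition in case `stack()` keeps the NaN padding from shorter rows.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["40;30;30", "25;50;25", "70;30"],
    "act_id": ["act1", "act2", "act3"],
    "col2": ["A;B;C", "D;E;F", "G;H"],
})

def split_to_rows(series):
    # Hypothetical helper: one row per ';'-separated piece, with the
    # original row label kept in a column named "index".
    return (series.str.split(";").apply(pd.Series)
                  .stack().dropna().droplevel(1).reset_index())

left = split_to_rows(df["col1"])
right = split_to_rows(df["col2"])

# Both frames now share a 0..n-1 index, so merging on it pairs pieces by position.
pairs = pd.merge(left, right[0], left_index=True, right_index=True)
pairs.columns = ["index", "col1", "col2"]

result = pd.merge(pairs, df["act_id"], left_on="index", right_index=True)
print(result)
```

The positional pairing only makes sense if col1 and col2 hold the same number of pieces in every row, which is true for the sample data.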

Answer 3

A more generic approach could be:

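list_cols = ['col1', 'col2']  # columns whose cells hold lists to explode together
other_cols = [c for c in df.columns if c not in list_cols]  # keeps the original column order
exploded = [df[col].explode() for col in list_cols]
desired_df = pd.DataFrame(dict(zip(list_cols, exploded)))
# Re-attach the remaining columns using the original row index
desired_df = df[other_cols].merge(desired_df, how="right", left_index=True, right_index=True)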

Before running the code above, first split the columns in list_cols (col1 and col2) into lists.
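
Putting the split step and the explode step together, the idea might be wrapped in a function along these lines. The name `explode_together` and the `sep` parameter are hypothetical, `Series.explode` needs pandas 0.25 or later, and the sketch assumes every listed column has the same number of pieces in each row.

```python
import pandas as pd

def explode_together(df, list_cols, sep=";"):
    """Hypothetical helper: split the given string columns on `sep`
    and explode them in lockstep, repeating the other columns."""
    # Explode each column on its own; the results share the same repeated row index.
    exploded = pd.DataFrame(
        {col: df[col].str.split(sep).explode() for col in list_cols}
    )
    other_cols = [c for c in df.columns if c not in list_cols]
    # A right join on the index repeats the untouched columns for every exploded row.
    return df[other_cols].merge(exploded, how="right", left_index=True, right_index=True)

df = pd.DataFrame({
    "col1": ["40;30;30", "25;50;25", "70;30"],
    "act_id": ["act1", "act2", "act3"],
    "col2": ["A;B;C", "D;E;F", "G;H"],
})
desired_df = explode_together(df, ["col1", "col2"])
print(desired_df)
```

The result keeps the original row labels (0, 0, 0, 1, ...) as its index; call `reset_index(drop=True)` if a fresh index is preferred.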

Explode a pandas DataFrame into multiple rows across multiple columns simultaneously
