How to name the new columns when splitting on a delimiter in pandas

Time: 2019-07-31 22:32:13

Tags: python pandas

I have this dataframe df2:

   tree  cues                              directions  thresholds     exits
1  1     PLC2hrOGTT;Age;BMI;TimesPregnant  >;>;>;>     126;29;29.7;6  1;0;1;0.5
2  2     PLC2hrOGTT;Age;BMI                >;>;>       126;29;29.7    0;1;0.5
3  3     PLC2hrOGTT;Age;BMI;TimesPregnant  >;>;>;>     126;29;29.7;6  1;0;0;0.5
4  4     PLC2hrOGTT;Age;BMI;TimesPregnant  >;>;>;>     126;29;29.7;6  1;1;0;0.5
5  5     PLC2hrOGTT;Age;BMI;TimesPregnant  >;>;>;>     126;29;29.7;6  0;1;0;0.5
6  6     PLC2hrOGTT;Age;BMI                >;>;>       126;29;29.7    0;0;0.5
7  7     PLC2hrOGTT;Age;BMI;TimesPregnant  >;>;>;>     126;29;29.7;6  1;1;1;0.5
8  8     PLC2hrOGTT;Age;BMI;TimesPregnant  >;>;>;>     126;29;29.7;6  0;0;0;0.5

I want to split each of the columns ['cues', 'exits', 'directions', 'thresholds'] into 4 separate columns.

So I did it like this:

import pandas as pd

df3 = df2['cues'].str.split(';', expand=True)
df4 = df2['directions'].str.split(';', expand=True)
df5 = df2['thresholds'].str.split(';', expand=True)
df6 = df2['exits'].str.split(';', expand=True)

# Concatenate these dataframes into one
df = pd.concat([df2, df3, df4, df5, df6], axis=1)
df = df.reset_index(drop=True)

# Drop the original columns that I don't need anymore
df.drop(columns=['tree', 'cues', 'directions', 'thresholds', 'exits'], inplace=True)

df

   0           1    2    3              0  1  2  3     0    1   2     3     0  1  2    3
0  PLC2hrOGTT  Age  BMI  TimesPregnant  >  >  >  >     126  29  29.7  6     1  0  1    0.5
1  PLC2hrOGTT  Age  BMI  None           >  >  >  None  126  29  29.7  None  0  1  0.5  None
2  PLC2hrOGTT  Age  BMI  TimesPregnant  >  >  >  >     126  29  29.7  6     1  0  0    0.5
3  PLC2hrOGTT  Age  BMI  TimesPregnant  >  >  >  >     126  29  29.7  6     1  1  0    0.5
4  PLC2hrOGTT  Age  BMI  TimesPregnant  >  >  >  >     126  29  29.7  6     0  1  0    0.5
5  PLC2hrOGTT  Age  BMI  None           >  >  >  None  126  29  29.7  None  0  0  0.5  None
6  PLC2hrOGTT  Age  BMI  TimesPregnant  >  >  >  >     126  29  29.7  6     1  1  1    0.5
7  PLC2hrOGTT  Age  BMI  TimesPregnant  >  >  >  >     126  29  29.7  6     0  0  0    0.5
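The repeated labels come from str.split(..., expand=True) itself: it labels the pieces of every column by position (0, 1, 2 and so on), so pd.concat receives the same four labels once per split column. A minimal check, assuming a hand-typed two-row excerpt of df2 with only the cues and directions columns:

```python
import pandas as pd

# Two rows of the question's df2, limited to two of the delimited columns
df2 = pd.DataFrame({
    "cues": ["PLC2hrOGTT;Age;BMI;TimesPregnant", "PLC2hrOGTT;Age;BMI"],
    "directions": [">;>;>;>", ">;>;>"],
})

cues = df2["cues"].str.split(";", expand=True)
directions = df2["directions"].str.split(";", expand=True)

# Both expanded frames get the positional labels 0..3,
# whichever source column they came from
print(list(cues.columns))        # [0, 1, 2, 3]
print(list(directions.columns))  # [0, 1, 2, 3]

# Concatenating them side by side therefore repeats every label
combined = pd.concat([cues, directions], axis=1)
print(list(combined.columns))      # [0, 1, 2, 3, 0, 1, 2, 3]
print(combined.columns.is_unique)  # False
```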

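As you can see, the resulting DataFrame columns are not unique. So my question is: how do I name the columns when splitting, so that I end up with unique column names after concatenating everything into one dataframe?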
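One possible way to keep the asker's split-then-concat structure is to prefix each split frame's positional labels with the name of its source column using DataFrame.add_prefix before concatenating. This is a minimal sketch, assuming a hand-typed two-row excerpt of df2 and an underscore separator chosen for illustration (cues_0, cues_1, and so on):

```python
import pandas as pd

# Two rows copied from the question's df2
df2 = pd.DataFrame({
    "tree": [1, 2],
    "cues": ["PLC2hrOGTT;Age;BMI;TimesPregnant", "PLC2hrOGTT;Age;BMI"],
    "directions": [">;>;>;>", ">;>;>"],
    "thresholds": ["126;29;29.7;6", "126;29;29.7"],
    "exits": ["1;0;1;0.5", "0;1;0.5"],
})

split_cols = ["cues", "directions", "thresholds", "exits"]

# Split each column, then turn its labels 0..3 into e.g. "cues_0".."cues_3"
# so that labels from different source columns cannot collide
pieces = [df2[c].str.split(";", expand=True).add_prefix(f"{c}_") for c in split_cols]

df = pd.concat(pieces, axis=1)
print(list(df.columns[:5]))  # ['cues_0', 'cues_1', 'cues_2', 'cues_3', 'directions_0']
```

Passing keys=split_cols to pd.concat (without add_prefix) is another option; it produces two-level column labels such as ('cues', 0) instead of flat strings.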

1 answer:

Answer 0 (score: 0)

Instead of

df3 = df2['cues'].str.split(';',expand=True)

try

df2[['cues1','cues2','cues3','cues4']] = df2['cues'].str.split(';', expand=True)
df2 = df2.drop(columns='cues')

Follow the same approach for the other columns. If you want to keep a copy of the original DataFrame, use df2_cp = df2.copy() before modifying it.
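The answer's approach can be applied to all four columns in a loop. A minimal sketch, assuming a hand-typed two-row excerpt of df2: the names follow the answer's convention (cues1, cues2, and so on), but their count is taken from the width of each split instead of being hard-coded, because assigning to a list of names only works when the list length matches the number of split columns. drop(columns=...) is used since pandas 2.x no longer accepts the axis as a positional argument to drop.

```python
import pandas as pd

# Two rows copied from the question's df2
df2 = pd.DataFrame({
    "tree": [1, 2],
    "cues": ["PLC2hrOGTT;Age;BMI;TimesPregnant", "PLC2hrOGTT;Age;BMI"],
    "directions": [">;>;>;>", ">;>;>"],
    "thresholds": ["126;29;29.7;6", "126;29;29.7"],
    "exits": ["1;0;1;0.5", "0;1;0.5"],
})

for col in ["cues", "directions", "thresholds", "exits"]:
    parts = df2[col].str.split(";", expand=True)
    # One name per split column (cues1, cues2, ...), as in the answer
    names = [f"{col}{i}" for i in range(1, parts.shape[1] + 1)]
    df2[names] = parts
    # Remove the original delimited column
    df2 = df2.drop(columns=col)

print(list(df2.columns[:5]))  # ['tree', 'cues1', 'cues2', 'cues3', 'cues4']
```

Shorter rows are padded with missing values in the last column, as in the question's output.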