Split column values into a one-to-one mapping

Time: 2018-07-16 05:55:56

Tags: python pandas dataframe

An extension of the following question: Split (explode) pandas dataframe string entry to separate rows

df:

    STATE CITY  ALT_NAMES
0   S1    C1    A1@A2
1   S2    C2    A3@A4@A5 

How can I get the following result?

out_df:

    STATE  CITY  CITY_VAR
0   S1     C1    A1
1   S1     C1    A2
2   S2     C2    A3
3   S2     C2    A4
4   S2     C2    A5

Sample data:

    STATE CITY            ALT_NAMES
    FL    FT. MYERS       FORT MYERS@FT MYERS
    FL    NORTH FT MYERS  N.FT.MYERS@N. FORT MYERS@NORTH FORT MYERS

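2 answers:

Answer 0 (score: 2)

This works for me (here `explode` appears to be the helper function from the answers to the linked question, not a pandas built-in):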

df = explode(df.assign(ALT_NAMES=df.ALT_NAMES.str.split('@')), 'ALT_NAMES')
print (df)
  STATE CITY ALT_NAMES
0    S1   C1        A1
1    S1   C1        A2
2    S2   C2        A3
3    S2   C2        A4
4    S2   C2        A5
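
On pandas 0.25 or later, the built-in `DataFrame.explode` method should give the same result without a separate helper. A minimal sketch, assuming the question's toy data and a recent enough pandas:

```python
import pandas as pd

df = pd.DataFrame({
    "STATE": ["S1", "S2"],
    "CITY": ["C1", "C2"],
    "ALT_NAMES": ["A1@A2", "A3@A4@A5"],
})

# Split each string into a list, then emit one row per list element.
out_df = (
    df.assign(ALT_NAMES=df["ALT_NAMES"].str.split("@"))
      .explode("ALT_NAMES")
      .reset_index(drop=True)  # explode repeats the original row labels
)
print(out_df)
```

The `reset_index(drop=True)` call is only there to get a fresh 0..n-1 index, as in the desired output.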

Another pure pandas solution:

df = (df.join(df.pop('ALT_NAMES')
                .str.split('@', expand=True)
                .stack()
                .reset_index(level=1, drop=True)
                .rename('ALT_NAMES'))
        .reset_index(drop=True))
print(df)
  STATE            CITY         ALT_NAMES
0    FL       FT. MYERS        FORT MYERS
1    FL       FT. MYERS          FT MYERS
2    FL  NORTH FT MYERS        N.FT.MYERS
3    FL  NORTH FT MYERS     N. FORT MYERS
4    FL  NORTH FT MYERS  NORTH FORT MYERS
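
To see why the join works: `str.split(..., expand=True)` gives one column per piece, `stack()` moves those columns into a second index level, and dropping that level leaves a Series indexed by the original row labels, so `join` repeats each row once per alternative name. The sketch below walks through this on the sample data. The intermediate names `pieces` and `stacked` are chosen here for illustration, and `dropna()` is called explicitly because how `stack()` treats the missing values that pad shorter rows may differ between pandas versions:

```python
import pandas as pd

df = pd.DataFrame({
    "STATE": ["FL", "FL"],
    "CITY": ["FT. MYERS", "NORTH FT MYERS"],
    "ALT_NAMES": ["FORT MYERS@FT MYERS",
                  "N.FT.MYERS@N. FORT MYERS@NORTH FORT MYERS"],
})

# One column per piece; shorter rows are padded with missing values.
pieces = df["ALT_NAMES"].str.split("@", expand=True)

# Move the piece columns into a second index level, drop the padding
# explicitly, then discard that level so only the original row label remains.
stacked = (
    pieces.stack()
          .dropna()
          .reset_index(level=1, drop=True)
          .rename("ALT_NAMES")
)

# join aligns on the index, so each original row repeats once per piece.
out_df = df.drop(columns="ALT_NAMES").join(stacked).reset_index(drop=True)
print(out_df)
```

Unlike the version with `df.pop`, this sketch leaves the original `df` unchanged.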

Answer 1 (score: 1)

Here is a version optimized for your data.

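from itertools import chain
import pandas as pd

# Split each ALT_NAMES string into a list and remove the column from df
v = df.pop('ALT_NAMES').str.split('@')

# Repeat every remaining row once per alternative name
df = pd.DataFrame(
    df.values.repeat(v.str.len(), axis=0), columns=df.columns)
# Flatten the lists, in row order, into the new column
df['ALT_NAMES'] = list(chain.from_iterable(v))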

df
  STATE CITY ALT_NAMES
0    S1   C1        A1
1    S1   C1        A2
2    S2   C2        A3
3    S2   C2        A4
4    S2   C2        A5
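
Note that `df.values.repeat(...)` goes through a single NumPy array, so a frame with mixed column types comes back with object dtype throughout. A possible variant, assuming the index labels are unique, repeats the index instead and keeps each column's dtype. It also names the new column `CITY_VAR`, as in the question's desired output:

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    "STATE": ["S1", "S2"],
    "CITY": ["C1", "C2"],
    "ALT_NAMES": ["A1@A2", "A3@A4@A5"],
})

names = df["ALT_NAMES"].str.split("@")
base = df.drop(columns="ALT_NAMES")

# Repeat each row label once per alternative name, then select those rows.
# This relies on the index labels being unique.
out_df = base.loc[base.index.repeat(names.str.len())].reset_index(drop=True)

# Flatten the lists in row order into the column the question asks for.
out_df["CITY_VAR"] = list(chain.from_iterable(names))
print(out_df)
```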