Python pandas melt on combined columns

Time: 2018-07-25 18:25:27

Tags: pandas melt

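I have a DataFrame like this. I have regular fields up to "state", and after that I have trailers (3 columns each; tr1* denotes one trailer). I want to convert these trailers into rows. I tried the melt function, but I could only get it to work with one trailer column. Please see the example below to understand what I mean.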

Name number city       state    tr1num   tr1acct   tr1ct  tr2num  tr2acct   tr2ct   tr3num   tr3acct  tr3ct 
DJ   10     Edison     nj       1001     20345     Dew    1002    20346     Newca.  1003.    20347.   pen 
ND   20     Newark     DE       2001     1985      flor   2002    1986      rodge

I expect output like this:

Name number city       state    trnum   tracct     trct
DJ   10     Edison     nj       1001     20345     Dew   
DJ   10     Edison     nj       1002     20346     Newca
DJ   10     Edison     nj       1003     20347     pen
ND   20     Newark     DE       2001     1985      flor
ND   20     Newark     DE       2002     1986      rodge
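To make the example reproducible, here is a sketch that builds the sample input as a DataFrame. It assumes the trailing dots in `Newca.`, `1003.` and `20347.` are typos (the expected output omits them) and that ND's missing third trailer is simply empty (NaN).

```python
import pandas as pd

# Sample frame from the question: four regular fields, then up to three
# trailers of three fields each (trNnum, trNacct, trNct).
# Assumption: trailing dots in the posted data are typos; ND has no
# third trailer, so its tr3* cells are None/NaN.
df = pd.DataFrame({
    "Name": ["DJ", "ND"],
    "number": [10, 20],
    "city": ["Edison", "Newark"],
    "state": ["nj", "DE"],
    "tr1num": [1001, 2001], "tr1acct": [20345, 1985], "tr1ct": ["Dew", "flor"],
    "tr2num": [1002, 2002], "tr2acct": [20346, 1986], "tr2ct": ["Newca", "rodge"],
    "tr3num": [1003, None], "tr3acct": [20347, None], "tr3ct": ["pen", None],
})
print(df)
```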

3 Answers:

Answer 0 (score: 0)

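You need to use pd.wide_to_long. However, you will need to do some column renaming first, because wide_to_long expects the trailer number at the end of each column name.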
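A minimal sketch of just the renaming step, on a subset of the question's column names. The regex splits a name such as `tr1num` into prefix, digits and field, then reorders them to `trnum_1`. Anchoring the pattern with `^` and `$` is a small variation on the answer's pattern; columns without a digit (Name, number, city, state) never match, so they come through unchanged.

```python
import pandas as pd

cols = pd.Index(["Name", "number", "city", "state",
                 "tr1num", "tr1acct", "tr1ct", "tr2num", "tr2acct", "tr2ct"])

# prefix (\D+), trailer number (\d+), field (\D+) -> prefix + field + "_" + number
renamed = cols.str.replace(r"^(\D+)(\d+)(\D+)$", r"\1\3_\2", regex=True)
print(list(renamed))
```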

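import pandas as pd

# df holds the question's data. Move the trailer number to the end of each
# column name (tr1num -> trnum_1), keeping the id columns in the index meanwhile.
df = df.set_index(['Name', 'number', 'city', 'state'])
df.columns = df.columns.str.replace(r'(\D+)(\d+)(\D+)', r'\1\3_\2', regex=True)
df = df.reset_index()

# Gather the numbered trailer columns into rows, then drop the helper 'Code' column.
pd.wide_to_long(df, stubnames=['trnum', 'trct', 'tracct'],
                i=['Name', 'number', 'city', 'state'], j='Code',
                sep='_', suffix=r'\d+')\
  .reset_index()\
  .drop('Code', axis=1)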

Output:

  Name  number    city state   trnum    trct   tracct
0   DJ      10  Edison    nj  1001.0     Dew  20345.0
1   DJ      10  Edison    nj  1002.0  Newca.  20346.0
2   DJ      10  Edison    nj  1003.0     pen  20347.0
3   ND      20  Newark    DE  2001.0    flor   1985.0
4   ND      20  Newark    DE  2002.0   rodge   1986.0
5   ND      20  Newark    DE     NaN     NaN      NaN
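A self-contained sketch of the same approach, assuming the question's sample data with the trailing dots removed. It also drops trailer slots that are entirely empty, such as the all-NaN ND row (index 5) in the output above; the `dropna(how="all")` step is an addition, not part of the answer.

```python
import pandas as pd

# Question's sample data (assumption: trailing dots are typos).
df = pd.DataFrame({
    "Name": ["DJ", "ND"], "number": [10, 20],
    "city": ["Edison", "Newark"], "state": ["nj", "DE"],
    "tr1num": [1001, 2001], "tr1acct": [20345, 1985], "tr1ct": ["Dew", "flor"],
    "tr2num": [1002, 2002], "tr2acct": [20346, 1986], "tr2ct": ["Newca", "rodge"],
    "tr3num": [1003, None], "tr3acct": [20347, None], "tr3ct": ["pen", None],
})

id_cols = ["Name", "number", "city", "state"]
# tr1num -> trnum_1; id columns contain no digits, so they are unchanged.
df.columns = df.columns.str.replace(r"^(\D+)(\d+)(\D+)$", r"\1\3_\2", regex=True)

long_df = (
    pd.wide_to_long(df, stubnames=["trnum", "tracct", "trct"],
                    i=id_cols, j="Code", sep="_", suffix=r"\d+")
    .reset_index()
    .drop(columns="Code")
    # Remove trailer slots that were empty for a row (ND has no trailer 3).
    .dropna(subset=["trnum", "tracct", "trct"], how="all")
    .reset_index(drop=True)
)
print(long_df)
```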

Answer 1 (score: 0)

You can do this by renaming the columns slightly and applying the pandas wide_to_long method. Below is code that produces the desired output.

import pandas as pd

# Columns already named in the stub_N pattern that wide_to_long expects.
df = pd.DataFrame({"Name":["DJ", "ND"], "number":[10,20], "city":["Edison", "Newark"], "state":["nj","DE"],
                  "trnum_1":[1001,2001], "tracct_1":[20345,1985], "trct_1":["Dew", "flor"],
                  "trnum_2":[1002,2002], "tracct_2":[20346,1986], "trct_2":["Newca", "rodge"],
                  "trnum_3":[1003,None], "tracct_3":[20347,None], "trct_3":["pen", None]})

# 'Name' uniquely identifies each row; the trailer number goes into 'dropme'.
pd.wide_to_long(df, stubnames=['trnum', 'tracct', 'trct'], i='Name', j='dropme', sep='_')\
  .reset_index().drop('dropme', axis=1)\
  .sort_values('trnum')

Output:

  Name state    city number   trnum   tracct   trct
0   DJ    nj  Edison     10  1001.0  20345.0    Dew
1   DJ    nj  Edison     10  1002.0  20346.0  Newca
2   DJ    nj  Edison     10  1003.0  20347.0    pen
3   ND    DE  Newark     20  2001.0   1985.0   flor
4   ND    DE  Newark     20  2002.0   1986.0  rodge
5   ND    DE  Newark     20     NaN      NaN   None
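In the outputs above, trnum and tracct come out as floats (1001.0) because ND's missing third trailer introduces NaN. A sketch of one way to clean that up, assuming a long frame shaped like the output above (values taken from the question's expected output): drop the empty trailer row, then cast to pandas' nullable integer dtype `Int64`, which would also tolerate an isolated missing value.

```python
import pandas as pd

# Long-format frame shaped like the wide_to_long result (assumed values).
long_df = pd.DataFrame({
    "Name": ["DJ", "DJ", "DJ", "ND", "ND", "ND"],
    "trnum": [1001.0, 1002.0, 1003.0, 2001.0, 2002.0, None],
    "tracct": [20345.0, 20346.0, 20347.0, 1985.0, 1986.0, None],
    "trct": ["Dew", "Newca", "pen", "flor", "rodge", None],
})

cleaned = (
    # Drop trailer slots where every trailer field is missing.
    long_df.dropna(subset=["trnum", "tracct", "trct"], how="all")
    # Nullable integers: whole-number floats become ints, NaN would become <NA>.
    .astype({"trnum": "Int64", "tracct": "Int64"})
    .reset_index(drop=True)
)
print(cleaned)
print(cleaned.dtypes)
```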

Answer 2 (score: 0)

Another option:

import pandas as pd

df = pd.DataFrame({'col1': [1,2,3], 'col2':[3,4,5], 'col3':[5,6,7], 'tr1':[0,9,8], 'tr2':[0,9,8]})

df:

   col1  col2  col3  tr1  tr2
0     1     3     5    0    0
1     2     4     6    9    9
2     3     5     7    8    8

Subset to create two DataFrames, then concatenate them:

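# Take the shared columns plus one trailer column at a time, give the
# trailer column a common name, then stack the pieces vertically.
tr1_df = df[['col1', 'col2', 'col3', 'tr1']].rename(columns={"tr1": "tr"})
tr2_df = df[['col1', 'col2', 'col3', 'tr2']].rename(columns={"tr2": "tr"})
res = pd.concat([tr1_df, tr2_df])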

Result:

   col1  col2  col3  tr
0     1     3     5   0
1     2     4     6   9
2     3     5     7   8
0     1     3     5   0
1     2     4     6   9
2     3     5     7   8
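The same subset-and-concat idea can be applied to the question's three-field trailers with a loop. A sketch, assuming the question's sample data (trailing dots treated as typos) and that trailers are numbered 1 to 3; `n_trailers` and `fields` are illustrative names, not from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["DJ", "ND"], "number": [10, 20],
    "city": ["Edison", "Newark"], "state": ["nj", "DE"],
    "tr1num": [1001, 2001], "tr1acct": [20345, 1985], "tr1ct": ["Dew", "flor"],
    "tr2num": [1002, 2002], "tr2acct": [20346, 1986], "tr2ct": ["Newca", "rodge"],
    "tr3num": [1003, None], "tr3acct": [20347, None], "tr3ct": ["pen", None],
})

id_cols = ["Name", "number", "city", "state"]
fields = ["num", "acct", "ct"]
n_trailers = 3  # assumption: trailers are numbered 1..3

parts = []
for k in range(1, n_trailers + 1):
    # Map trailer k's columns to common names, e.g. tr2num -> trnum.
    mapping = {f"tr{k}{f}": f"tr{f}" for f in fields}
    parts.append(df[id_cols + list(mapping)].rename(columns=mapping))

res = (
    pd.concat(parts, ignore_index=True)
    # Drop trailer slots that are completely empty (ND has no trailer 3).
    .dropna(subset=["trnum", "tracct", "trct"], how="all")
    .sort_values(["Name", "trnum"])
    .reset_index(drop=True)
)
print(res)
```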