将多列转换/取消转换为Python Dataframe中的行

时间:2019-11-21 08:43:44

标签: python pandas

我有一个数据集,需要将其解开成多行。

例如:

id  cor_id1 mail11 mail12 mail13 cor_id2 mail21 mail22 mail23 cor_id3 mail31 mail32 mail33
1    1      a@123  b@234  c@123    2     a@def  b@fgh   c@asd   3      s@wer  b@ert  e@rty
2    4      e@234  e@234  e@qwe    9     e@dfe  f@jfg   r@ert   10     e@wer  g@wer  e@ert

我需要将其取消显示为

id cor_id mail
1   1     a@123
1   1     b@234
1   1     c@123
1   2     a@def
1   2     b@fgh
1   2     c@asd
1   3     s@wer
1   3     b@ert
1   3     e@rty
2   4     e@234
2   4     e@234
2   4     e@qwe
2   9     e@dfe
2   9     r@ert
2   10    e@wer
2   10    g@wer
2   10    e@ert

我尝试了df.melt,但这仅提供了1列。

如果数据有多列要转换为行怎么办?

id  cor_id1 ad1 mail11 mail12 mail13 cor_id2 ad2 mail21 mail22 mail23 cor_id3 ad3 mail31 mail32 mail33
1    1     23    a@123  b@234  c@123        2   24  a@def  b@fgh   c@asd      3   25   s@wer  b@ert  e@rty
2    4     33    e@234  e@234  e@qwe        9   34 e@dfe  f@jfg   r@ert      10  35    e@wer  g@wer  e@ert

我想要

id cor_id  ad  mail
1   1      23  a@123
1   1      23 b@234
1   1      23 c@123
1   2      24 a@def
1   2      24  b@fgh
1   2      24 c@asd
1   3      25 s@wer
1   3      25 b@ert
1   3      25 e@rty
2   4      33 e@234
2   4      33 e@234
2   4      33 e@qwe
2   9      34 e@dfe
2   9      34 f@jfg
2   9      34 r@ert
2   10     35 e@wer
2   10     35 g@wer
2   10     35 e@ert

2 个答案:

答案 0 :(得分:2)

使用wide_to_long,但首先必须更改cor_id列的列名称,并添加最后一位数字:

df = df.rename(columns=lambda x: x + x[-1] if x.startswith('cor_id') else x)
df = pd.wide_to_long(df, ['cor_id', 'mail'], i='id', j='i')
df['cor_id'] = df['cor_id'].ffill()
df = df.reset_index(level=1, drop=True).reset_index()

替代方法是添加0,并用dropna删除丢失的行:

df = df.rename(columns=lambda x: x + '0' if x.startswith('cor_id') else x)
df = pd.wide_to_long(df, ['cor_id', 'mail'], i='id', j='i')
df['cor_id'] = df['cor_id'].ffill()
df = df.dropna(subset=['mail']).reset_index(level=1, drop=True).reset_index()

print (df)
    id  cor_id   mail
0    1     1.0  a@123
1    1     1.0  b@234
2    1     1.0  c@123
3    1     1.0  a@def
4    1     2.0  b@fgh
5    1     2.0  s@wer
6    1     2.0  b@ert
7    1     3.0  e@rty
8    1     3.0  c@asd
9    2     4.0  e@234
10   2     4.0  e@234
11   2     4.0  e@qwe
12   2     4.0  e@dfe
13   2     9.0  f@jfg
14   2     9.0  e@wer
15   2     9.0  g@wer
16   2    10.0  e@ert
17   2    10.0  r@ert

编辑:如果有cor_id之类的多个列,则仅将其添加到元组以供startswith测试,然后使用listffill更改所有列的正向填充:

df = df.rename(columns=lambda x: x + '0' if x.startswith(('cor_id','ad')) else x)
df = pd.wide_to_long(df, ['cor_id', 'ad','mail'], i='id', j='i')
df[['cor_id','ad']] = df[['cor_id','ad']].ffill()
df = df.dropna(subset=['mail']).reset_index(level=1, drop=True).reset_index()
print (df)
    id  cor_id    ad   mail
0    1     1.0  23.0  a@123
1    1     1.0  23.0  b@234
2    1     1.0  23.0  c@123
3    1     2.0  24.0  a@def
4    1     2.0  24.0  b@fgh
5    1     2.0  24.0  c@asd
6    1     3.0  25.0  s@wer
7    1     3.0  25.0  b@ert
8    1     3.0  25.0  e@rty
9    2     4.0  33.0  e@234
10   2     4.0  33.0  e@234
11   2     4.0  33.0  e@qwe
12   2     9.0  34.0  e@dfe
13   2     9.0  34.0  f@jfg
14   2     9.0  34.0  r@ert
15   2    10.0  35.0  e@wer
16   2    10.0  35.0  g@wer
17   2    10.0  35.0  e@ert

答案 1 :(得分:0)

你也可以像这样使用 pd.melt():

melted = pd.melt(df, id_vars = ['id','cor_id'], value_vars = ['mail11','mail12',...], var_name = 'Attribute', value_name = 'Value')

melted.head()