In a Pandas DataFrame

Time: 2018-03-09 02:06:40

Tags: pandas split co

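In the sample df below, I am trying to find a way to split the column headers ('1;2', '4', '5;6') on ';' wherever it is present, and to duplicate the row values into the resulting split columns. (My actual df comes from an imported csv file, so I typically have around 50-80 column headers that need splitting.)

Below is my code, followed by its output: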

import pandas as pd
import numpy as np

# Note: np.array stores every value as a string here, because the rows mix text and numbers.
data = np.array([['Market','Product Code','1;2','4','5;6'],
                 ['Total Customers',123,1,500,400],
                 ['Total Customers',123,2,400,320],
                 ['Major Customer 1',123,1,100,220],
                 ['Major Customer 1',123,2,230,230],
                 ['Major Customer 2',123,1,130,30],
                 ['Major Customer 2',123,2,20,10],
                 ['Total Customers',456,1,500,400],
                 ['Total Customers',456,2,400,320],
                 ['Major Customer 1',456,1,100,220],
                 ['Major Customer 1',456,2,230,230],
                 ['Major Customer 2',456,1,130,30],
                 ['Major Customer 2',456,2,20,10]])

df = pd.DataFrame(data)
df.columns = df.iloc[0]            # promote the first row to column headers
df = df.reindex(df.index.drop(0))  # drop that header row from the data
print(df)
0             Market Product Code 1;2    4  5;6
1    Total Customers          123   1  500  400
2    Total Customers          123   2  400  320
3   Major Customer 1          123   1  100  220
4   Major Customer 1          123   2  230  230
5   Major Customer 2          123   1  130   30
6   Major Customer 2          123   2   20   10
7    Total Customers          456   1  500  400
8    Total Customers          456   2  400  320
9   Major Customer 1          456   1  100  220
10  Major Customer 1          456   2  230  230
11  Major Customer 2          456   1  130   30
12  Major Customer 2          456   2   20   10

Below is my desired output:

0             Market Product Code  1  2    4    5    6
1    Total Customers          123  1  1  500  400  400
2    Total Customers          123  2  2  400  320  320
3   Major Customer 1          123  1  1  100  220  220
4   Major Customer 1          123  2  2  230  230  230
5   Major Customer 2          123  1  1  130   30   30
6   Major Customer 2          123  2  2   20   10   10
7    Total Customers          456  1  1  500  400  400
8    Total Customers          456  2  2  400  320  320
9   Major Customer 1          456  1  1  100  220  220
10  Major Customer 1          456  2  2  230  230  230
11  Major Customer 2          456  1  1  130   30   30
12  Major Customer 2          456  2  2   20   10   10

Ideally, I would like to perform this kind of task at the 'read_csv' level. Any ideas?

2 answers:

Answer 0: (score: 1)

Try using reindex together with repeat:
s=df.columns.str.split(';')
df=df.reindex(columns=df.columns.repeat(s.str.len()))
df.columns=sum(s.tolist(),[])
df
Out[247]: 
              Market Product Code  1  2    4    5    6
1    Total Customers          123  1  1  500  400  400
2    Total Customers          123  2  2  400  320  320
3   Major Customer 1          123  1  1  100  220  220
4   Major Customer 1          123  2  2  230  230  230
5   Major Customer 2          123  1  1  130   30   30
6   Major Customer 2          123  2  2   20   10   10
7    Total Customers          456  1  1  500  400  400
8    Total Customers          456  2  2  400  320  320
9   Major Customer 1          456  1  1  100  220  220
10  Major Customer 1          456  2  2  230  230  230
11  Major Customer 2          456  1  1  130   30   30
12  Major Customer 2          456  2  2   20   10   10
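
To make the three steps of the reindex/repeat approach easier to follow, here is a minimal self-contained sketch. It assumes a small stand-in frame (the two data rows and the variable names `parts`, `repeated` and `out` are illustrative, not from the question), and it swaps `sum(..., [])` for `itertools.chain.from_iterable`, which flattens the list of name pieces without repeatedly copying lists.

```python
from itertools import chain

import pandas as pd

# Small stand-in frame (hypothetical data) with ';'-joined headers.
df = pd.DataFrame(
    [["Total Customers", 1, 500, 400],
     ["Major Customer 1", 2, 230, 230]],
    columns=["Market", "1;2", "4", "5;6"],
)

# Step 1: split every header on ';', giving one list of names per original column.
parts = [col.split(";") for col in df.columns]

# Step 2: repeat each original label once per piece, so '1;2' appears twice.
repeated = df.columns.repeat([len(p) for p in parts])

# Step 3: reindex to duplicate the underlying data, then assign the flat names.
out = df.reindex(columns=repeated)
out.columns = list(chain.from_iterable(parts))

print(out)
```

The reindex step relies on the original headers being unique; if the source frame already had duplicate column labels, reindex would raise an error instead.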

Answer 1: (score: 1)

You can split the column names on ';' and then rebuild a df:

pd.DataFrame({c:df[t] for t in df.columns for c in t.split(';')})
Out[157]: 
    1  2    4    5    6            Market Product Code
1   1  1  500  400  400   Total Customers          123
2   2  2  400  320  320   Total Customers          123
3   1  1  100  220  220  Major Customer 1          123
4   2  2  230  230  230  Major Customer 1          123
5   1  1  130   30   30  Major Customer 2          123
6   2  2   20   10   10  Major Customer 2          123
7   1  1  500  400  400   Total Customers          456
8   2  2  400  320  320   Total Customers          456
9   1  1  100  220  220  Major Customer 1          456
10  2  2  230  230  230  Major Customer 1          456
11  1  1  130   30   30  Major Customer 2          456
12  2  2   20   10   10  Major Customer 2          456

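Or, if you want to preserve the column order (the dict version sorts the columns on older setups; on Python 3.7+ with pandas 0.23 or later, dict insertion order should already be kept):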

pd.concat([df[t].to_frame(c) for t in df.columns for c in t.split(';')], axis=1)
Out[167]: 
              Market Product Code  1  2    4    5    6
1    Total Customers          123  1  1  500  400  400
2    Total Customers          123  2  2  400  320  320
3   Major Customer 1          123  1  1  100  220  220
4   Major Customer 1          123  2  2  230  230  230
5   Major Customer 2          123  1  1  130   30   30
6   Major Customer 2          123  2  2   20   10   10
7    Total Customers          456  1  1  500  400  400
8    Total Customers          456  2  2  400  320  320
9   Major Customer 1          456  1  1  100  220  220
10  Major Customer 1          456  2  2  230  230  230
11  Major Customer 2          456  1  1  130   30   30
12  Major Customer 2          456  2  2   20   10   10
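
Regarding the wish to do this at the 'read_csv' level: I am not aware of a read_csv option that splits headers directly, so one possible approach is a thin wrapper that reads the file and expands the headers straight away. The sketch below assumes a hypothetical helper name `read_csv_split_headers`, and uses `io.StringIO` with two made-up rows as a stand-in for the real file.

```python
import io

import pandas as pd


def read_csv_split_headers(source, sep=";", **kwargs):
    """Hypothetical helper: read a CSV, then split ';'-joined headers."""
    raw = pd.read_csv(source, **kwargs)
    # One output column per header piece, each a renamed copy of the source column.
    pieces = [
        raw[col].rename(name.strip())
        for col in raw.columns
        for name in str(col).split(sep)
    ]
    return pd.concat(pieces, axis=1)


# io.StringIO stands in for the real file path here.
csv_text = """Market,Product Code,1;2,4,5;6
Total Customers,123,1,500,400
Major Customer 1,123,2,230,230
"""
df = read_csv_split_headers(io.StringIO(csv_text))
print(df)
```

Because the values go through read_csv rather than np.array, the numeric columns keep their integer dtype instead of becoming strings. Any extra keyword arguments are passed through to read_csv unchanged.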