Melting on multiple levels in pandas

Time: 2018-04-23 14:13:44

Tags: python pandas

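I have a df with part number, year and monthly consumption, as shown below. The values in columns 01, 02 and 03 are the quantities for January through March of each year.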

import pandas as pd

d = {'PN': [10506,10506,10507,10507],
 'Year': [2017, 2018, 2017, 2018],
 '01': [1,2,3,4],
 '02': [5,6,7,8],
 '03': [9,10,11,12]}
indata = pd.DataFrame(data = d)

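I want to reshape it into long format by combining year and month into YYYYMM format, so that each row holds the part number, YearMonth and quantity, as shown below.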

dd = {'PN': [10506,10506,10506,10506,10506,10506,10507,10507,10507,10507,10507,10507],
  'YearMonth': [201701,201702,201703,201801,201802,201803,201701,201702,201703,201801,201802,201803],
  'Qty': [1,5,9,2,6,10,3,7,11,4,8,12]}
outdata = pd.DataFrame(data = dd)

Since I didn't succeed with pd.melt, I tried a triple for loop instead, as shown below.

parts = pd.Series(indata['PN']).unique()
years = pd.Series(indata['Year']).unique()
months = ['01', '02', '03']

df = pd.DataFrame(columns = ['PN', 'YearMonth', 'Qty'])

for p in parts:
    for y in years:
        for m in months:
            yearmonth = str(y*100+int(m))
            qty = indata.loc[(indata['PN'] == p) & (indata['Year'] == y), m].iloc[0]
            row = [p, yearmonth, qty]
            df = df.append(row)
outdata = df

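This seems very inefficient, and my append call doesn't add one row per iteration; instead it adds the three values as three rows in a new column.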
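For the loop itself, a common fix is to collect each row as a plain list and build the DataFrame once after the loop. `DataFrame.append` turns a bare list into a single-column frame (which is why the three values end up as three rows in a new column), and it was removed entirely in pandas 2.0. The sketch below is an illustration, not the asker's code: it assumes integer YYYYMM values (matching `outdata`) and iterates over the records directly instead of doing three nested `.loc` lookups.

```python
import pandas as pd

# Input from the question
indata = pd.DataFrame({'PN': [10506, 10506, 10507, 10507],
                       'Year': [2017, 2018, 2017, 2018],
                       '01': [1, 2, 3, 4],
                       '02': [5, 6, 7, 8],
                       '03': [9, 10, 11, 12]})
months = ['01', '02', '03']

# Accumulate plain Python lists; no DataFrame is modified inside the loop
rows = []
for rec in indata.to_dict('records'):
    for m in months:
        # YYYYMM as an integer, e.g. 2017 * 100 + 1 -> 201701
        rows.append([rec['PN'], rec['Year'] * 100 + int(m), rec[m]])

# Build the result once, after the loop
outdata = pd.DataFrame(rows, columns=['PN', 'YearMonth', 'Qty'])
print(outdata)
```

Building once from a list of rows is linear in the number of rows, whereas growing a DataFrame inside a loop copies it on every iteration.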

Any suggestions?

2 answers:

Answer 0 (score: 3)

First reshape with melt, then create the new column YearMonth with assign, drop the columns that are no longer needed, and finally sort with sort_values:

df = (indata.melt(id_vars=['PN','Year'], var_name='v', value_name='Qty')
            .assign(YearMonth=lambda x: x['Year'].astype(str) + x['v'])
            .drop(['v', 'Year'], axis=1)
            .sort_values(['PN','YearMonth']))

print (df)
       PN  Qty YearMonth
0   10506    1    201701
4   10506    5    201702
8   10506    9    201703
1   10506    2    201801
5   10506    6    201802
9   10506   10    201803
2   10507    3    201701
6   10507    7    201702
10  10507   11    201703
3   10507    4    201801
7   10507    8    201802
11  10507   12    201803
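The chain above produces YearMonth as strings and keeps the shuffled index left over from melt, while outdata in the question holds integers with a 0..n-1 index. A small variation, assuming integer YYYYMM is what is wanted, computes Year * 100 + month numerically and resets the index. The column label `Month` is an illustrative choice replacing `v`.

```python
import pandas as pd

indata = pd.DataFrame({'PN': [10506, 10506, 10507, 10507],
                       'Year': [2017, 2018, 2017, 2018],
                       '01': [1, 2, 3, 4],
                       '02': [5, 6, 7, 8],
                       '03': [9, 10, 11, 12]})

# One row per (PN, Year, month column); the labels '01'..'03' land in 'Month'
melted = indata.melt(id_vars=['PN', 'Year'], var_name='Month', value_name='Qty')

# Numeric YYYYMM: 2017 * 100 + 1 -> 201701
melted['YearMonth'] = melted['Year'] * 100 + melted['Month'].astype(int)

# Keep the target columns, sort, and renumber the index from 0
outdata = (melted[['PN', 'YearMonth', 'Qty']]
           .sort_values(['PN', 'YearMonth'])
           .reset_index(drop=True))
print(outdata)
```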

Answer 1 (score: 1)

You can use melt:

s=indata.melt(['Year','PN'])
s['Year']=s.Year.astype(str)+s.variable.astype(str)
s
Out[262]:
      Year     PN variable  value
0   201701  10506       01      1
1   201801  10506       01      2
2   201701  10507       01      3
3   201801  10507       01      4
4   201702  10506       02      5
5   201802  10506       02      6
6   201702  10507       02      7
7   201802  10507       02      8
8   201703  10506       03      9
9   201803  10506       03     10
10  201703  10507       03     11
11  201803  10507       03     12

Or just use stack:

s=indata.set_index(['Year','PN']).stack().reset_index()
s['YearMonth']=s.Year.astype(str)+s['level_2'].astype(str)
s.rename(columns={0:'Qty'}).drop(['level_2','Year'],axis=1)
Out[274]:
       PN  Qty YearMonth
0   10506    1    201701
1   10506    5    201702
2   10506    9    201703
3   10506    2    201801
4   10506    6    201802
5   10506   10    201803
6   10507    3    201701
7   10507    7    201702
8   10507   11    201703
9   10507    4    201801
10  10507    8    201802
11  10507   12    201803
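The stack route can also avoid the auto-generated `level_2` and `0` column names: name the column axis with `rename_axis` before stacking, and name the resulting Series with `rename`. This is a sketch assuming pandas 0.24 or later (for `rename_axis(columns=...)`); the label `Month` is illustrative, and YearMonth is built as an integer to match outdata. Because `stack` walks the rows in their original order, no sort is needed here.

```python
import pandas as pd

indata = pd.DataFrame({'PN': [10506, 10506, 10507, 10507],
                       'Year': [2017, 2018, 2017, 2018],
                       '01': [1, 2, 3, 4],
                       '02': [5, 6, 7, 8],
                       '03': [9, 10, 11, 12]})

stacked = (indata.set_index(['PN', 'Year'])
           .rename_axis(columns='Month')  # name for the level that stack() creates
           .stack()                       # Series indexed by (PN, Year, Month)
           .rename('Qty')                 # becomes the value column after reset_index
           .reset_index())

# Numeric YYYYMM from the year and the month label ('01' -> 1)
stacked['YearMonth'] = stacked['Year'] * 100 + stacked['Month'].astype(int)
outdata = stacked[['PN', 'YearMonth', 'Qty']]
print(outdata)
```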
