How do I populate new columns from multiple DataFrames?

Asked: 2017-03-08 11:56:27

Tags: python pandas

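I have a pandas DataFrame with 2 columns: 'IMO' and 'LOAD_DATE'. Many IMOs have several load dates.

I want to create another DataFrame that has every date as its index and one new column per IMO. Each column should hold '0' on days without a load and '1' on load days.

Input file: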

    | VESSEL_IMO |    Date 
  1 |    9821    |   16-12-16
  2 |    9821    |   20-12-16
  3 |    9822    |   16-12-16
  4 |    9822    |   17-12-16
  5 |    9823    |   16-12-16
  6 |    9823    |   18-12-16
  7 |    9999    |   15-12-16
  8 |    9999    |   18-12-16
  9 |    9999    |   21-12-16

Here is a sample of my code, which returns:

IndexError: index out of bounds

df = pd.DataFrame({'Date' : calendrier})

for namm in xl['AS_VESSEL_IMO'].unique():
    df[namm] = 0    
    al_datt = xl[xl['AS_VESSEL_IMO'] == namm]['AS_LOAD_DATE']
    df.ix[df['Date'].isin(al_datt), df[namm]] = 1

Desired output:

    Date   | 9821 | 9822 | 9823 |...| 9999 
  15-12-16 |   0  |   0  |   0  |...|   1 
  16-12-16 |   1  |   1  |   1  |...|   0 
  17-12-16 |   0  |   1  |   0  |...|   0 
  18-12-16 |   0  |   0  |   1  |...|   1 
  19-12-16 |   0  |   0  |   0  |...|   0 
  20-12-16 |   1  |   0  |   0  |...|   0 
  21-12-16 |   0  |   0  |   0  |...|   1 

1 answer:

Answer 0 (score: 1)

Sample:

df1 = pd.DataFrame({'Date' : pd.date_range('2016-12-16', periods=10)})
print (df1)
        Date
0 2016-12-16
1 2016-12-17
2 2016-12-18
3 2016-12-19
4 2016-12-20
5 2016-12-21
6 2016-12-22
7 2016-12-23
8 2016-12-24
9 2016-12-25

I think you need unstack. If the data contains duplicates, aggregate first with groupby and max:

df['a'] = 1
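# the dates are day-month-year with a 2-digit year, so state the format explicitly
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%y')
df = df.set_index(['Date', 'VESSEL_IMO'])['a'].unstack(fill_value=0)

# if duplicate (Date, VESSEL_IMO) rows raise
# "ValueError: Index contains duplicate entries, cannot reshape", use instead:
#df = df.groupby(['Date', 'VESSEL_IMO'])['a'].max().unstack(fill_value=0)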
print (df)
VESSEL_IMO  9821  9822  9823  9999
Date                              
2016-12-15     0     0     0     1
2016-12-16     1     1     1     0
2016-12-17     0     1     0     0
2016-12-18     0     0     1     1
2016-12-20     1     0     0     0
2016-12-21     0     0     0     1
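
The duplicate case mentioned in the comment above can be sketched as follows. The small frame `dup` and the variable names are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical data: IMO 9821 is listed twice for the same date.
dup = pd.DataFrame({
    'VESSEL_IMO': [9821, 9821, 9822],
    'Date': pd.to_datetime(['2016-12-16', '2016-12-16', '2016-12-17']),
})
dup['a'] = 1

try:
    dup.set_index(['Date', 'VESSEL_IMO'])['a'].unstack(fill_value=0)
    reshape_failed = False
except ValueError:
    # unstack cannot place two rows into the same (Date, VESSEL_IMO) cell
    reshape_failed = True

# Collapsing duplicates with max keeps the 0/1 encoding intact.
flags = dup.groupby(['Date', 'VESSEL_IMO'])['a'].max().unstack(fill_value=0)
print(reshape_failed)
print(flags)
```

Taking the max rather than the sum keeps the cell at 1 even when the same load is recorded several times.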

Finally, reindex against the full calendar:

df = df.reindex(df1.Date, fill_value=0)
print (df)
VESSEL_IMO  9821  9822  9823  9999
Date                              
2016-12-16     1     1     1     0
2016-12-17     0     1     0     0
2016-12-18     0     0     1     1
2016-12-19     0     0     0     0
2016-12-20     1     0     0     0
2016-12-21     0     0     0     1
2016-12-22     0     0     0     0
2016-12-23     0     0     0     0
2016-12-24     0     0     0     0
2016-12-25     0     0     0     0
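
Note that the sample calendar `df1` starts on 2016-12-16, so the 2016-12-15 row for IMO 9999 is dropped by the reindex. A minimal end-to-end sketch, assuming the calendar should instead run from the first to the last load date as in the desired output, could look like this (the names `wide` and `calendar` are illustrative):

```python
import pandas as pd

# Data copied from the question; column names follow the input table.
df = pd.DataFrame({
    'VESSEL_IMO': [9821, 9821, 9822, 9822, 9823, 9823, 9999, 9999, 9999],
    'Date': ['16-12-16', '20-12-16', '16-12-16', '17-12-16', '16-12-16',
             '18-12-16', '15-12-16', '18-12-16', '21-12-16'],
})

# Parse day-month-year explicitly so the dates are not misread.
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%y')
df['a'] = 1

# One row per date, one column per IMO, 0 where there was no load.
wide = df.set_index(['Date', 'VESSEL_IMO'])['a'].unstack(fill_value=0)

# Assumption: the calendar spans the first to the last load date.
calendar = pd.date_range(df['Date'].min(), df['Date'].max(), name='Date')
wide = wide.reindex(calendar, fill_value=0)

print(wide)
```

With this calendar the result has seven rows, 2016-12-15 through 2016-12-21, including an all-zero row for 2016-12-19.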

Other solutions with pivot or pivot_table:

df['a'] = 1
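# the dates are day-month-year with a 2-digit year, so state the format explicitly
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%y')
# pivot leaves NaN in the gaps, so the result is float until cast back to int
df = df.pivot(index='Date', columns='VESSEL_IMO', values='a').fillna(0)
# if there are duplicate (Date, VESSEL_IMO) rows, use pivot_table with an aggregation:
#df = df.pivot_table(index='Date',columns='VESSEL_IMO',values='a',fill_value=0,aggfunc='max')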
print (df)
VESSEL_IMO  9821  9822  9823  9999
Date                              
2016-12-15   0.0   0.0   0.0   1.0
2016-12-16   1.0   1.0   1.0   0.0
2016-12-17   0.0   1.0   0.0   0.0
2016-12-18   0.0   0.0   1.0   1.0
2016-12-20   1.0   0.0   0.0   0.0
2016-12-21   0.0   0.0   0.0   1.0

df = df.reindex(df1.Date, fill_value=0).astype(int)
print (df)

VESSEL_IMO  9821  9822  9823  9999
Date                              
2016-12-16     1     1     1     0
2016-12-17     0     1     0     0
2016-12-18     0     0     1     1
2016-12-19     0     0     0     0
2016-12-20     1     0     0     0
2016-12-21     0     0     0     1
2016-12-22     0     0     0     0
2016-12-23     0     0     0     0
2016-12-24     0     0     0     0
2016-12-25     0     0     0     0
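
The loop from the question can also be made to work. In `df.ix[..., df[namm]]` the second indexer is the column's values rather than its label, which is a likely source of the IndexError, and `.ix` itself has been removed from pandas since version 1.0. A minimal sketch of a repaired loop follows; it assumes `calendrier` is a daily range covering all load dates, since the question does not show how it was built.

```python
import pandas as pd

# Data copied from the question, using the column names from its code.
xl = pd.DataFrame({
    'AS_VESSEL_IMO': [9821, 9821, 9822, 9822, 9823, 9823, 9999, 9999, 9999],
    'AS_LOAD_DATE': pd.to_datetime(
        ['16-12-16', '20-12-16', '16-12-16', '17-12-16', '16-12-16',
         '18-12-16', '15-12-16', '18-12-16', '21-12-16'],
        format='%d-%m-%y'),
})

# Assumption: calendrier is a daily range covering all load dates.
calendrier = pd.date_range('2016-12-15', '2016-12-21')
df = pd.DataFrame({'Date': calendrier})

for namm in xl['AS_VESSEL_IMO'].unique():
    df[namm] = 0
    al_datt = xl.loc[xl['AS_VESSEL_IMO'] == namm, 'AS_LOAD_DATE']
    # Pass the column label, not the column's values, as the second indexer.
    df.loc[df['Date'].isin(al_datt), namm] = 1

print(df)
```

The reshaping approaches above avoid the explicit loop, but this version stays closest to the original code.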