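I have a pandas DataFrame with 2 columns: 'IMO' and 'LOAD_DATE'. Many IMOs have multiple load dates.
I want to create another DataFrame with all dates as the index and one new column per IMO. Each column is filled with '0' for days without a load and '1' for load days.
Input file: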
| VESSEL_IMO | Date
1 | 9821 | 16-12-16
2 | 9821 | 20-12-16
3 | 9822 | 16-12-16
4 | 9822 | 17-12-16
5 | 9823 | 16-12-16
6 | 9823 | 18-12-16
7 | 9999 | 15-12-16
8 | 9999 | 18-12-16
9 | 9999 | 21-12-16
Below is a sample of my code, which returns:
IndexError: index out of range
df = pd.DataFrame({'Date' : calendrier})
for namm in xl['AS_VESSEL_IMO'].unique():
    df[namm] = 0
    al_datt = xl[xl['AS_VESSEL_IMO'] == namm]['AS_LOAD_DATE']
    df.ix[df['Date'].isin(al_datt), df[namm]] = 1
Desired output:
Date | 9821 | 9822 | 9823 |...| 9999
15-12-16 | 0 | 0 | 0 |...| 1
16-12-16 | 1 | 1 | 1 |...| 0
17-12-16 | 0 | 1 | 0 |...| 0
18-12-16 | 0 | 0 | 1 |...| 1
19-12-16 | 0 | 0 | 0 |...| 0
20-12-16 | 1 | 0 | 0 |...| 0
21-12-16 | 0 | 0 | 0 |...| 1
Answer 0 (score: 1)
Sample:
df1 = pd.DataFrame({'Date' : pd.date_range('16-12-2016', periods=10)})
print (df1)
Date
0 2016-12-16
1 2016-12-17
2 2016-12-18
3 2016-12-19
4 2016-12-20
5 2016-12-21
6 2016-12-22
7 2016-12-23
8 2016-12-24
9 2016-12-25
I think you need unstack; if there are duplicates, aggregate them first with groupby and max:
df['a'] = 1
df.Date = pd.to_datetime(df.Date)
df = df.set_index(['Date', 'VESSEL_IMO'])['a'].unstack(fill_value=0)
#if duplicates in rows and get ValueError: Index contains duplicate entries, cannot reshape
#df = df.groupby(['Date', 'VESSEL_IMO'])['a'].max().unstack(fill_value=0)
print (df)
VESSEL_IMO 9821 9822 9823 9999
Date
2016-12-15 0 0 0 1
2016-12-16 1 1 1 0
2016-12-17 0 1 0 0
2016-12-18 0 0 1 1
2016-12-20 1 0 0 0
2016-12-21 0 0 0 1
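The duplicate case mentioned in the commented-out line can be sketched as follows. The three-row frame is made up for illustration (one IMO/date pair is repeated on purpose), and the explicit `format='%d-%m-%y'` is an assumption about how the dates are written in the input:

```python
import pandas as pd

# Hypothetical sample: the (9821, 16-12-16) pair appears twice on purpose.
df = pd.DataFrame({
    'VESSEL_IMO': [9821, 9821, 9822],
    'Date': ['16-12-16', '16-12-16', '17-12-16'],
})
df['a'] = 1
# Assumed date layout: day-month-two-digit year.
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%y')

# set_index(...).unstack() would raise ValueError on the repeated pair,
# so collapse duplicates with max before reshaping.
wide = df.groupby(['Date', 'VESSEL_IMO'])['a'].max().unstack(fill_value=0)
print(wide)
```

Using max keeps the values at 0/1 regardless of how many times a load date is repeated.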
Finally, reindex:
df = df.reindex(df1.Date, fill_value=0)
print (df)
VESSEL_IMO 9821 9822 9823 9999
Date
2016-12-16 1 1 1 0
2016-12-17 0 1 0 0
2016-12-18 0 0 1 1
2016-12-19 0 0 0 0
2016-12-20 1 0 0 0
2016-12-21 0 0 0 1
2016-12-22 0 0 0 0
2016-12-23 0 0 0 0
2016-12-24 0 0 0 0
2016-12-25 0 0 0 0
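Putting the two steps together on the data from the question gives a self-contained sketch. Note the assumptions: the calendar here is built from the earliest to the latest load date (rather than from `df1`, which starts on 2016-12-16 and therefore drops the 15th), and the names `raw`, `wide`, `calendar` and `result` are chosen for illustration only:

```python
import pandas as pd

# Data copied from the question's input table.
raw = pd.DataFrame({
    'VESSEL_IMO': [9821, 9821, 9822, 9822, 9823, 9823, 9999, 9999, 9999],
    'Date': ['16-12-16', '20-12-16', '16-12-16', '17-12-16', '16-12-16',
             '18-12-16', '15-12-16', '18-12-16', '21-12-16'],
})
# Assumed date layout: day-month-two-digit year.
raw['Date'] = pd.to_datetime(raw['Date'], format='%d-%m-%y')
raw['a'] = 1

# One row per date, one column per IMO, 0 where there was no load.
wide = raw.set_index(['Date', 'VESSEL_IMO'])['a'].unstack(fill_value=0)

# Assumption: the calendar spans the first to the last load date, daily.
calendar = pd.date_range(wide.index.min(), wide.index.max(), freq='D', name='Date')

# Add the missing days (here 2016-12-19) as all-zero rows.
result = wide.reindex(calendar, fill_value=0)
print(result)
```

If the calendar should cover a different period, any other DatetimeIndex can be passed to reindex instead.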
Another solution with pivot or pivot_table:
df['a'] = 1
df.Date = pd.to_datetime(df.Date)
df = df.pivot(index ='Date', columns='VESSEL_IMO', values='a').fillna(0)
#if duplicated index
#df = df.pivot_table(index='Date',columns='VESSEL_IMO',values='a',fill_value=0,aggfunc='max')
print (df)
VESSEL_IMO 9821 9822 9823 9999
Date
2016-12-15 0.0 0.0 0.0 1.0
2016-12-16 1.0 1.0 1.0 0.0
2016-12-17 0.0 1.0 0.0 0.0
2016-12-18 0.0 0.0 1.0 1.0
2016-12-20 1.0 0.0 0.0 0.0
2016-12-21 0.0 0.0 0.0 1.0
df = df.reindex(df1.Date, fill_value=0).astype(int)
VESSEL_IMO 9821 9822 9823 9999
Date
2016-12-16 1 1 1 0
2016-12-17 0 1 0 0
2016-12-18 0 0 1 1
2016-12-19 0 0 0 0
2016-12-20 1 0 0 0
2016-12-21 0 0 0 1
2016-12-22 0 0 0 0
2016-12-23 0 0 0 0
2016-12-24 0 0 0 0
2016-12-25 0 0 0 0
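For completeness, the loop from the question can probably be made to work as well. The likely culprit appears to be the column selector: the original passes `df[namm]` (a Series of zeros) where the column label `namm` is expected. In addition, `.ix` was removed in pandas 1.0, so `.loc` is used here. The small `xl` frame and the `calendrier` range below are hypothetical stand-ins for the asker's data:

```python
import pandas as pd

# Hypothetical stand-in for the asker's `xl` frame.
xl = pd.DataFrame({
    'AS_VESSEL_IMO': [9821, 9821, 9999],
    'AS_LOAD_DATE': pd.to_datetime(['16-12-16', '20-12-16', '15-12-16'],
                                   format='%d-%m-%y'),
})
# Hypothetical calendar covering the period of interest.
calendrier = pd.date_range('2016-12-15', '2016-12-21')

df = pd.DataFrame({'Date': calendrier})
for namm in xl['AS_VESSEL_IMO'].unique():
    df[namm] = 0
    # Load dates for this vessel only.
    al_datt = xl.loc[xl['AS_VESSEL_IMO'] == namm, 'AS_LOAD_DATE']
    # Select rows by boolean mask and the column by its label.
    df.loc[df['Date'].isin(al_datt), namm] = 1
print(df)
```

The reshaping approaches above avoid the Python-level loop, which tends to matter once there are many distinct IMOs.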