I have found answers that come close to this, but none that actually solve it. I have a data table that looks like this:
ID DATE
74180 11/07/2000
74180 11/04/2008
81337 11/04/2008
81337 11/02/2010
82557 11/07/2000
82557 11/05/2002
82557 11/02/2004
82557 11/04/2008
82557 11/06/2012
82901 11/07/2000
82901 11/05/2002
82901 11/02/2004
82901 11/04/2008
82901 11/06/2012
82901 11/04/2014
83103 11/04/2008
83103 11/02/2010
83103 11/06/2012
83103 11/04/2014
I want to reshape it so that each ID occupies a single row and each distinct date becomes a binary (0/1) column, i.e.:
ID 11/07/2000 11/05/2002 11/02/2004 ...
74180 1 0 0
81337 0 0 0
Any guidance is much appreciated.
Answer 0 (score: 0)
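Consider:
import pandas as pd
# df is the question's table; index it by ID so the dummies can be grouped per ID
df.set_index('ID', inplace=True)
# one indicator column per distinct DATE, then one summed row per ID
pd.get_dummies(df.loc[:, 'DATE']).groupby(level='ID').sum()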
2000-11-07 2002-11-05 2004-11-02 2008-11-04 2010-11-02 2012-11-06 \
ID
74180 1.0 0.0 0.0 1.0 0.0 0.0
81337 0.0 0.0 0.0 1.0 1.0 0.0
82557 1.0 1.0 1.0 1.0 0.0 1.0
82901 1.0 1.0 1.0 1.0 0.0 1.0
83103 0.0 0.0 0.0 1.0 1.0 1.0
2014-11-04
ID
74180 0.0
81337 0.0
82557 0.0
82901 1.0
83103 1.0
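As a self-contained sketch of the approach above, the following rebuilds the question's table, parses DATE as month/day/year (an assumption, based on the 2000-11-07 style column labels in the output above), and uses max() in place of sum() so the result stays strictly 0/1 even if an (ID, DATE) pair were ever repeated. The variable names are illustrative.

```python
import pandas as pd

# Rows copied from the question.
ids = [74180, 74180, 81337, 81337,
       82557, 82557, 82557, 82557, 82557,
       82901, 82901, 82901, 82901, 82901, 82901,
       83103, 83103, 83103, 83103]
dates = ["11/07/2000", "11/04/2008", "11/04/2008", "11/02/2010",
         "11/07/2000", "11/05/2002", "11/02/2004", "11/04/2008", "11/06/2012",
         "11/07/2000", "11/05/2002", "11/02/2004", "11/04/2008", "11/06/2012",
         "11/04/2014",
         "11/04/2008", "11/02/2010", "11/06/2012", "11/04/2014"]
df = pd.DataFrame({"ID": ids, "DATE": dates})

# Assumption: the strings are MM/DD/YYYY. Parsing them makes the
# resulting columns sort chronologically instead of alphabetically.
df["DATE"] = pd.to_datetime(df["DATE"], format="%m/%d/%Y")

# One indicator column per distinct date, indexed by ID.
dummies = pd.get_dummies(df.set_index("ID")["DATE"])

# Collapse to one row per ID; max() keeps every cell at 0 or 1.
result = dummies.groupby(level="ID").max().astype(int)
print(result)
```

If a count of occurrences per date is wanted instead of a flag, sum() can be kept as in the answer.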
Answer 1 (score: 0)
First, recreate the DataFrame:
import pandas as pd
ID = [74180,74180,81337,81337,82557,82557,82557,82557,82557,82901,82901,82901,82901,82901,82901,83103,83103,83103,83103]
DATE = ['2000-11-07','2008-11-04','2008-11-04','2010-11-02','2000-11-07','2002-11-05','2004-11-02','2008-11-04','2012-11-06','2000-11-07','2002-11-05','2004-11-02','2008-11-04','2012-11-06','2014-11-04','2008-11-04','2010-11-02','2012-11-06','2014-11-04']
df = pd.DataFrame({'ID':ID, 'DATE':DATE})
The actual processing:
# one indicator column per distinct DATE string, indexed by ID
df2 = pd.get_dummies(df.set_index('ID')['DATE'])
# collapse to one row per ID
df2.reset_index().groupby('ID').sum()
Output:
2000-11-07 2002-11-05 2004-11-02 2008-11-04 ...
ID
74180 1.0 0.0 0.0 1.0 ...
81337 0.0 0.0 0.0 1.0 ...
82557 1.0 1.0 1.0 1.0 ...
82901 1.0 1.0 1.0 1.0 ...
83103 0.0 0.0 0.0 1.0 ...
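An alternative that is not part of the answer above: pd.crosstab builds the ID-by-date count table in one call. The sketch below assumes the dates are consistently formatted ISO strings, since each distinct string becomes its own column; clip(upper=1) is only a guard that turns counts into 0/1 flags should a pair ever repeat.

```python
import pandas as pd

# Rows from the question, with dates written as ISO strings.
ids = [74180, 74180, 81337, 81337,
       82557, 82557, 82557, 82557, 82557,
       82901, 82901, 82901, 82901, 82901, 82901,
       83103, 83103, 83103, 83103]
dates = ["2000-11-07", "2008-11-04", "2008-11-04", "2010-11-02",
         "2000-11-07", "2002-11-05", "2004-11-02", "2008-11-04", "2012-11-06",
         "2000-11-07", "2002-11-05", "2004-11-02", "2008-11-04", "2012-11-06",
         "2014-11-04",
         "2008-11-04", "2010-11-02", "2012-11-06", "2014-11-04"]
df = pd.DataFrame({"ID": ids, "DATE": dates})

# Count how often each (ID, DATE) pair occurs: rows are IDs, columns are dates.
counts = pd.crosstab(df["ID"], df["DATE"])

# Cap counts at 1 so every cell is a presence/absence flag.
binary = counts.clip(upper=1)
print(binary)
```

The result has integer cells rather than floats, which matches the 0/1 layout requested in the question.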