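From the DataFrame below, I need to create a new DataFrame in which the VAL column is split into two separate columns, VAL1 and VAL2, one for each of the two DBKEY values that share the same STATION.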
DBKEY STATION DAILY_DATE VAL
0 T9947 G377C_C 2011-10-01 00:00:00 17.123
1 T9947 G377C_C 2011-10-02 00:00:00 NaN
2 T9947 G377C_C 2011-10-03 00:00:00 NaN
3 T9947 G377C_C 2011-10-04 00:00:00 NaN
4 T9947 G377C_C 2011-10-05 00:00:00 NaN
5 T9947 G377C_C 2011-10-06 00:00:00 NaN
6 T9947 G377C_C 2011-10-07 00:00:00 NaN
7 T9947 G377C_C 2011-10-08 00:00:00 NaN
8 T9947 G377C_C 2011-10-09 00:00:00 92.734
9 T9947 G377C_C 2011-10-10 00:00:00 48.975
10 T9947 G377C_C 2011-10-11 00:00:00 17.463
11 T9947 G377C_C 2011-10-12 00:00:00 NaN
12 T9947 G377C_C 2011-10-13 00:00:00 NaN
13 T9947 G377C_C 2011-10-14 00:00:00 12.870
14 T9947 G377C_C 2011-10-15 00:00:00 NaN
15 T9947 G377C_C 2011-10-16 00:00:00 48.138
16 T9947 G377C_C 2011-10-17 00:00:00 0.413
17 T9947 G377C_C 2011-10-18 00:00:00 39.058
18 T9947 G377C_C 2011-10-19 00:00:00 235.617
19 T9947 G377C_C 2011-10-20 00:00:00 182.989
20 T9947 G377C_C 2011-10-21 00:00:00 132.193
21 T9947 G377C_C 2011-10-22 00:00:00 19.557
22 T9947 G377C_C 2011-10-23 00:00:00 NaN
23 T9947 G377C_C 2011-10-24 00:00:00 80.552
24 T9947 G377C_C 2011-10-25 00:00:00 NaN
25 T9947 G377C_C 2011-10-26 00:00:00 NaN
26 T9947 G377C_C 2011-10-27 00:00:00 39.258
27 T9947 G377C_C 2011-10-28 00:00:00 NaN
28 T9947 G377C_C 2011-10-29 00:00:00 253.969
29 T9947 G377C_C 2011-10-30 00:00:00 319.685
30 T9947 G377C_C 2011-10-31 00:00:00 303.855
31 W3972 G377C_C 2011-10-01 00:00:00 17.120
32 W3972 G377C_C 2011-10-02 00:00:00 NaN
33 W3972 G377C_C 2011-10-03 00:00:00 NaN
34 W3972 G377C_C 2011-10-04 00:00:00 NaN
35 W3972 G377C_C 2011-10-05 00:00:00 NaN
36 W3972 G377C_C 2011-10-06 00:00:00 NaN
37 W3972 G377C_C 2011-10-07 00:00:00 NaN
38 W3972 G377C_C 2011-10-08 00:00:00 NaN
39 W3972 G377C_C 2011-10-09 00:00:00 92.730
40 W3972 G377C_C 2011-10-10 00:00:00 48.980
41 W3972 G377C_C 2011-10-11 00:00:00 17.460
42 W3972 G377C_C 2011-10-12 00:00:00 NaN
43 W3972 G377C_C 2011-10-13 00:00:00 NaN
44 W3972 G377C_C 2011-10-14 00:00:00 12.870
45 W3972 G377C_C 2011-10-15 00:00:00 NaN
46 W3972 G377C_C 2011-10-16 00:00:00 48.140
47 W3972 G377C_C 2011-10-17 00:00:00 0.410
48 W3972 G377C_C 2011-10-18 00:00:00 39.060
49 W3972 G377C_C 2011-10-19 00:00:00 235.620
50 W3972 G377C_C 2011-10-20 00:00:00 182.990
51 W3972 G377C_C 2011-10-21 00:00:00 132.190
52 W3972 G377C_C 2011-10-22 00:00:00 19.560
53 W3972 G377C_C 2011-10-23 00:00:00 NaN
54 W3972 G377C_C 2011-10-24 00:00:00 80.550
55 W3972 G377C_C 2011-10-25 00:00:00 NaN
56 W3972 G377C_C 2011-10-26 00:00:00 NaN
57 W3972 G377C_C 2011-10-27 00:00:00 39.260
58 W3972 G377C_C 2011-10-28 00:00:00 NaN
59 W3972 G377C_C 2011-10-29 00:00:00 253.970
60 W3972 G377C_C 2011-10-30 00:00:00 319.690
61 W3972 G377C_C 2011-10-31 00:00:00 303.860
So I need the result to have only 31 rows, with VAL1 (the VAL values of the first DBKEY) and VAL2 (the VAL values of the second DBKEY):
STATION DAILY_DATE VAL1 VAL2
G377C_C 10/1/2011 17.123 17.12
G377C_C 10/2/2011 NaN NaN
G377C_C 10/3/2011 NaN NaN
G377C_C 10/4/2011 NaN NaN
G377C_C 10/5/2011 NaN NaN
G377C_C 10/6/2011 NaN NaN
G377C_C 10/7/2011 NaN NaN
G377C_C 10/8/2011 NaN NaN
G377C_C 10/9/2011 92.734 92.73
G377C_C 10/10/2011 48.975 48.98
G377C_C 10/11/2011 17.463 17.46
G377C_C 10/12/2011 NaN NaN
G377C_C 10/13/2011 NaN NaN
G377C_C 10/14/2011 12.87 12.87
G377C_C 10/15/2011 NaN NaN
G377C_C 10/16/2011 48.138 48.14
G377C_C 10/17/2011 0.413 0.41
G377C_C 10/18/2011 39.058 39.06
G377C_C 10/19/2011 235.617 235.62
G377C_C 10/20/2011 182.989 182.99
G377C_C 10/21/2011 132.193 132.19
G377C_C 10/22/2011 19.557 19.56
G377C_C 10/23/2011 NaN NaN
G377C_C 10/24/2011 80.552 80.55
G377C_C 10/25/2011 NaN NaN
G377C_C 10/26/2011 NaN NaN
G377C_C 10/27/2011 39.258 39.26
G377C_C 10/28/2011 NaN NaN
G377C_C 10/29/2011 253.969 253.97
G377C_C 10/30/2011 319.685 319.69
G377C_C 10/31/2011 303.855 303.86
Answer 0 (score: 2)
If I understand you correctly, this looks fairly straightforward. unstack() should take care of it:
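In [1]: import numpy as np
   ...: from pandas import DataFrame

In [2]: df = DataFrame({
   ...:     "DBKEY": ['T9947', 'T9947', 'T9947', 'W3972', 'W3972', 'W3972'],
   ...:     "STATION": ['G377C_C', 'G377C_C', 'G377C_C', 'G377C_C', 'G377C_C', 'G377C_C'],
   ...:     "DAILY_DATE": ['2011-10-01 00:00:00', '2011-10-02 00:00:00', '2011-10-03 00:00:00',
   ...:                    '2011-10-01 00:00:00', '2011-10-02 00:00:00', '2011-10-03 00:00:00'],
   ...:     # Use real NaN floats rather than the strings 'NaN' and '17.120', so VAL stays numeric
   ...:     "VAL": [17.123, np.nan, np.nan, 17.120, np.nan, np.nan],
   ...: })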
In [3]: df
Out[3]:
DAILY_DATE DBKEY STATION VAL
0 2011-10-01 00:00:00 T9947 G377C_C 17.123
1 2011-10-02 00:00:00 T9947 G377C_C NaN
2 2011-10-03 00:00:00 T9947 G377C_C NaN
3 2011-10-01 00:00:00 W3972 G377C_C 17.120
4 2011-10-02 00:00:00 W3972 G377C_C NaN
5 2011-10-03 00:00:00 W3972 G377C_C NaN
In [4]: df2 = df.set_index(["STATION", "DBKEY", "DAILY_DATE"])
In [5]: df2
Out[5]:
                                        VAL
STATION DBKEY DAILY_DATE
G377C_C T9947 2011-10-01 00:00:00    17.123
              2011-10-02 00:00:00       NaN
              2011-10-03 00:00:00       NaN
        W3972 2011-10-01 00:00:00    17.120
              2011-10-02 00:00:00       NaN
              2011-10-03 00:00:00       NaN
In [6]: df3 = df2.unstack(level=1)
In [7]: df3
Out[7]:
                                VAL
DBKEY                         T9947   W3972
STATION DAILY_DATE
G377C_C 2011-10-01 00:00:00  17.123  17.120
        2011-10-02 00:00:00     NaN     NaN
        2011-10-03 00:00:00     NaN     NaN
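
The unstacked frame still carries the DBKEY values as column labels under a VAL level, whereas the question asks for flat VAL1 and VAL2 columns. A minimal sketch of that last step follows, assuming the DBKEYs should be numbered in sorted order (T9947 becomes VAL1, W3972 becomes VAL2); the names wide and result are illustrative and not part of the original answer:

```python
import numpy as np
import pandas as pd

# Three-day sample of the question's data, with real NaN floats.
df = pd.DataFrame({
    "DBKEY": ["T9947"] * 3 + ["W3972"] * 3,
    "STATION": ["G377C_C"] * 6,
    "DAILY_DATE": pd.to_datetime(["2011-10-01", "2011-10-02", "2011-10-03"] * 2),
    "VAL": [17.123, np.nan, np.nan, 17.120, np.nan, np.nan],
})

# Select VAL as a Series first so that unstacking DBKEY yields
# single-level columns (one per DBKEY) instead of a (VAL, DBKEY) MultiIndex.
wide = df.set_index(["STATION", "DAILY_DATE", "DBKEY"])["VAL"].unstack("DBKEY")

# Assumption: number the DBKEY columns in their sorted order.
wide.columns = [f"VAL{i}" for i in range(1, len(wide.columns) + 1)]

# Turn STATION and DAILY_DATE back into ordinary columns.
result = wide.reset_index()
print(result)
```

Note that unstack() raises an error if a (STATION, DAILY_DATE, DBKEY) combination occurs more than once, so this sketch relies on each DBKEY having at most one row per station and date, as in the sample data.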