Get the row data associated with the maximum value in a column (Python / Pandas)

Time: 2013-09-13 12:59:49

Tags: python-2.7 pandas

Given the following data:

               Sum  amount_net  amount_gross    symbol  Date_Time
ts                  
7/29/2013 2:17  -68 755,101 -755,101        A   7/29/2013 2:17
7/29/2013 2:17  -21 251,945 -251,945        B   7/29/2013 2:17
7/29/2013 2:16  -1  2,200   -2,200          C   7/29/2013 2:16
7/29/2013 2:17  -5  11,000  -11,000         C   7/29/2013 2:17
7/29/2013 2:08  -1  5,384   -5,384          D   7/29/2013 2:08
7/29/2013 2:09  -3  16,151  -16,151         D   7/29/2013 2:09
7/29/2013 2:13  1   5,384   5,384           D   7/29/2013 2:13
7/29/2013 2:02  20  70,000  70,000          F   7/29/2013 2:02
7/29/2013 2:03  22  77,000  77,000          F   7/29/2013 2:03
7/29/2013 2:04  18  63,000  63,000          F   7/29/2013 2:04
7/29/2013 2:05  15  52,500  52,500          F   7/29/2013 2:05
7/29/2013 2:08  15  52,500  52,500          F   7/29/2013 2:08
7/29/2013 2:09  8   28,000  28,000          F   7/29/2013 2:09
7/29/2013 2:10  22  77,000  77,000          F   7/29/2013 2:10
7/29/2013 2:11  22  77,000  77,000          F   7/29/2013 2:11
7/29/2013 2:12  12  42,000  42,000          F   7/29/2013 2:12
7/29/2013 2:13  5   17,500  17,500          F   7/29/2013 2:13
7/29/2013 2:14  30  105,000 105,000         F   7/29/2013 2:14
7/29/2013 2:15  35  122,500 122,500         F   7/29/2013 2:15
7/29/2013 2:16  35  122,500 122,500         F   7/29/2013 2:16

For each symbol, I want to return the Sum, amount_net and amount_gross at that symbol's maximum (latest) time. That is, I want to get:

symbol  Time           Sum  amount_net  amount_gross
A   7/29/2013 2:17  -68 755,101        -755,101
B   7/29/2013 2:17  -21 251,945        -251,945
C   7/29/2013 2:17  -5  11,000          -11,000
D   7/29/2013 2:13  1   5,384             5,384
F   7/29/2013 2:16  35  122,500         122,500

2 answers:

Answer 0: (score: 2)

Sort by time, group by symbol, then take the last (that is, the "max time") row from each group.

In [28]: df.sort('Date_Time').groupby('symbol').last()
Out[28]: 
                 Date_Time  Sum  amount_net  amount_gross
symbol                                                   
A      2013-07-29 02:17:00  -68      755101       -755101
B      2013-07-29 02:17:00  -21      251945       -251945
C      2013-07-29 02:17:00   -5       11000        -11000
D      2013-07-29 02:13:00    1        5384          5384
F      2013-07-29 02:16:00   35      122500        122500
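
For readers on a newer pandas, where `DataFrame.sort` no longer exists, the same sort-then-take-last idea can be sketched with `sort_values`. This is only an illustration under assumptions: the frame below is a small hand-built subset of the question's data with the amounts already numeric, and the names `latest` and `latest_idx` are made up for the example. The second variant uses `idxmax` to pick the row with the latest `Date_Time` per symbol without sorting.

```python
import pandas as pd

# Illustrative subset of the question's data (symbols C and D only),
# with amounts already converted to integers.
df = pd.DataFrame({
    "symbol": ["C", "C", "D", "D", "D"],
    "Date_Time": pd.to_datetime([
        "2013-07-29 02:16", "2013-07-29 02:17",
        "2013-07-29 02:08", "2013-07-29 02:09", "2013-07-29 02:13",
    ]),
    "Sum": [-1, -5, -1, -3, 1],
    "amount_net": [2200, 11000, 5384, 16151, 5384],
    "amount_gross": [-2200, -11000, -5384, -16151, 5384],
})

# Sort by time, group by symbol, keep the last row of each group.
latest = df.sort_values("Date_Time").groupby("symbol").last()

# Alternative without sorting: find the row label of the maximum
# Date_Time in each group and select those rows.
latest_idx = df.loc[df.groupby("symbol")["Date_Time"].idxmax()].set_index("symbol")

print(latest)
```

Both variants should give one row per symbol. Note that `groupby(...).last()` takes the last non-null value per column, so the two can differ when the data contains missing values.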

See @Andy's note about parsing the numbers as integers.

Answer 1: (score: 0)

Simply group by symbol and sum:

In [11]: df1.groupby('symbol').sum()
Out[11]:
        Sum  amount_net  amount_gross
symbol
A       -68      755101       -755101
B       -21      251945       -251945
C        -6       13200        -13200
D        -3       26919        -16151
F       259      906500        906500

Note: at the moment it looks like amount_net and amount_gross are not being parsed as integers and are strings instead, but you can convert them with:

df1[['amount_net', 'amount_gross']] = df1[['amount_net', 'amount_gross']].applymap(lambda x: int(x.replace(',', '')))
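
As a possible alternative to the element-wise lambda, the same conversion can be sketched with the vectorized string methods. The two-row frame below is a made-up stand-in for `df1`, assuming the amount columns hold strings with comma thousands separators.

```python
import pandas as pd

# Hypothetical stand-in for df1: amounts stored as comma-formatted strings.
df1 = pd.DataFrame({
    "symbol": ["A", "B"],
    "amount_net": ["755,101", "251,945"],
    "amount_gross": ["-755,101", "-251,945"],
})

# Strip the thousands separators column by column, then cast to int.
for col in ["amount_net", "amount_gross"]:
    df1[col] = df1[col].str.replace(",", "", regex=False).astype(int)

print(df1.dtypes)
```

If the data is loaded with `pandas.read_csv`, passing `thousands=","` is another option that handles the separators at parse time.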