pandas groupby not working as expected

Asked: 2018-01-22 02:36:43

Tags: python pandas group-by fillna

I have a dataframe:

    >>> d6
Out[57]: 
            Date      sym   Last    M1      M2         dist           code
52735 2017-11-23       C    0.10   4.72   -9.27       677.93  4250 - 12/15/2017
52736 2017-11-23       P  684.20   1.43 -106.09       677.93  4250 - 12/15/2017
53144 2017-11-23       C    0.10   4.49   -9.37       727.93  4300 - 12/15/2017
53145 2017-11-23       P  734.20   0.69 -105.02       727.93  4300 - 12/15/2017
52738 2017-11-23       P  784.20    nan     nan       777.93  4350 - 12/15/2017
52737 2017-11-23       C    0.10   4.29   -9.46       777.93  4350 - 12/15/2017
53081 2017-11-23       P  834.20    nan     nan       827.93  4400 - 12/15/2017
53019 2017-11-23       C    0.10   4.12   -9.55       827.93  4400 - 12/15/2017
52747 2017-11-23       C    0.10   3.96   -9.64       877.93  4450 - 12/15/2017
52748 2017-11-23       P  884.20    nan     nan       877.93  4450 - 12/15/2017
52605 2017-11-23       C    0.10   3.81   -9.71       927.93  4500 - 12/15/2017
52606 2017-11-23       P  934.20    nan     nan       927.93  4500 - 12/15/2017
52753 2017-11-23       C    0.10   3.68   -9.79       977.93  4550 - 12/15/2017
52754 2017-11-23       P  984.30   2.04 -109.96       977.93  4550 - 12/15/2017
53020 2017-11-23       C    0.10   3.56   -9.86      1027.93  4600 - 12/15/2017
53082 2017-11-23       P 1034.30   1.55 -108.99      1027.93  4600 - 12/15/2017
54698 2017-11-23       P 1134.30   0.53 -106.79      1127.93  4700 - 12/15/2017
54687 2017-11-23       C    0.10   3.35   -9.99      1127.93  4700 - 12/15/2017
52337 2017-11-23       C    0.10   3.17  -10.11      1227.93  4800 - 12/15/2017
52338 2017-11-23       P 1234.30    nan     nan      1227.93  4800 - 12/15/2017
54699 2017-11-23       P 1334.30    nan     nan      1327.93  4900 - 12/15/2017
54688 2017-11-23       C    0.10   3.01  -10.22      1327.93  4900 - 12/15/2017
52191 2017-11-23       P    0.10   0.55  -11.15     -3072.07   500 - 12/15/2017
52190 2017-11-23       C 3066.80   0.29   82.60     -3072.07   500 - 12/15/2017
52339 2017-11-23       C    0.10   2.87  -10.32      1427.93  5000 - 12/15/2017
52340 2017-11-23       P 1434.40   1.26 -110.86      1427.93  5000 - 12/15/2017
54689 2017-11-23       C    0.10   2.75  -10.41      1527.93  5100 - 12/15/2017
54700 2017-11-23       P 1534.40   0.45 -108.55      1527.93  5100 - 12/15/2017
52341 2017-11-23       C    0.10   2.65  -10.50      1627.93  5200 - 12/15/2017
52342 2017-11-23       P 1634.40    nan     nan      1627.93  5200 - 12/15/2017
52439 2017-11-23       C    0.10   2.55  -10.58      1727.93  5300 - 12/15/2017
52440 2017-11-23       P 1734.50   1.72 -114.79      1727.93  5300 - 12/15/2017
52343 2017-11-23       C    0.10   2.46  -10.66      1827.93  5400 - 12/15/2017
52344 2017-11-23       P 1834.50   1.08 -112.69      1827.93  5400 - 12/15/2017
54701 2017-11-23       P 1934.50   0.40 -110.30      1927.93  5500 - 12/15/2017
54690 2017-11-23       C    0.10   2.38  -10.73      1927.93  5500 - 12/15/2017
52346 2017-11-23       P 2034.50    nan     nan      2027.93  5600 - 12/15/2017
52345 2017-11-23       C    0.10   2.31  -10.80      2027.93  5600 - 12/15/2017
54691 2017-11-23       C    0.10   2.24  -10.87      2127.93  5700 - 12/15/2017
54702 2017-11-23       P 2134.60   1.52 -116.68      2127.93  5700 - 12/15/2017
52348 2017-11-23       P 2234.60   0.97 -114.51      2227.93  5800 - 12/15/2017
52347 2017-11-23       C    0.10   2.18  -10.93      2227.93  5800 - 12/15/2017
54703 2017-11-23       P 2334.60   0.37 -112.06      2327.93  5900 - 12/15/2017
54692 2017-11-23       C    0.10   2.13  -10.99      2327.93  5900 - 12/15/2017
52192 2017-11-23       C 2966.80   0.46   80.38     -2972.07   600 - 12/15/2017
52193 2017-11-23       P    0.10   0.61  -11.16     -2972.07   600 - 12/15/2017
52349 2017-11-23       C    0.10   2.08  -11.05      2427.93  6000 - 12/15/2017
52350 2017-11-23       P 2434.60    nan     nan      2427.93  6000 - 12/15/2017
52194 2017-11-23       C 2866.70    nan     nan     -2872.07   700 - 12/15/2017
52195 2017-11-23       P    0.10   0.67  -11.16     -2872.07   700 - 12/15/2017
54449 2017-11-23       C    0.10   1.71  -11.52      3427.93  7000 - 12/15/2017
54479 2017-11-23       P 3434.90   0.77 -119.84      3427.93  7000 - 12/15/2017
57740 2017-11-24       C  787.75    nan     nan      -781.23  2800 - 11/24/2017
57742 2017-11-24       P    0.01    nan     nan      -781.23  2800 - 11/24/2017
57741 2017-11-24       C  737.75    nan     nan      -731.23  2850 - 11/24/2017
57743 2017-11-24       P    0.01    nan     nan      -731.23  2850 - 11/24/2017
57730 2017-11-24       C  687.75    nan     nan      -681.23  2900 - 11/24/2017
57735 2017-11-24       P    0.01    nan     nan      -681.23  2900 - 11/24/2017
57731 2017-11-24       C  637.75    nan     nan      -631.23  2950 - 11/24/2017
57736 2017-11-24       P    0.01    nan     nan      -631.23  2950 - 11/24/2017
57732 2017-11-24       C  587.75    nan     nan      -581.23  3000 - 11/24/2017
57737 2017-11-24       P    0.01    nan     nan      -581.23  3000 - 11/24/2017
57733 2017-11-24       C  537.75    nan     nan      -531.23  3050 - 11/24/2017
57738 2017-11-24       P    0.01    nan     nan      -531.23  3050 - 11/24/2017
57727 2017-11-24       P    0.20   7.77  -25.05      -431.23  3150 - 12/08/2017
57728 2017-11-24       P    0.30  11.49  -34.45      -381.23  3200 - 12/08/2017
57734 2017-11-24       C  362.75    nan     nan      -356.23  3225 - 11/24/2017
57739 2017-11-24       P    0.01    nan     nan      -356.23  3225 - 11/24/2017
57729 2017-11-24       P    0.40  14.84  -43.17      -356.23  3225 - 12/08/2017
57826 2017-11-24       C  234.50 140.14 -124.53      -231.23  3350 - 12/22/2017
57845 2017-11-24       P    5.70 140.19 -156.23      -231.23  3350 - 12/22/2017
57827 2017-11-24       C  210.50 160.38 -138.61      -206.23  3375 - 12/22/2017
57846 2017-11-24       P    6.70 160.34 -170.27      -206.23  3375 - 12/22/2017

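I've only shown 2 dates above, but the dataframe has many. Each date has entries for several "codes", and each code on a given date has 2 entries: one for symbol C and one for symbol P. If either C or P has M1/M2 values for a given code and day, I want to use them to fill the "nan" in the other row for that code and day. If both C and P are nan for a given code and day, I leave them as they are.

I am currently doing the following: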

for code in d1.code:
    x_df = d1[d1.code == code]
    x_df = x_df.groupby(['Date'], as_index=False).ffill().bfill()
    d1[d1.code == code] = x_df

This works, but it takes a long time. Here is the resulting dataframe:

Out[62]: 
            Date      sym   Last    M1      M2         dist           code
52735 2017-11-23       C    0.10   4.72   -9.27       677.93  4250 - 12/15/2017
52736 2017-11-23       P  684.20   1.43 -106.09       677.93  4250 - 12/15/2017
53144 2017-11-23       C    0.10   4.49   -9.37       727.93  4300 - 12/15/2017
53145 2017-11-23       P  734.20   0.69 -105.02       727.93  4300 - 12/15/2017
52738 2017-11-23       P  784.20   4.29   -9.46       777.93  4350 - 12/15/2017
52737 2017-11-23       C    0.10   4.29   -9.46       777.93  4350 - 12/15/2017
53081 2017-11-23       P  834.20   4.12   -9.55       827.93  4400 - 12/15/2017
53019 2017-11-23       C    0.10   4.12   -9.55       827.93  4400 - 12/15/2017
52747 2017-11-23       C    0.10   3.96   -9.64       877.93  4450 - 12/15/2017
52748 2017-11-23       P  884.20   3.96   -9.64       877.93  4450 - 12/15/2017
52605 2017-11-23       C    0.10   3.81   -9.71       927.93  4500 - 12/15/2017
52606 2017-11-23       P  934.20   3.81   -9.71       927.93  4500 - 12/15/2017
52753 2017-11-23       C    0.10   3.68   -9.79       977.93  4550 - 12/15/2017
52754 2017-11-23       P  984.30   2.04 -109.96       977.93  4550 - 12/15/2017
53020 2017-11-23       C    0.10   3.56   -9.86      1027.93  4600 - 12/15/2017
53082 2017-11-23       P 1034.30   1.55 -108.99      1027.93  4600 - 12/15/2017
54698 2017-11-23       P 1134.30   0.53 -106.79      1127.93  4700 - 12/15/2017
54687 2017-11-23       C    0.10   3.35   -9.99      1127.93  4700 - 12/15/2017
52337 2017-11-23       C    0.10   3.17  -10.11      1227.93  4800 - 12/15/2017
52338 2017-11-23       P 1234.30   3.17  -10.11      1227.93  4800 - 12/15/2017
54699 2017-11-23       P 1334.30   3.01  -10.22      1327.93  4900 - 12/15/2017
54688 2017-11-23       C    0.10   3.01  -10.22      1327.93  4900 - 12/15/2017
52191 2017-11-23       P    0.10   0.55  -11.15     -3072.07   500 - 12/15/2017
52190 2017-11-23       C 3066.80   0.29   82.60     -3072.07   500 - 12/15/2017
52339 2017-11-23       C    0.10   2.87  -10.32      1427.93  5000 - 12/15/2017
52340 2017-11-23       P 1434.40   1.26 -110.86      1427.93  5000 - 12/15/2017
54689 2017-11-23       C    0.10   2.75  -10.41      1527.93  5100 - 12/15/2017
54700 2017-11-23       P 1534.40   0.45 -108.55      1527.93  5100 - 12/15/2017
52341 2017-11-23       C    0.10   2.65  -10.50      1627.93  5200 - 12/15/2017
52342 2017-11-23       P 1634.40   2.65  -10.50      1627.93  5200 - 12/15/2017
52439 2017-11-23       C    0.10   2.55  -10.58      1727.93  5300 - 12/15/2017
52440 2017-11-23       P 1734.50   1.72 -114.79      1727.93  5300 - 12/15/2017
52343 2017-11-23       C    0.10   2.46  -10.66      1827.93  5400 - 12/15/2017
52344 2017-11-23       P 1834.50   1.08 -112.69      1827.93  5400 - 12/15/2017
54701 2017-11-23       P 1934.50   0.40 -110.30      1927.93  5500 - 12/15/2017
54690 2017-11-23       C    0.10   2.38  -10.73      1927.93  5500 - 12/15/2017
52346 2017-11-23       P 2034.50   2.31  -10.80      2027.93  5600 - 12/15/2017
52345 2017-11-23       C    0.10   2.31  -10.80      2027.93  5600 - 12/15/2017
54691 2017-11-23       C    0.10   2.24  -10.87      2127.93  5700 - 12/15/2017
54702 2017-11-23       P 2134.60   1.52 -116.68      2127.93  5700 - 12/15/2017
52348 2017-11-23       P 2234.60   0.97 -114.51      2227.93  5800 - 12/15/2017
52347 2017-11-23       C    0.10   2.18  -10.93      2227.93  5800 - 12/15/2017
54703 2017-11-23       P 2334.60   0.37 -112.06      2327.93  5900 - 12/15/2017
54692 2017-11-23       C    0.10   2.13  -10.99      2327.93  5900 - 12/15/2017
52192 2017-11-23       C 2966.80   0.46   80.38     -2972.07   600 - 12/15/2017
52193 2017-11-23       P    0.10   0.61  -11.16     -2972.07   600 - 12/15/2017
52349 2017-11-23       C    0.10   2.08  -11.05      2427.93  6000 - 12/15/2017
52350 2017-11-23       P 2434.60   2.08  -11.05      2427.93  6000 - 12/15/2017
52194 2017-11-23       C 2866.70   0.67  -11.16     -2872.07   700 - 12/15/2017
52195 2017-11-23       P    0.10   0.67  -11.16     -2872.07   700 - 12/15/2017
54449 2017-11-23       C    0.10   1.71  -11.52      3427.93  7000 - 12/15/2017
54479 2017-11-23       P 3434.90   0.77 -119.84      3427.93  7000 - 12/15/2017
57740 2017-11-24       C  787.75    nan     nan      -781.23  2800 - 11/24/2017
57742 2017-11-24       P    0.01    nan     nan      -781.23  2800 - 11/24/2017
57741 2017-11-24       C  737.75    nan     nan      -731.23  2850 - 11/24/2017
57743 2017-11-24       P    0.01    nan     nan      -731.23  2850 - 11/24/2017
57730 2017-11-24       C  687.75    nan     nan      -681.23  2900 - 11/24/2017
57735 2017-11-24       P    0.01    nan     nan      -681.23  2900 - 11/24/2017
57731 2017-11-24       C  637.75    nan     nan      -631.23  2950 - 11/24/2017
57736 2017-11-24       P    0.01    nan     nan      -631.23  2950 - 11/24/2017
57732 2017-11-24       C  587.75    nan     nan      -581.23  3000 - 11/24/2017
57737 2017-11-24       P    0.01    nan     nan      -581.23  3000 - 11/24/2017
57733 2017-11-24       C  537.75    nan     nan      -531.23  3050 - 11/24/2017
57738 2017-11-24       P    0.01    nan     nan      -531.23  3050 - 11/24/2017
57727 2017-11-24       P    0.20   7.77  -25.05      -431.23  3150 - 12/08/2017
57728 2017-11-24       P    0.30  11.49  -34.45      -381.23  3200 - 12/08/2017
57734 2017-11-24       C  362.75    nan     nan      -356.23  3225 - 11/24/2017
57739 2017-11-24       P    0.01    nan     nan      -356.23  3225 - 11/24/2017
57729 2017-11-24       P    0.40  14.84  -43.17      -356.23  3225 - 12/08/2017
57826 2017-11-24       C  234.50 140.14 -124.53      -231.23  3350 - 12/22/2017
57845 2017-11-24       P    5.70 140.19 -156.23      -231.23  3350 - 12/22/2017
57827 2017-11-24       C  210.50 160.38 -138.61      -206.23  3375 - 12/22/2017
57846 2017-11-24       P    6.70 160.34 -170.27      -206.23  3375 - 12/22/2017
57828 2017-11-24       C  186.80 184.35 -154.72      -181.23  3400 - 12/22/2017
57847 2017-11-24       P    8.10 185.20 -187.99      -181.23  3400 - 12/22/2017
57829 2017-11-24       C  163.60 213.17 -174.17      -156.23  3425 - 12/22/2017
57848 2017-11-24       P    9.80 213.01 -205.82      -156.23  3425 - 12/22/2017

To make it faster, I tried the following:

new_d1= d1.groupby(['code','Date'], as_index=False).ffill().bfill()

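This does not work as expected (that is, it does not give the same result as the loop above). It looks as if the data is being grouped only by date and not by "code". Here is the output: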

>>> new_d1
Out[59]: 
            Date      sym   Last    M1      M2         dist           code
52735 2017-11-23       C    0.10   4.72   -9.27       677.93  4250 - 12/15/2017
52736 2017-11-23       P  684.20   1.43 -106.09       677.93  4250 - 12/15/2017
53144 2017-11-23       C    0.10   4.49   -9.37       727.93  4300 - 12/15/2017
53145 2017-11-23       P  734.20   0.69 -105.02       727.93  4300 - 12/15/2017
52738 2017-11-23       P  784.20   4.29   -9.46       777.93  4350 - 12/15/2017
52737 2017-11-23       C    0.10   4.29   -9.46       777.93  4350 - 12/15/2017
53081 2017-11-23       P  834.20   4.12   -9.55       827.93  4400 - 12/15/2017
53019 2017-11-23       C    0.10   4.12   -9.55       827.93  4400 - 12/15/2017
52747 2017-11-23       C    0.10   3.96   -9.64       877.93  4450 - 12/15/2017
52748 2017-11-23       P  884.20   3.96   -9.64       877.93  4450 - 12/15/2017
52605 2017-11-23       C    0.10   3.81   -9.71       927.93  4500 - 12/15/2017
52606 2017-11-23       P  934.20   3.81   -9.71       927.93  4500 - 12/15/2017
52753 2017-11-23       C    0.10   3.68   -9.79       977.93  4550 - 12/15/2017
52754 2017-11-23       P  984.30   2.04 -109.96       977.93  4550 - 12/15/2017
53020 2017-11-23       C    0.10   3.56   -9.86      1027.93  4600 - 12/15/2017
53082 2017-11-23       P 1034.30   1.55 -108.99      1027.93  4600 - 12/15/2017
54698 2017-11-23       P 1134.30   0.53 -106.79      1127.93  4700 - 12/15/2017
54687 2017-11-23       C    0.10   3.35   -9.99      1127.93  4700 - 12/15/2017
52337 2017-11-23       C    0.10   3.17  -10.11      1227.93  4800 - 12/15/2017
52338 2017-11-23       P 1234.30   3.17  -10.11      1227.93  4800 - 12/15/2017
54699 2017-11-23       P 1334.30   3.01  -10.22      1327.93  4900 - 12/15/2017
54688 2017-11-23       C    0.10   3.01  -10.22      1327.93  4900 - 12/15/2017
52191 2017-11-23       P    0.10   0.55  -11.15     -3072.07   500 - 12/15/2017
52190 2017-11-23       C 3066.80   0.29   82.60     -3072.07   500 - 12/15/2017
52339 2017-11-23       C    0.10   2.87  -10.32      1427.93  5000 - 12/15/2017
52340 2017-11-23       P 1434.40   1.26 -110.86      1427.93  5000 - 12/15/2017
54689 2017-11-23       C    0.10   2.75  -10.41      1527.93  5100 - 12/15/2017
54700 2017-11-23       P 1534.40   0.45 -108.55      1527.93  5100 - 12/15/2017
52341 2017-11-23       C    0.10   2.65  -10.50      1627.93  5200 - 12/15/2017
52342 2017-11-23       P 1634.40   2.65  -10.50      1627.93  5200 - 12/15/2017
52439 2017-11-23       C    0.10   2.55  -10.58      1727.93  5300 - 12/15/2017
52440 2017-11-23       P 1734.50   1.72 -114.79      1727.93  5300 - 12/15/2017
52343 2017-11-23       C    0.10   2.46  -10.66      1827.93  5400 - 12/15/2017
52344 2017-11-23       P 1834.50   1.08 -112.69      1827.93  5400 - 12/15/2017
54701 2017-11-23       P 1934.50   0.40 -110.30      1927.93  5500 - 12/15/2017
54690 2017-11-23       C    0.10   2.38  -10.73      1927.93  5500 - 12/15/2017
52346 2017-11-23       P 2034.50   2.31  -10.80      2027.93  5600 - 12/15/2017
52345 2017-11-23       C    0.10   2.31  -10.80      2027.93  5600 - 12/15/2017
54691 2017-11-23       C    0.10   2.24  -10.87      2127.93  5700 - 12/15/2017
54702 2017-11-23       P 2134.60   1.52 -116.68      2127.93  5700 - 12/15/2017
52348 2017-11-23       P 2234.60   0.97 -114.51      2227.93  5800 - 12/15/2017
52347 2017-11-23       C    0.10   2.18  -10.93      2227.93  5800 - 12/15/2017
54703 2017-11-23       P 2334.60   0.37 -112.06      2327.93  5900 - 12/15/2017
54692 2017-11-23       C    0.10   2.13  -10.99      2327.93  5900 - 12/15/2017
52192 2017-11-23       C 2966.80   0.46   80.38     -2972.07   600 - 12/15/2017
52193 2017-11-23       P    0.10   0.61  -11.16     -2972.07   600 - 12/15/2017
52349 2017-11-23       C    0.10   2.08  -11.05      2427.93  6000 - 12/15/2017
52350 2017-11-23       P 2434.60   2.08  -11.05      2427.93  6000 - 12/15/2017
52194 2017-11-23       C 2866.70   0.67  -11.16     -2872.07   700 - 12/15/2017
52195 2017-11-23       P    0.10   0.67  -11.16     -2872.07   700 - 12/15/2017
54449 2017-11-23       C    0.10   1.71  -11.52      3427.93  7000 - 12/15/2017
54479 2017-11-23       P 3434.90   0.77 -119.84      3427.93  7000 - 12/15/2017
57740 2017-11-24       C  787.75   7.77  -25.05      -781.23  2800 - 11/24/2017
57742 2017-11-24       P    0.01   7.77  -25.05      -781.23  2800 - 11/24/2017
57741 2017-11-24       C  737.75   7.77  -25.05      -731.23  2850 - 11/24/2017
57743 2017-11-24       P    0.01   7.77  -25.05      -731.23  2850 - 11/24/2017
57730 2017-11-24       C  687.75   7.77  -25.05      -681.23  2900 - 11/24/2017
57735 2017-11-24       P    0.01   7.77  -25.05      -681.23  2900 - 11/24/2017
57731 2017-11-24       C  637.75   7.77  -25.05      -631.23  2950 - 11/24/2017
57736 2017-11-24       P    0.01   7.77  -25.05      -631.23  2950 - 11/24/2017
57732 2017-11-24       C  587.75   7.77  -25.05      -581.23  3000 - 11/24/2017
57737 2017-11-24       P    0.01   7.77  -25.05      -581.23  3000 - 11/24/2017
57733 2017-11-24       C  537.75   7.77  -25.05      -531.23  3050 - 11/24/2017
57738 2017-11-24       P    0.01   7.77  -25.05      -531.23  3050 - 11/24/2017
57727 2017-11-24       P    0.20   7.77  -25.05      -431.23  3150 - 12/08/2017
57728 2017-11-24       P    0.30  11.49  -34.45      -381.23  3200 - 12/08/2017
57734 2017-11-24       C  362.75  14.84  -43.17      -356.23  3225 - 11/24/2017
57739 2017-11-24       P    0.01  14.84  -43.17      -356.23  3225 - 11/24/2017
57729 2017-11-24       P    0.40  14.84  -43.17      -356.23  3225 - 12/08/2017
57826 2017-11-24       C  234.50 140.14 -124.53      -231.23  3350 - 12/22/2017
57845 2017-11-24       P    5.70 140.19 -156.23      -231.23  3350 - 12/22/2017
57827 2017-11-24       C  210.50 160.38 -138.61      -206.23  3375 - 12/22/2017
57846 2017-11-24       P    6.70 160.34 -170.27      -206.23  3375 - 12/22/2017
57828 2017-11-24       C  186.80 184.35 -154.72      -181.23  3400 - 12/22/2017
57847 2017-11-24       P    8.10 185.20 -187.99      -181.23  3400 - 12/22/2017
57829 2017-11-24       C  163.60 213.17 -174.17      -156.23  3425 - 12/22/2017
57848 2017-11-24       P    9.80 213.01 -205.82      -156.23  3425 - 12/22/2017

Is there a way to speed up the loop above, or any insight into why the second snippet does not work?

1 Answer:

Answer 0: (score: 3)

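The problem is in the second step, the bfill: it is called on the dataframe returned by ffill, so it back-fills nan across the whole dataframe rather than within each subgroup. The following will work for you: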

df.groupby(['code','Date']).apply(lambda x : x.ffill().bfill())
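To make the difference concrete, here is a minimal sketch on a made-up four-row frame. The column names follow the question, but the values and the codes "A" and "B" are invented for illustration, and `transform` is used here only as one possible way to keep both fills inside each group; it is not the call from the answer above.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): code "A" has no M1 at all, code "B" has one.
df = pd.DataFrame({
    "Date": ["2017-11-24"] * 4,
    "code": ["A", "A", "B", "B"],
    "sym": ["C", "P", "C", "P"],
    "M1": [np.nan, np.nan, 7.77, np.nan],
})
cols = ["M1"]
keys = ["code", "Date"]

# ffill() is grouped, but bfill() runs on the plain DataFrame that ffill()
# returns, so the value from code "B" leaks backwards into code "A".
leaky = df.groupby(keys)[cols].ffill().bfill()

# Both fills run inside each (code, Date) group, so code "A" stays NaN.
per_group = df.groupby(keys)[cols].transform(lambda g: g.ffill().bfill())

print(leaky["M1"].tolist())
print(per_group["M1"].tolist())
```

In the leaky version every row ends up with 7.77, which matches the symptom in the question where 7.77 spread into the 11/24/2017 codes that had no values of their own.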

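The same pitfall shows up with other chained groupby calls. For example, one might expect the following to return the sum of each group, but the second sum() is applied to the whole result of the first, so it returns a single total per column: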

import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3, 4], 'B': [2, 3, 4, 5]})
df.groupby('A').sum().sum()
Out[958]: 
B    14
dtype: int64
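Following the same reasoning, another option is to group twice, once for each fill, so that neither step runs on the ungrouped frame. This is only a sketch under assumptions: the data below is invented, and whether it is faster than `apply` on the real dataframe would need to be measured. Note also that, depending on the pandas version, `groupby(...).apply(...)` may add the group keys to the result's index, whereas the grouped `ffill`/`bfill` calls keep the original row index.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) with three codes on one date.
df = pd.DataFrame({
    "Date": ["2017-11-23"] * 6,
    "code": ["A", "A", "B", "B", "C", "C"],
    "sym": ["C", "P", "C", "P", "C", "P"],
    "M1": [np.nan, 1.0, np.nan, np.nan, 3.0, np.nan],
    "M2": [np.nan, -1.0, np.nan, np.nan, -3.0, np.nan],
})
cols = ["M1", "M2"]
keys = ["code", "Date"]

filled = df.copy()
# First pass: forward fill within each (code, Date) group.
filled[cols] = filled.groupby(keys)[cols].ffill()
# Second pass: regroup, then backward fill within each group.
filled[cols] = filled.groupby(keys)[cols].bfill()

print(filled)
```

Code "B", where both rows are nan, is left untouched, which is the behaviour the question asks for.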