I have a DataFrame:
>>> d6
Out[57]:
Date sym Last M1 M2 dist code
52735 2017-11-23 C 0.10 4.72 -9.27 677.93 4250 - 12/15/2017
52736 2017-11-23 P 684.20 1.43 -106.09 677.93 4250 - 12/15/2017
53144 2017-11-23 C 0.10 4.49 -9.37 727.93 4300 - 12/15/2017
53145 2017-11-23 P 734.20 0.69 -105.02 727.93 4300 - 12/15/2017
52738 2017-11-23 P 784.20 nan nan 777.93 4350 - 12/15/2017
52737 2017-11-23 C 0.10 4.29 -9.46 777.93 4350 - 12/15/2017
53081 2017-11-23 P 834.20 nan nan 827.93 4400 - 12/15/2017
53019 2017-11-23 C 0.10 4.12 -9.55 827.93 4400 - 12/15/2017
52747 2017-11-23 C 0.10 3.96 -9.64 877.93 4450 - 12/15/2017
52748 2017-11-23 P 884.20 nan nan 877.93 4450 - 12/15/2017
52605 2017-11-23 C 0.10 3.81 -9.71 927.93 4500 - 12/15/2017
52606 2017-11-23 P 934.20 nan nan 927.93 4500 - 12/15/2017
52753 2017-11-23 C 0.10 3.68 -9.79 977.93 4550 - 12/15/2017
52754 2017-11-23 P 984.30 2.04 -109.96 977.93 4550 - 12/15/2017
53020 2017-11-23 C 0.10 3.56 -9.86 1027.93 4600 - 12/15/2017
53082 2017-11-23 P 1034.30 1.55 -108.99 1027.93 4600 - 12/15/2017
54698 2017-11-23 P 1134.30 0.53 -106.79 1127.93 4700 - 12/15/2017
54687 2017-11-23 C 0.10 3.35 -9.99 1127.93 4700 - 12/15/2017
52337 2017-11-23 C 0.10 3.17 -10.11 1227.93 4800 - 12/15/2017
52338 2017-11-23 P 1234.30 nan nan 1227.93 4800 - 12/15/2017
54699 2017-11-23 P 1334.30 nan nan 1327.93 4900 - 12/15/2017
54688 2017-11-23 C 0.10 3.01 -10.22 1327.93 4900 - 12/15/2017
52191 2017-11-23 P 0.10 0.55 -11.15 -3072.07 500 - 12/15/2017
52190 2017-11-23 C 3066.80 0.29 82.60 -3072.07 500 - 12/15/2017
52339 2017-11-23 C 0.10 2.87 -10.32 1427.93 5000 - 12/15/2017
52340 2017-11-23 P 1434.40 1.26 -110.86 1427.93 5000 - 12/15/2017
54689 2017-11-23 C 0.10 2.75 -10.41 1527.93 5100 - 12/15/2017
54700 2017-11-23 P 1534.40 0.45 -108.55 1527.93 5100 - 12/15/2017
52341 2017-11-23 C 0.10 2.65 -10.50 1627.93 5200 - 12/15/2017
52342 2017-11-23 P 1634.40 nan nan 1627.93 5200 - 12/15/2017
52439 2017-11-23 C 0.10 2.55 -10.58 1727.93 5300 - 12/15/2017
52440 2017-11-23 P 1734.50 1.72 -114.79 1727.93 5300 - 12/15/2017
52343 2017-11-23 C 0.10 2.46 -10.66 1827.93 5400 - 12/15/2017
52344 2017-11-23 P 1834.50 1.08 -112.69 1827.93 5400 - 12/15/2017
54701 2017-11-23 P 1934.50 0.40 -110.30 1927.93 5500 - 12/15/2017
54690 2017-11-23 C 0.10 2.38 -10.73 1927.93 5500 - 12/15/2017
52346 2017-11-23 P 2034.50 nan nan 2027.93 5600 - 12/15/2017
52345 2017-11-23 C 0.10 2.31 -10.80 2027.93 5600 - 12/15/2017
54691 2017-11-23 C 0.10 2.24 -10.87 2127.93 5700 - 12/15/2017
54702 2017-11-23 P 2134.60 1.52 -116.68 2127.93 5700 - 12/15/2017
52348 2017-11-23 P 2234.60 0.97 -114.51 2227.93 5800 - 12/15/2017
52347 2017-11-23 C 0.10 2.18 -10.93 2227.93 5800 - 12/15/2017
54703 2017-11-23 P 2334.60 0.37 -112.06 2327.93 5900 - 12/15/2017
54692 2017-11-23 C 0.10 2.13 -10.99 2327.93 5900 - 12/15/2017
52192 2017-11-23 C 2966.80 0.46 80.38 -2972.07 600 - 12/15/2017
52193 2017-11-23 P 0.10 0.61 -11.16 -2972.07 600 - 12/15/2017
52349 2017-11-23 C 0.10 2.08 -11.05 2427.93 6000 - 12/15/2017
52350 2017-11-23 P 2434.60 nan nan 2427.93 6000 - 12/15/2017
52194 2017-11-23 C 2866.70 nan nan -2872.07 700 - 12/15/2017
52195 2017-11-23 P 0.10 0.67 -11.16 -2872.07 700 - 12/15/2017
54449 2017-11-23 C 0.10 1.71 -11.52 3427.93 7000 - 12/15/2017
54479 2017-11-23 P 3434.90 0.77 -119.84 3427.93 7000 - 12/15/2017
57740 2017-11-24 C 787.75 nan nan -781.23 2800 - 11/24/2017
57742 2017-11-24 P 0.01 nan nan -781.23 2800 - 11/24/2017
57741 2017-11-24 C 737.75 nan nan -731.23 2850 - 11/24/2017
57743 2017-11-24 P 0.01 nan nan -731.23 2850 - 11/24/2017
57730 2017-11-24 C 687.75 nan nan -681.23 2900 - 11/24/2017
57735 2017-11-24 P 0.01 nan nan -681.23 2900 - 11/24/2017
57731 2017-11-24 C 637.75 nan nan -631.23 2950 - 11/24/2017
57736 2017-11-24 P 0.01 nan nan -631.23 2950 - 11/24/2017
57732 2017-11-24 C 587.75 nan nan -581.23 3000 - 11/24/2017
57737 2017-11-24 P 0.01 nan nan -581.23 3000 - 11/24/2017
57733 2017-11-24 C 537.75 nan nan -531.23 3050 - 11/24/2017
57738 2017-11-24 P 0.01 nan nan -531.23 3050 - 11/24/2017
57727 2017-11-24 P 0.20 7.77 -25.05 -431.23 3150 - 12/08/2017
57728 2017-11-24 P 0.30 11.49 -34.45 -381.23 3200 - 12/08/2017
57734 2017-11-24 C 362.75 nan nan -356.23 3225 - 11/24/2017
57739 2017-11-24 P 0.01 nan nan -356.23 3225 - 11/24/2017
57729 2017-11-24 P 0.40 14.84 -43.17 -356.23 3225 - 12/08/2017
57826 2017-11-24 C 234.50 140.14 -124.53 -231.23 3350 - 12/22/2017
57845 2017-11-24 P 5.70 140.19 -156.23 -231.23 3350 - 12/22/2017
57827 2017-11-24 C 210.50 160.38 -138.61 -206.23 3375 - 12/22/2017
57846 2017-11-24 P 6.70 160.34 -170.27 -206.23 3375 - 12/22/2017
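Although I show only two dates above, the frame contains many dates. Each date has several "code" entries, and each code on a given date has two rows: one for symbol C and one for P. If either C or P has M1/M2 values, I want to use them to fill the nan in the other row for that code and date. If both C and P are nan for a given code and date, I leave them as they are.
I am currently doing the following: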
for code in d1.code:
    x_df = d1[d1.code == code]
    x_df = x_df.groupby(['Date'], as_index=False).ffill().bfill()
    d1[d1.code == code] = x_df
This works, but it takes a long time. Here is the resulting df:
Out[62]:
Date sym Last M1 M2 dist code
52735 2017-11-23 C 0.10 4.72 -9.27 677.93 4250 - 12/15/2017
52736 2017-11-23 P 684.20 1.43 -106.09 677.93 4250 - 12/15/2017
53144 2017-11-23 C 0.10 4.49 -9.37 727.93 4300 - 12/15/2017
53145 2017-11-23 P 734.20 0.69 -105.02 727.93 4300 - 12/15/2017
52738 2017-11-23 P 784.20 4.29 -9.46 777.93 4350 - 12/15/2017
52737 2017-11-23 C 0.10 4.29 -9.46 777.93 4350 - 12/15/2017
53081 2017-11-23 P 834.20 4.12 -9.55 827.93 4400 - 12/15/2017
53019 2017-11-23 C 0.10 4.12 -9.55 827.93 4400 - 12/15/2017
52747 2017-11-23 C 0.10 3.96 -9.64 877.93 4450 - 12/15/2017
52748 2017-11-23 P 884.20 3.96 -9.64 877.93 4450 - 12/15/2017
52605 2017-11-23 C 0.10 3.81 -9.71 927.93 4500 - 12/15/2017
52606 2017-11-23 P 934.20 3.81 -9.71 927.93 4500 - 12/15/2017
52753 2017-11-23 C 0.10 3.68 -9.79 977.93 4550 - 12/15/2017
52754 2017-11-23 P 984.30 2.04 -109.96 977.93 4550 - 12/15/2017
53020 2017-11-23 C 0.10 3.56 -9.86 1027.93 4600 - 12/15/2017
53082 2017-11-23 P 1034.30 1.55 -108.99 1027.93 4600 - 12/15/2017
54698 2017-11-23 P 1134.30 0.53 -106.79 1127.93 4700 - 12/15/2017
54687 2017-11-23 C 0.10 3.35 -9.99 1127.93 4700 - 12/15/2017
52337 2017-11-23 C 0.10 3.17 -10.11 1227.93 4800 - 12/15/2017
52338 2017-11-23 P 1234.30 3.17 -10.11 1227.93 4800 - 12/15/2017
54699 2017-11-23 P 1334.30 3.01 -10.22 1327.93 4900 - 12/15/2017
54688 2017-11-23 C 0.10 3.01 -10.22 1327.93 4900 - 12/15/2017
52191 2017-11-23 P 0.10 0.55 -11.15 -3072.07 500 - 12/15/2017
52190 2017-11-23 C 3066.80 0.29 82.60 -3072.07 500 - 12/15/2017
52339 2017-11-23 C 0.10 2.87 -10.32 1427.93 5000 - 12/15/2017
52340 2017-11-23 P 1434.40 1.26 -110.86 1427.93 5000 - 12/15/2017
54689 2017-11-23 C 0.10 2.75 -10.41 1527.93 5100 - 12/15/2017
54700 2017-11-23 P 1534.40 0.45 -108.55 1527.93 5100 - 12/15/2017
52341 2017-11-23 C 0.10 2.65 -10.50 1627.93 5200 - 12/15/2017
52342 2017-11-23 P 1634.40 2.65 -10.50 1627.93 5200 - 12/15/2017
52439 2017-11-23 C 0.10 2.55 -10.58 1727.93 5300 - 12/15/2017
52440 2017-11-23 P 1734.50 1.72 -114.79 1727.93 5300 - 12/15/2017
52343 2017-11-23 C 0.10 2.46 -10.66 1827.93 5400 - 12/15/2017
52344 2017-11-23 P 1834.50 1.08 -112.69 1827.93 5400 - 12/15/2017
54701 2017-11-23 P 1934.50 0.40 -110.30 1927.93 5500 - 12/15/2017
54690 2017-11-23 C 0.10 2.38 -10.73 1927.93 5500 - 12/15/2017
52346 2017-11-23 P 2034.50 2.31 -10.80 2027.93 5600 - 12/15/2017
52345 2017-11-23 C 0.10 2.31 -10.80 2027.93 5600 - 12/15/2017
54691 2017-11-23 C 0.10 2.24 -10.87 2127.93 5700 - 12/15/2017
54702 2017-11-23 P 2134.60 1.52 -116.68 2127.93 5700 - 12/15/2017
52348 2017-11-23 P 2234.60 0.97 -114.51 2227.93 5800 - 12/15/2017
52347 2017-11-23 C 0.10 2.18 -10.93 2227.93 5800 - 12/15/2017
54703 2017-11-23 P 2334.60 0.37 -112.06 2327.93 5900 - 12/15/2017
54692 2017-11-23 C 0.10 2.13 -10.99 2327.93 5900 - 12/15/2017
52192 2017-11-23 C 2966.80 0.46 80.38 -2972.07 600 - 12/15/2017
52193 2017-11-23 P 0.10 0.61 -11.16 -2972.07 600 - 12/15/2017
52349 2017-11-23 C 0.10 2.08 -11.05 2427.93 6000 - 12/15/2017
52350 2017-11-23 P 2434.60 2.08 -11.05 2427.93 6000 - 12/15/2017
52194 2017-11-23 C 2866.70 0.67 -11.16 -2872.07 700 - 12/15/2017
52195 2017-11-23 P 0.10 0.67 -11.16 -2872.07 700 - 12/15/2017
54449 2017-11-23 C 0.10 1.71 -11.52 3427.93 7000 - 12/15/2017
54479 2017-11-23 P 3434.90 0.77 -119.84 3427.93 7000 - 12/15/2017
57740 2017-11-24 C 787.75 nan nan -781.23 2800 - 11/24/2017
57742 2017-11-24 P 0.01 nan nan -781.23 2800 - 11/24/2017
57741 2017-11-24 C 737.75 nan nan -731.23 2850 - 11/24/2017
57743 2017-11-24 P 0.01 nan nan -731.23 2850 - 11/24/2017
57730 2017-11-24 C 687.75 nan nan -681.23 2900 - 11/24/2017
57735 2017-11-24 P 0.01 nan nan -681.23 2900 - 11/24/2017
57731 2017-11-24 C 637.75 nan nan -631.23 2950 - 11/24/2017
57736 2017-11-24 P 0.01 nan nan -631.23 2950 - 11/24/2017
57732 2017-11-24 C 587.75 nan nan -581.23 3000 - 11/24/2017
57737 2017-11-24 P 0.01 nan nan -581.23 3000 - 11/24/2017
57733 2017-11-24 C 537.75 nan nan -531.23 3050 - 11/24/2017
57738 2017-11-24 P 0.01 nan nan -531.23 3050 - 11/24/2017
57727 2017-11-24 P 0.20 7.77 -25.05 -431.23 3150 - 12/08/2017
57728 2017-11-24 P 0.30 11.49 -34.45 -381.23 3200 - 12/08/2017
57734 2017-11-24 C 362.75 nan nan -356.23 3225 - 11/24/2017
57739 2017-11-24 P 0.01 nan nan -356.23 3225 - 11/24/2017
57729 2017-11-24 P 0.40 14.84 -43.17 -356.23 3225 - 12/08/2017
57826 2017-11-24 C 234.50 140.14 -124.53 -231.23 3350 - 12/22/2017
57845 2017-11-24 P 5.70 140.19 -156.23 -231.23 3350 - 12/22/2017
57827 2017-11-24 C 210.50 160.38 -138.61 -206.23 3375 - 12/22/2017
57846 2017-11-24 P 6.70 160.34 -170.27 -206.23 3375 - 12/22/2017
57828 2017-11-24 C 186.80 184.35 -154.72 -181.23 3400 - 12/22/2017
57847 2017-11-24 P 8.10 185.20 -187.99 -181.23 3400 - 12/22/2017
57829 2017-11-24 C 163.60 213.17 -174.17 -156.23 3425 - 12/22/2017
57848 2017-11-24 P 9.80 213.01 -205.82 -156.23 3425 - 12/22/2017
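To make it faster, I tried the following:
new_d1 = d1.groupby(['code', 'Date'], as_index=False).ffill().bfill()
This does not work as expected (unlike the loop above). It looks as if the rows are grouped only by Date and not by "code". Here is the output: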
>>> new_d1
Out[59]:
Date sym Last M1 M2 dist code
52735 2017-11-23 C 0.10 4.72 -9.27 677.93 4250 - 12/15/2017
52736 2017-11-23 P 684.20 1.43 -106.09 677.93 4250 - 12/15/2017
53144 2017-11-23 C 0.10 4.49 -9.37 727.93 4300 - 12/15/2017
53145 2017-11-23 P 734.20 0.69 -105.02 727.93 4300 - 12/15/2017
52738 2017-11-23 P 784.20 4.29 -9.46 777.93 4350 - 12/15/2017
52737 2017-11-23 C 0.10 4.29 -9.46 777.93 4350 - 12/15/2017
53081 2017-11-23 P 834.20 4.12 -9.55 827.93 4400 - 12/15/2017
53019 2017-11-23 C 0.10 4.12 -9.55 827.93 4400 - 12/15/2017
52747 2017-11-23 C 0.10 3.96 -9.64 877.93 4450 - 12/15/2017
52748 2017-11-23 P 884.20 3.96 -9.64 877.93 4450 - 12/15/2017
52605 2017-11-23 C 0.10 3.81 -9.71 927.93 4500 - 12/15/2017
52606 2017-11-23 P 934.20 3.81 -9.71 927.93 4500 - 12/15/2017
52753 2017-11-23 C 0.10 3.68 -9.79 977.93 4550 - 12/15/2017
52754 2017-11-23 P 984.30 2.04 -109.96 977.93 4550 - 12/15/2017
53020 2017-11-23 C 0.10 3.56 -9.86 1027.93 4600 - 12/15/2017
53082 2017-11-23 P 1034.30 1.55 -108.99 1027.93 4600 - 12/15/2017
54698 2017-11-23 P 1134.30 0.53 -106.79 1127.93 4700 - 12/15/2017
54687 2017-11-23 C 0.10 3.35 -9.99 1127.93 4700 - 12/15/2017
52337 2017-11-23 C 0.10 3.17 -10.11 1227.93 4800 - 12/15/2017
52338 2017-11-23 P 1234.30 3.17 -10.11 1227.93 4800 - 12/15/2017
54699 2017-11-23 P 1334.30 3.01 -10.22 1327.93 4900 - 12/15/2017
54688 2017-11-23 C 0.10 3.01 -10.22 1327.93 4900 - 12/15/2017
52191 2017-11-23 P 0.10 0.55 -11.15 -3072.07 500 - 12/15/2017
52190 2017-11-23 C 3066.80 0.29 82.60 -3072.07 500 - 12/15/2017
52339 2017-11-23 C 0.10 2.87 -10.32 1427.93 5000 - 12/15/2017
52340 2017-11-23 P 1434.40 1.26 -110.86 1427.93 5000 - 12/15/2017
54689 2017-11-23 C 0.10 2.75 -10.41 1527.93 5100 - 12/15/2017
54700 2017-11-23 P 1534.40 0.45 -108.55 1527.93 5100 - 12/15/2017
52341 2017-11-23 C 0.10 2.65 -10.50 1627.93 5200 - 12/15/2017
52342 2017-11-23 P 1634.40 2.65 -10.50 1627.93 5200 - 12/15/2017
52439 2017-11-23 C 0.10 2.55 -10.58 1727.93 5300 - 12/15/2017
52440 2017-11-23 P 1734.50 1.72 -114.79 1727.93 5300 - 12/15/2017
52343 2017-11-23 C 0.10 2.46 -10.66 1827.93 5400 - 12/15/2017
52344 2017-11-23 P 1834.50 1.08 -112.69 1827.93 5400 - 12/15/2017
54701 2017-11-23 P 1934.50 0.40 -110.30 1927.93 5500 - 12/15/2017
54690 2017-11-23 C 0.10 2.38 -10.73 1927.93 5500 - 12/15/2017
52346 2017-11-23 P 2034.50 2.31 -10.80 2027.93 5600 - 12/15/2017
52345 2017-11-23 C 0.10 2.31 -10.80 2027.93 5600 - 12/15/2017
54691 2017-11-23 C 0.10 2.24 -10.87 2127.93 5700 - 12/15/2017
54702 2017-11-23 P 2134.60 1.52 -116.68 2127.93 5700 - 12/15/2017
52348 2017-11-23 P 2234.60 0.97 -114.51 2227.93 5800 - 12/15/2017
52347 2017-11-23 C 0.10 2.18 -10.93 2227.93 5800 - 12/15/2017
54703 2017-11-23 P 2334.60 0.37 -112.06 2327.93 5900 - 12/15/2017
54692 2017-11-23 C 0.10 2.13 -10.99 2327.93 5900 - 12/15/2017
52192 2017-11-23 C 2966.80 0.46 80.38 -2972.07 600 - 12/15/2017
52193 2017-11-23 P 0.10 0.61 -11.16 -2972.07 600 - 12/15/2017
52349 2017-11-23 C 0.10 2.08 -11.05 2427.93 6000 - 12/15/2017
52350 2017-11-23 P 2434.60 2.08 -11.05 2427.93 6000 - 12/15/2017
52194 2017-11-23 C 2866.70 0.67 -11.16 -2872.07 700 - 12/15/2017
52195 2017-11-23 P 0.10 0.67 -11.16 -2872.07 700 - 12/15/2017
54449 2017-11-23 C 0.10 1.71 -11.52 3427.93 7000 - 12/15/2017
54479 2017-11-23 P 3434.90 0.77 -119.84 3427.93 7000 - 12/15/2017
57740 2017-11-24 C 787.75 7.77 -25.05 -781.23 2800 - 11/24/2017
57742 2017-11-24 P 0.01 7.77 -25.05 -781.23 2800 - 11/24/2017
57741 2017-11-24 C 737.75 7.77 -25.05 -731.23 2850 - 11/24/2017
57743 2017-11-24 P 0.01 7.77 -25.05 -731.23 2850 - 11/24/2017
57730 2017-11-24 C 687.75 7.77 -25.05 -681.23 2900 - 11/24/2017
57735 2017-11-24 P 0.01 7.77 -25.05 -681.23 2900 - 11/24/2017
57731 2017-11-24 C 637.75 7.77 -25.05 -631.23 2950 - 11/24/2017
57736 2017-11-24 P 0.01 7.77 -25.05 -631.23 2950 - 11/24/2017
57732 2017-11-24 C 587.75 7.77 -25.05 -581.23 3000 - 11/24/2017
57737 2017-11-24 P 0.01 7.77 -25.05 -581.23 3000 - 11/24/2017
57733 2017-11-24 C 537.75 7.77 -25.05 -531.23 3050 - 11/24/2017
57738 2017-11-24 P 0.01 7.77 -25.05 -531.23 3050 - 11/24/2017
57727 2017-11-24 P 0.20 7.77 -25.05 -431.23 3150 - 12/08/2017
57728 2017-11-24 P 0.30 11.49 -34.45 -381.23 3200 - 12/08/2017
57734 2017-11-24 C 362.75 14.84 -43.17 -356.23 3225 - 11/24/2017
57739 2017-11-24 P 0.01 14.84 -43.17 -356.23 3225 - 11/24/2017
57729 2017-11-24 P 0.40 14.84 -43.17 -356.23 3225 - 12/08/2017
57826 2017-11-24 C 234.50 140.14 -124.53 -231.23 3350 - 12/22/2017
57845 2017-11-24 P 5.70 140.19 -156.23 -231.23 3350 - 12/22/2017
57827 2017-11-24 C 210.50 160.38 -138.61 -206.23 3375 - 12/22/2017
57846 2017-11-24 P 6.70 160.34 -170.27 -206.23 3375 - 12/22/2017
57828 2017-11-24 C 186.80 184.35 -154.72 -181.23 3400 - 12/22/2017
57847 2017-11-24 P 8.10 185.20 -187.99 -181.23 3400 - 12/22/2017
57829 2017-11-24 C 163.60 213.17 -174.17 -156.23 3425 - 12/22/2017
57848 2017-11-24 P 9.80 213.01 -205.82 -156.23 3425 - 12/22/2017
Is there a way to speed up the loop above, or any insight into why the second version does not work?
Answer 0 (score: 3)
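The problem is the second call, bfill. The groupby applies only to ffill; bfill then runs on the whole DataFrame that ffill returns rather than within each group, so it back-fills nan across group boundaries. The following will work for you: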
df.groupby(['code','Date']).apply(lambda x : x.ffill().bfill())
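To see the difference on a small scale, here is a minimal sketch on a hypothetical five-row frame whose values are copied from a few rows of the question. The names `keys`, `cols`, `chained` and `per_group` are introduced only for illustration, and restricting the fill to M1/M2 is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame (values taken from a few of its rows).
df = pd.DataFrame(
    {
        "Date": ["2017-11-23", "2017-11-23", "2017-11-24", "2017-11-24", "2017-11-24"],
        "sym": ["P", "C", "C", "P", "P"],
        "M1": [np.nan, 4.29, np.nan, np.nan, 7.77],
        "M2": [np.nan, -9.46, np.nan, np.nan, -25.05],
        "code": [
            "4350 - 12/15/2017", "4350 - 12/15/2017",
            "2800 - 11/24/2017", "2800 - 11/24/2017",
            "3150 - 12/08/2017",
        ],
    }
)
keys = ["code", "Date"]
cols = ["M1", "M2"]

# Chained: ffill runs per group, but bfill runs on the combined result,
# so the all-nan 2800 rows pick up values from the 3150 row below them.
chained = df.groupby(keys)[cols].ffill().bfill()

# Per group: both fills happen inside each (code, Date) group.
per_group = df.groupby(keys, group_keys=False)[cols].apply(lambda g: g.ffill().bfill())

print(chained)
print(per_group)
```

In the chained result the 2800 rows are filled from a different code, which is the behaviour reported in the question; in the per-group result they stay nan.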
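As an analogy, one might expect the following to return a sum for each group, but the second sum() runs over the whole grouped result, so it returns a single total per column:
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 3, 4], 'B': [2, 3, 4, 5]})
df.groupby('A').sum().sum()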
Out[958]:
B 14
dtype: int64
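Since the question is also about speed, one possible variant is to skip the Python-level lambda and call the built-in group-wise fills twice, regrouping before each one. This is only a sketch under assumptions: the small frame is hypothetical, only M1 and M2 are filled, and no timings were measured here.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame (values taken from a few of its rows).
df = pd.DataFrame(
    {
        "Date": ["2017-11-23", "2017-11-23", "2017-11-24", "2017-11-24", "2017-11-24"],
        "sym": ["P", "C", "C", "P", "P"],
        "M1": [np.nan, 4.29, np.nan, np.nan, 7.77],
        "M2": [np.nan, -9.46, np.nan, np.nan, -25.05],
        "code": [
            "4350 - 12/15/2017", "4350 - 12/15/2017",
            "2800 - 11/24/2017", "2800 - 11/24/2017",
            "3150 - 12/08/2017",
        ],
    }
)
keys = ["code", "Date"]
cols = ["M1", "M2"]

filled = df.copy()
# Regroup before each fill so neither one can cross a (code, Date) boundary.
filled[cols] = filled.groupby(keys)[cols].ffill()
filled[cols] = filled.groupby(keys)[cols].bfill()

print(filled)
```

Groups in which every M1/M2 value is nan are left untouched, which matches the requirement stated in the question.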