Output columns do not match the data

Asked: 2018-02-05 17:15:59

Tags: python-3.x pandas

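I am trying to build a DataFrame of historical data for the Nifty 50 index: for each day, the number of stocks that advanced and the number that declined, along with their respective volumes. I am new to Python and am having trouble with pandas DataFrames and conditions.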

Below is the code I wrote, but the columns in the output are wrong:

import datetime
from datetime import date, timedelta
import nsepy as ns
from nsepy.derivatives import get_expiry_date
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


#setting default dates
end_date = date.today()
start_date = end_date - timedelta(365)

#Deriving the names of 50 stocks in Nifty 50 Index
nifty_50 = pd.read_html('https://en.wikipedia.org/wiki/NIFTY_50')

nifty50_symbols = nifty_50[1][1]



results = []
for x in nifty50_symbols:
    data = ns.get_history(symbol = x, start=start_date, end=end_date)
    results.append(data)

df = pd.concat(results)
output = []
for x in df.index:
    Dates = df[df.index == x]
    adv = 0
    dec = 0
    net = 0
    advol = 0
    devol = 0
    netvol = 0

    for s in Dates['Symbol']:
        y = Dates[Dates['Symbol'] == s]
        #print(y.loc[x,'Close'])
        cclose = y.loc[x,'Close']
        #print(cclose)
        copen = y.loc[x,'Open']
        #print(copen)
        cvol = y.loc[x,'Volume']
        if cclose > copen:
            adv = adv + 1
            advol = advol + cvol

        elif copen > cclose:
            dec = dec + 1
            devol = devol + cvol

        else:
            net = net + 1
            netvol = netvol + cvol

    data = [x,adv,dec,advol,devol]
    output.append(data)

final = pd.DataFrame(output, columns = {'Date','Advance','Decline','Adv_Volume','Dec_Volume'})

print(final)

Output:

       Dec_Volume  Adv_Volume  Date    Decline    Advance
0      2017-02-06          27    23   88546029   70663663
1      2017-02-07          15    35   53775268  127004815
2      2017-02-08          27    23   76150502   96895043
3      2017-02-09          20    30   48815099  121956144
4      2017-02-10          19    31   47713187  156262469
5      2017-02-13          23    27   78460358   86575050
6      2017-02-14          15    35   65543372  100474945
7      2017-02-15          13    37   35055563  160091302
8      2017-02-16          35    15  114283658   73082870
9      2017-02-17          22    28   91383781  193246678
10     2017-02-20          34    16  100148171   54036281
11     2017-02-21          29    21   87434834   75182662
12     2017-02-22          13    37   77086733  148499613
13     2017-02-23          20    29  104469151  192787014
14     2017-02-27          13    37   41823692  140518994
15     2017-02-28          21    29   76949655  142799485

As the output shows, the column names do not match the data beneath them. Why does this happen, and how can I fix it?

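If I print the output list after the loops finish, the data looks the way I want it (as far as a novice like me can tell). The problem appears only when I convert the output list to a DataFrame.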

1 Answer:

Answer 0 (score: 0)

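I think the solution is simply to pass the column names as a Python list (using []), which has a well-defined element order, rather than as a set (using {}), whose elements are unordered: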

final = pd.DataFrame(output, columns = ['Date','Advance','Decline','Adv_Volume','Dec_Volume'])
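
To see the difference in isolation, here is a minimal sketch that needs no network access. It assumes the rows have the same shape as the question's output list and reuses the first two rows from the printed output; the names column_list and column_set are made up for this illustration.

```python
import pandas as pd

# Two rows in the same shape as the question's `output` list:
# [date, advances, declines, advancing volume, declining volume]
output = [
    ["2017-02-06", 27, 23, 88546029, 70663663],
    ["2017-02-07", 15, 35, 53775268, 127004815],
]

# Hypothetical names, used only in this sketch.
column_list = ["Date", "Advance", "Decline", "Adv_Volume", "Dec_Volume"]
column_set = set(column_list)

# The set holds the same names, but its iteration order is arbitrary
# and may differ from one interpreter run to the next.
print(list(column_set))

# The list keeps the order in which the names were written,
# so each label lines up with the matching position in every row.
final = pd.DataFrame(output, columns=column_list)
print(final)
```

With the list, the first label is always Date and the last is always Dec_Volume. With the set, the labels come out in whatever order the set happens to iterate, which is how Dec_Volume ended up above the date column in the question's output.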