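I am trying to build a DataFrame of historical daily data showing how many Nifty 50 index stocks advanced and how many declined each day, along with their respective volumes. As a Python beginner, I am struggling with pandas DataFrames and conditions.
Below is the code I wrote, but the columns in the output are wrong: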
import datetime
from datetime import date, timedelta
import nsepy as ns
from nsepy.derivatives import get_expiry_date
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#setting default dates
end_date = date.today()
start_date = end_date - timedelta(365)
#Deriving the names of 50 stocks in Nifty 50 Index
nifty_50 = pd.read_html('https://en.wikipedia.org/wiki/NIFTY_50')
nifty50_symbols = nifty_50[1][1]
results = []
for x in nifty50_symbols:
    data = ns.get_history(symbol = x, start=start_date, end=end_date)
    results.append(data)
df = pd.concat(results)
output = []
for x in df.index:
    Dates = df[df.index == x]
    adv = 0
    dec = 0
    net = 0
    advol = 0
    devol = 0
    netvol = 0
    for s in Dates['Symbol']:
        y = Dates[Dates['Symbol'] == s]
        #print(y.loc[x,'Close'])
        cclose = y.loc[x,'Close']
        #print(cclose)
        copen = y.loc[x,'Open']
        #print(copen)
        cvol = y.loc[x,'Volume']
        if cclose > copen:
            adv = adv + 1
            advol = advol + cvol
        elif copen > cclose:
            dec = dec + 1
            devol = devol + cvol
        else:
            net = net + 1
            netvol = netvol + cvol
    data = [x,adv,dec,advol,devol]
    output.append(data)
final = pd.DataFrame(output, columns = {'Date','Advance','Decline','Adv_Volume','Dec_Volume'})
print(final)
Output:
Dec_Volume Adv_Volume Date Decline Advance
0 2017-02-06 27 23 88546029 70663663
1 2017-02-07 15 35 53775268 127004815
2 2017-02-08 27 23 76150502 96895043
3 2017-02-09 20 30 48815099 121956144
4 2017-02-10 19 31 47713187 156262469
5 2017-02-13 23 27 78460358 86575050
6 2017-02-14 15 35 65543372 100474945
7 2017-02-15 13 37 35055563 160091302
8 2017-02-16 35 15 114283658 73082870
9 2017-02-17 22 28 91383781 193246678
10 2017-02-20 34 16 100148171 54036281
11 2017-02-21 29 21 87434834 75182662
12 2017-02-22 13 37 77086733 148499613
13 2017-02-23 20 29 104469151 192787014
14 2017-02-27 13 37 41823692 140518994
15 2017-02-28 21 29 76949655 142799485
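As you can see, the column names do not match the data beneath them. Why does this happen, and how can I fix it?
If I print the output list after the loops finish, the data looks the way I want it (as far as a beginner like me can tell). The problem appears only when I convert the output list into a DataFrame.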
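For comparison, here is a sketch of the same per-day tally without the nested loops. It assumes, as the loop above implies, that the concatenated frame has one row per (date, symbol), a date index (assumed here to be named Date), and Symbol, Open, Close and Volume columns. The sample rows and their values are made up for illustration. Rows where Close equals Open count toward neither side, like the else branch above.

```python
import pandas as pd

# Hypothetical sample shaped like the concatenated nsepy frame:
# one row per (date, symbol), indexed by date. Values are invented.
df = pd.DataFrame(
    {
        "Symbol": ["AAA", "BBB", "CCC", "AAA", "BBB", "CCC"],
        "Open":   [10.0, 20.0, 30.0, 11.0, 19.0, 30.0],
        "Close":  [11.0, 19.0, 30.0, 12.0, 18.0, 29.0],
        "Volume": [100, 200, 300, 110, 210, 310],
    },
    index=pd.to_datetime(["2017-02-06"] * 3 + ["2017-02-07"] * 3),
)
df.index.name = "Date"  # assumed index name

# Move the date into a regular column so every row has a unique label.
day = df.reset_index()

# Classify every row once instead of filtering symbol by symbol.
up = day["Close"] > day["Open"]
down = day["Open"] > day["Close"]

summary = (
    day.assign(
        Advance=up.astype(int),                  # 1 if the stock advanced
        Decline=down.astype(int),                # 1 if the stock declined
        Adv_Volume=day["Volume"].where(up, 0),   # volume only on advancing rows
        Dec_Volume=day["Volume"].where(down, 0), # volume only on declining rows
    )
    # One row per trading day; the column order is fixed by this list.
    .groupby("Date", as_index=False)[["Advance", "Decline", "Adv_Volume", "Dec_Volume"]]
    .sum()
)
print(summary)
```

Because the output columns are selected with a list, their order is always Date, Advance, Decline, Adv_Volume, Dec_Volume.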
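Answer (score: 0)
I think the solution is simply to pass the column names as a Python list (using []), which has a well-defined element order, instead of as a set ({}), which is unordered: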
final = pd.DataFrame(output, columns = ['Date','Advance','Decline','Adv_Volume','Dec_Volume'])
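A minimal sketch of why the order matters, using pandas only (no download). It reuses the first two rows of the printed output above in the same positional layout as the question's output list, [Date, Advance, Decline, Adv_Volume, Dec_Volume]. Each row is a plain list, so pandas pairs the i-th column label with the i-th value. A set has no defined iteration order, so its labels can come out in any permutation.

```python
import pandas as pd

# Rows laid out like `output` in the question:
# [Date, Advance, Decline, Adv_Volume, Dec_Volume]
output = [
    ["2017-02-06", 27, 23, 88546029, 70663663],
    ["2017-02-07", 15, 35, 53775268, 127004815],
]

# A set of strings has no defined order; since string hashing is
# randomized per process, this order can even change between runs.
label_set = {"Date", "Advance", "Decline", "Adv_Volume", "Dec_Volume"}
print(list(label_set))  # some permutation of the five names

# A list keeps the written order, so label i is paired with value i.
columns = ["Date", "Advance", "Decline", "Adv_Volume", "Dec_Volume"]
final = pd.DataFrame(output, columns=columns)
print(final)
```

As a quick sanity check, Advance + Decline should add up to 50 on each day (27 + 23 and 15 + 35 here), except on days when some stocks close unchanged.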