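I have a pandas DataFrame with the open, high, low, close and volume for several stocks.
I want to pull out only the closing price for each ticker and build a separate DataFrame from it. I am struggling with the MultiIndex syntax and how it works, so any help would be much appreciated! I want to leave the data DataFrame unchanged, e.g. for candlestick charts.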
import ...
tickers = ['AAPL', 'MSFT', 'INTC', 'AMZN', 'GS', '^GSPC', 'SPY', '^VIX']
data = yf.download(tickers=tickers, start='2010-01-01', end='2020-01-01',
                   interval='1d',
                   group_by='ticker',
                   auto_adjust=True,  # auto adjusts OHLC
                   prepost=True,  # download pre/post market hours data
                   threads=True,  # use threads for mass downloading?
                   proxy=None
                   )
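Many thanks,
On another note, as you can see in the Excel output, the date index includes the timestamp "00:00:00". Is there a way to remove the timestamp in the DataFrame and/or in the Excel output? No need to spend much time on this, it is just a thought.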
Excel Representation of first 15 rows and some of the stocks
Answer 0 (score: 0)
Use the more advanced xs method to select from a deeper level of the MultiIndex.
data.xs('Close', level=1, axis=1)
# AMZN ^VIX SPY ^GSPC MSFT GS INTC AAPL
# Date
# 2010-01-04 133.899994 20.040001 92.246048 1132.989990 24.294369 149.746597 15.251445 26.538483
# 2010-01-05 134.690002 19.350000 92.490204 1136.520020 24.302216 152.394012 15.244140 26.584366
# 2010-01-06 132.250000 19.160000 92.555328 1137.140015 24.153070 150.767426 15.193007 26.161509
# 2010-01-07 130.000000 19.059999 92.946060 1141.689941 23.901886 153.717728 15.046927 26.113146
# 2010-01-08 133.520004 18.129999 93.255348 1144.979980 24.066734 150.810715 15.214921 26.286753
# ... ... ... ... ... ... ... ... ...
# 2019-12-24 1789.209961 12.670000 319.352142 3223.379883 156.951309 228.512817 59.118862 283.596924
# 2019-12-26 1868.770020 12.650000 321.052124 3239.909912 158.237793 229.804916 59.526852 289.223602
# 2019-12-27 1869.800049 13.430000 320.972565 3240.020020 158.527008 229.258255 59.785580 289.113831
# 2019-12-30 1846.890015 14.820000 319.202972 3221.290039 157.160736 228.403488 59.327831 290.829773
# 2019-12-31 1847.839966 13.780000 319.978424 3230.780029 157.270432 228.532684 59.556702 292.954712
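The selection above can be tried offline with a small synthetic frame. This is only a sketch: the values are made up, and it assumes that group_by='ticker' yields two-level columns with the ticker on level 0 and the field on level 1. It also shows one possible way to handle the side question about the "00:00:00" timestamp, by swapping the DatetimeIndex for plain dates on a copy.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the yfinance download (all values are made up).
# Assumption: columns are a two-level MultiIndex,
# level 0 = ticker, level 1 = field (Open/High/Low/Close/Volume).
dates = pd.date_range("2010-01-04", periods=3, freq="B")
columns = pd.MultiIndex.from_product(
    [["AAPL", "MSFT"], ["Open", "High", "Low", "Close", "Volume"]]
)
data = pd.DataFrame(
    np.arange(len(dates) * len(columns), dtype=float).reshape(len(dates), -1),
    index=dates,
    columns=columns,
)
data.index.name = "Date"

# Cross-section on level 1: one 'Close' column per ticker, field level dropped.
close = data.xs("Close", level=1, axis=1)

# The selection does not modify `data`; it still holds every OHLCV column.
# Optional: replace the midnight timestamps with plain dates on a copy.
close_dates_only = close.copy()
close_dates_only.index = close_dates_only.index.date
close_dates_only.index.name = "Date"

print(close)
print(close_dates_only.index[0])
```

Converting the index to plain dates turns it into an object index, so it is probably best done only on the copy that gets exported. For Excel specifically, pd.ExcelWriter also accepts a datetime_format argument, which may be the less intrusive option.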
Answer 1 (score: 0)
You can certainly import stock price data without using the MultiIndex convention, like this.
from math import sqrt
from sklearn.cluster import MiniBatchKMeans
import pandas_datareader as dr
from matplotlib import pyplot as plt
import pandas as pd
import matplotlib.cm as cm
import seaborn as sn
start = '2019-1-1'
end = '2020-1-1'
tickers = ['AXP','AAPL','BA','CAT','CSCO','CVX','XOM','GS','HD','IBM','INTC','JNJ','KO','JPM','MCD', 'MMM', 'MRK', 'MSFT', 'NKE','PFE','PG','TRV','UNH','RTX','VZ','V','WBA','WMT','DIS','DOW']
prices_list = []
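for ticker in tickers:
    try:
        # Fetch adjusted closes for one ticker over the start/end window
        prices = dr.DataReader(ticker, 'yahoo', start, end)['Adj Close']
        prices = pd.DataFrame(prices)
        prices.columns = [ticker]
        prices_list.append(prices)
    except Exception:
        # Skip tickers that fail to download
        pass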
prices_df = pd.concat(prices_list,axis=1)
prices_df.sort_index(inplace=True)
prices_df.head()
Result:
AXP AAPL BA CAT CSCO \
Date
2019-01-02 92.042946 38.505024 314.645142 118.671188 39.772106
2019-01-03 90.246315 34.669640 302.100555 114.098251 38.325676
2019-01-04 94.312866 36.149662 317.822601 120.333229 40.052055
2019-01-07 94.824791 36.069202 318.823395 120.408340 40.322678
2019-01-08 95.288445 36.756794 330.891937 121.854416 40.649292
CVX XOM GS HD IBM \
Date
2019-01-02 99.261368 60.557911 163.946991 162.949036 103.081429
2019-01-03 97.360268 59.628124 161.545410 159.357559 101.023560
2019-01-04 99.377945 61.826595 166.825119 164.092636 104.969307
2019-01-07 100.669258 62.148109 167.749557 167.324936 105.711922
2019-01-08 100.229866 62.599968 167.130081 168.128311 107.215065
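The idea behind the loop above, one price column per ticker aligned on the date index, can be sketched without any download. The prices below are made up and series_by_ticker is a hypothetical name used only for illustration. Passing a dict to pd.concat is an alternative to renaming each frame's columns by hand.

```python
import pandas as pd

# Made-up closing prices standing in for the per-ticker downloads.
dates = pd.date_range("2019-01-02", periods=3, freq="B")
series_by_ticker = {
    "AXP": pd.Series([92.0, 90.2, 94.3], index=dates),
    "AAPL": pd.Series([38.5, 34.7, 36.1], index=dates),
    # Shorter history: the missing date becomes NaN after alignment.
    "BA": pd.Series([302.1, 317.8], index=dates[1:]),
}

# With a dict, concat uses the keys as column names;
# axis=1 aligns the Series on their shared date index.
prices_df = pd.concat(series_by_ticker, axis=1).sort_index()
prices_df.index.name = "Date"

print(prices_df)
```

The result has a flat column index with one column per ticker, so no MultiIndex selection is needed afterwards.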
For details, see the link below.