Pandas: how to get specific columns from a MultiIndex DataFrame

Time: 2020-04-18 20:18:51

Tags: python-3.x pandas dataframe quantitative-finance

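I have a Pandas DataFrame of Open, High, Low, Close and Volume for several stocks.

I only want to take the Close price for each stock ticker and create a separate DataFrame from it. I am struggling with the MultiIndex syntax and understanding it; any help would be greatly appreciated! I want to keep the data DataFrame as it is, e.g. for candlestick charts.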

import ...
tickers = ['AAPL', 'MSFT', 'INTC', 'AMZN', 'GS', '^GSPC', 'SPY', '^VIX']
data = yf.download(tickers=tickers, start='2010-01-01', end='2020-01-01',
               interval='1d',
               group_by='ticker',
               auto_adjust=True,  # auto adjusts OHLC
               prepost=True,  # download pre/post market hours data
               threads=True,  # use threads for mass downloading?
               proxy=None
               )

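Many thanks,

On another note, as you can see in the Excel output, the date index includes the timestamp "00:00:00". Is there a way to remove the timestamp in the DataFrame and/or in the Excel output? No need to spend much time on this, it is just a thought.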

[Image: Excel representation of the first 15 rows and some of the stocks]

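2 answers:

Answer 0 (score: 0)

Use the advanced xs method to select from a deeper level of the MultiIndex. Because the data was downloaded with group_by='ticker', the tickers sit on column level 0 and the price fields (Open, High, Low, Close, Volume) on level 1, so the Close columns are selected with level=1 on axis=1.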

data.xs('Close', level=1, axis=1)
#                    AMZN       ^VIX         SPY        ^GSPC        MSFT          GS       INTC        AAPL
# Date                                                                                                      
# 2010-01-04   133.899994  20.040001   92.246048  1132.989990   24.294369  149.746597  15.251445   26.538483
# 2010-01-05   134.690002  19.350000   92.490204  1136.520020   24.302216  152.394012  15.244140   26.584366
# 2010-01-06   132.250000  19.160000   92.555328  1137.140015   24.153070  150.767426  15.193007   26.161509
# 2010-01-07   130.000000  19.059999   92.946060  1141.689941   23.901886  153.717728  15.046927   26.113146
# 2010-01-08   133.520004  18.129999   93.255348  1144.979980   24.066734  150.810715  15.214921   26.286753
# ...                 ...        ...         ...          ...         ...         ...        ...         ...
# 2019-12-24  1789.209961  12.670000  319.352142  3223.379883  156.951309  228.512817  59.118862  283.596924
# 2019-12-26  1868.770020  12.650000  321.052124  3239.909912  158.237793  229.804916  59.526852  289.223602
# 2019-12-27  1869.800049  13.430000  320.972565  3240.020020  158.527008  229.258255  59.785580  289.113831
# 2019-12-30  1846.890015  14.820000  319.202972  3221.290039  157.160736  228.403488  59.327831  290.829773
# 2019-12-31  1847.839966  13.780000  319.978424  3230.780029  157.270432  228.532684  59.556702  292.954712
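
The same selection can be tried without a network connection. The following is a minimal sketch on a small synthetic frame; the tickers, dates and values are made up for illustration, and the column layout is assumed to match what group_by='ticker' produces (ticker on level 0, price field on level 1).

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded frame (illustrative values only):
# column level 0 holds the ticker, level 1 holds the price field.
tickers = ["AAPL", "MSFT"]
fields = ["Open", "High", "Low", "Close", "Volume"]
columns = pd.MultiIndex.from_product([tickers, fields])
dates = pd.date_range("2010-01-04", periods=3, freq="B", name="Date")
values = np.arange(len(dates) * len(columns), dtype=float).reshape(len(dates), -1)
data = pd.DataFrame(values, index=dates, columns=columns)

# Cross-section on column level 1: a new frame with one Close column per ticker.
closes = data.xs("Close", level=1, axis=1)

# The original frame keeps all of its OHLCV columns.
print(closes)
print(data.shape)
```

xs returns a new object, so data stays available for candlestick charts. If the field names were on level 0 instead (an assumption about other download settings), level=0 would be the one to select on.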

Answer 1 (score: 0)

You can certainly import the stock price data without the MultiIndex convention, like this.

import pandas as pd
import pandas_datareader as dr

start = '2019-1-1'
end = '2020-1-1'

tickers = ['AXP', 'AAPL', 'BA', 'CAT', 'CSCO', 'CVX', 'XOM', 'GS', 'HD', 'IBM', 'INTC', 'JNJ', 'KO', 'JPM', 'MCD', 'MMM', 'MRK', 'MSFT', 'NKE', 'PFE', 'PG', 'TRV', 'UNH', 'RTX', 'VZ', 'V', 'WBA', 'WMT', 'DIS', 'DOW']
prices_list = []
for ticker in tickers:
    try:
        # Keep only the adjusted close and name the column after the ticker
        prices = dr.DataReader(ticker, 'yahoo', start, end)['Adj Close']
        prices_list.append(prices.rename(ticker))
    except Exception as exc:
        # Skip tickers that fail to download instead of aborting the loop
        print(f'Skipping {ticker}: {exc}')

# Combine once, after the loop, into a frame with one column per ticker
prices_df = pd.concat(prices_list, axis=1)
prices_df.sort_index(inplace=True)
prices_df.head()

Result:

                  AXP       AAPL          BA         CAT       CSCO  \
Date                                                                  
2019-01-02  92.042946  38.505024  314.645142  118.671188  39.772106   
2019-01-03  90.246315  34.669640  302.100555  114.098251  38.325676   
2019-01-04  94.312866  36.149662  317.822601  120.333229  40.052055   
2019-01-07  94.824791  36.069202  318.823395  120.408340  40.322678   
2019-01-08  95.288445  36.756794  330.891937  121.854416  40.649292   

                   CVX        XOM          GS          HD         IBM  \
Date                                                                    
2019-01-02   99.261368  60.557911  163.946991  162.949036  103.081429   
2019-01-03   97.360268  59.628124  161.545410  159.357559  101.023560   
2019-01-04   99.377945  61.826595  166.825119  164.092636  104.969307   
2019-01-07  100.669258  62.148109  167.749557  167.324936  105.711922   
2019-01-08  100.229866  62.599968  167.130081  168.128311  107.215065   

See the link below for details.

https://github.com/ASH-WICUS/Notebooks/blob/master/Clustering%20-%20Historical%20Stock%20Prices.ipynb
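
For the side question about the "00:00:00" timestamp, one possible approach is to change only the index before exporting. The sketch below uses a small made-up frame (the name closes and its values are illustrative, not downloaded data) and shows two options: converting the index to plain dates, or formatting it as text.

```python
import pandas as pd

# Small illustrative frame with a daily DatetimeIndex (values are made up).
dates = pd.date_range("2010-01-04", periods=3, freq="B", name="Date")
closes = pd.DataFrame({"AAPL": [26.54, 26.58, 26.16]}, index=dates)

# Option 1: replace the DatetimeIndex with plain datetime.date objects.
closes_dates = closes.copy()
closes_dates.index = pd.Index(closes.index.date, name="Date")

# Option 2: format the index as text; it no longer supports date arithmetic.
closes_text = closes.copy()
closes_text.index = closes.index.strftime("%Y-%m-%d")

print(closes_dates)
print(closes_text)
```

Both options work on a copy, so the original DatetimeIndex stays available for resampling or plotting. When writing to Excel, pd.ExcelWriter also accepts a datetime_format argument (for example "yyyy-mm-dd"), which may be enough to hide the time part without touching the DataFrame.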