Pandas: get the row position of a DataFrame row using a composite index

Date: 2017-06-25 12:37:20

Tags: python pandas dataframe slice

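I have a directory of .csv files containing 60-minute stock bar data, and a Python script that loads them all into a pandas DataFrame and indexes it on symbol and datetime, as follows: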

import pandas as pd
import glob
import numpy as np

allFiles = glob.glob("D:\\Data\\60 Min Bar Stocks\\*.csv")
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    df = pd.read_csv(file_,index_col=None, header=0)
    list_.append(df)
frame = pd.concat(list_)

frame.set_index(['Symbol','Date'],inplace=True)

print(frame.loc["AAL", :])
print(frame.loc["AAL", :].loc["05-Jun-2017 09:00", :])

The first print returns the following:

                    Open   High    Low  Close  Volume
Date
05-Jun-2017 09:00  49.53  49.88  49.40  49.64  560155
05-Jun-2017 10:00  49.58  49.89  49.58  49.85  575165

The second print returns the following:

Open          49.53
High          49.88
Low           49.40
Close         49.64
Volume    560155.00
Name: 05-Jun-2017 09:00, dtype: float64

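How do I find the row position of this single row in the DataFrame, and then take a slice of 12 rows made up of the previous row, the current row, and the next 10 rows?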

1 answer:

Answer 0 (score: 4)

I think you need get_loc to find the position in the MultiIndex, and then select by position with iloc:

d = '05-Jun-2017 09:00'
s = 'AAL'

pos = df.index.get_loc((s,d))
df1 = df.iloc[pos-1:pos + 11]
print (df1)
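
For readers who want to run the lookup without the original .csv files, here is a minimal sketch. It assumes a small single-symbol frame whose index and Close values are copied from the sample in this answer; the construction of the frame is an illustration, not part of the asker's script.

```python
import pandas as pd

# Stand-in for the loaded data: ten hourly bars for one symbol, with the
# date strings and Close values taken from the sample frame in this answer.
dates = [f"05-Jun-2017 {hour:02d}:00" for hour in range(8, 18)]
closes = [1.1817, 1.1814, 1.1806, 1.1808, 1.1806,
          1.1786, 1.1788, 1.1789, 1.1792, 1.1791]

df = pd.DataFrame({"Symbol": "AAL", "Date": dates, "Close": closes})
df = df.set_index(["Symbol", "Date"])

d = "05-Jun-2017 09:00"
s = "AAL"

# For a unique (symbol, date) key, get_loc returns the integer row position.
pos = df.index.get_loc((s, d))

# Previous row, current row, next 10 rows (the stop of a slice is exclusive).
df1 = df.iloc[pos - 1:pos + 11]
print(pos)
print(df1)
```

Because this stand-in frame has only ten rows, the slice is cut short at the end of the frame rather than returning the full 12 rows.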

But this runs into trouble if the timestamp t (d in the code above) is the first value in the index or one of the last 10:

df1 = df.iloc[max(pos-1,0): min(pos+11,len(df.index))]
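
The clamping can be wrapped in a small helper. The sketch below is one possible way to do it; the function name window_around and its before/after parameters are hypothetical, and it assumes each (symbol, date) key occurs only once in the index. It also shows why the lower bound matters: a start of -1 is counted from the end of the frame. The upper bound is mostly defensive, since iloc slices that run past the end are simply cut short.

```python
import numbers

import pandas as pd


def window_around(df, key, before=1, after=10):
    """Return up to `before` rows ahead of `key`, the row at `key`,
    and up to `after` rows following it (hypothetical helper)."""
    pos = df.index.get_loc(key)
    # For a duplicated key, get_loc returns a slice or a boolean mask
    # instead of an integer, so the arithmetic below would not apply.
    if not isinstance(pos, numbers.Integral):
        raise ValueError(f"{key!r} is not unique in the index")
    start = max(pos - before, 0)
    stop = min(pos + after + 1, len(df.index))
    return df.iloc[start:stop]


# Stand-in data: ten hourly bars for one symbol (Close values from the sample).
dates = [f"05-Jun-2017 {hour:02d}:00" for hour in range(8, 18)]
closes = [1.1817, 1.1814, 1.1806, 1.1808, 1.1806,
          1.1786, 1.1788, 1.1789, 1.1792, 1.1791]
df = pd.DataFrame({"Symbol": "AAL", "Date": dates, "Close": closes})
df = df.set_index(["Symbol", "Date"])

first = window_around(df, ("AAL", "05-Jun-2017 08:00"))  # no previous row exists
late = window_around(df, ("AAL", "05-Jun-2017 15:00"))   # only two rows follow

# Without clamping, pos - 1 is -1 for the first row, and iloc counts that
# from the end, so the slice does not start at the first row.
unclamped = df.iloc[0 - 1:0 + 11]
print(first)
print(late)
print(unclamped)
```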

Sample:

print (df)
                            Open    High     Low   Close  Volume
Symbol Date                                                     
AAL    05-Jun-2017 08:00  1.1801  1.1819  1.1801  1.1817       4
       05-Jun-2017 09:00  1.1817  1.1818  1.1804  1.1814      18
       05-Jun-2017 10:00  1.1817  1.1817  1.1802  1.1806      12
       05-Jun-2017 11:00  1.1807  1.1815  1.1795  1.1808      26
       05-Jun-2017 12:00  1.1803  1.1806  1.1790  1.1806       4
       05-Jun-2017 13:00  1.1801  1.1801  1.1779  1.1786      23
       05-Jun-2017 14:00  1.1795  1.1801  1.1776  1.1788      28
       05-Jun-2017 15:00  1.1793  1.1795  1.1782  1.1789      10
       05-Jun-2017 16:00  1.1780  1.1792  1.1776  1.1792      12
       05-Jun-2017 17:00  1.1788  1.1792  1.1788  1.1791       4
d = '05-Jun-2017 09:00'
s = 'AAL'

pos = df.index.get_loc((s,d))
df1 = df.iloc[max(pos-1,0): min(pos+11,len(df.index))]
print (df1)
                            Open    High     Low   Close  Volume
Symbol Date                                                     
AAL    05-Jun-2017 08:00  1.1801  1.1819  1.1801  1.1817       4
       05-Jun-2017 09:00  1.1817  1.1818  1.1804  1.1814      18
       05-Jun-2017 10:00  1.1817  1.1817  1.1802  1.1806      12
       05-Jun-2017 11:00  1.1807  1.1815  1.1795  1.1808      26
       05-Jun-2017 12:00  1.1803  1.1806  1.1790  1.1806       4
       05-Jun-2017 13:00  1.1801  1.1801  1.1779  1.1786      23
       05-Jun-2017 14:00  1.1795  1.1801  1.1776  1.1788      28
       05-Jun-2017 15:00  1.1793  1.1795  1.1782  1.1789      10
       05-Jun-2017 16:00  1.1780  1.1792  1.1776  1.1792      12
       05-Jun-2017 17:00  1.1788  1.1792  1.1788  1.1791       4

It is not possible to select the previous row, because the timestamp t is the first value in the index:

d = '05-Jun-2017 08:00'
s = 'AAL'

pos = df.index.get_loc((s,d))
df1 = df.iloc[max(pos-1,0): min(pos+11,len(df.index))]
print (df1)
                            Open    High     Low   Close  Volume
Symbol Date                                                     
AAL    05-Jun-2017 08:00  1.1801  1.1819  1.1801  1.1817       4
       05-Jun-2017 09:00  1.1817  1.1818  1.1804  1.1814      18
       05-Jun-2017 10:00  1.1817  1.1817  1.1802  1.1806      12
       05-Jun-2017 11:00  1.1807  1.1815  1.1795  1.1808      26
       05-Jun-2017 12:00  1.1803  1.1806  1.1790  1.1806       4
       05-Jun-2017 13:00  1.1801  1.1801  1.1779  1.1786      23
       05-Jun-2017 14:00  1.1795  1.1801  1.1776  1.1788      28
       05-Jun-2017 15:00  1.1793  1.1795  1.1782  1.1789      10
       05-Jun-2017 16:00  1.1780  1.1792  1.1776  1.1792      12
       05-Jun-2017 17:00  1.1788  1.1792  1.1788  1.1791       4

It is not possible to select all 10 following rows, because t is the third value from the end:

d = '05-Jun-2017 15:00'
s = 'AAL'

pos = df.index.get_loc((s,d))
df1 = df.iloc[max(pos-1,0): min(pos+11,len(df.index))]
print (df1)
                            Open    High     Low   Close  Volume
Symbol Date                                                     
AAL    05-Jun-2017 14:00  1.1795  1.1801  1.1776  1.1788      28
       05-Jun-2017 15:00  1.1793  1.1795  1.1782  1.1789      10
       05-Jun-2017 16:00  1.1780  1.1792  1.1776  1.1792      12
       05-Jun-2017 17:00  1.1788  1.1792  1.1788  1.1791       4
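
One further point to watch when the frame holds several symbols, as the asker's concatenated frame does: a positional window on the whole frame can run into the rows of a neighbouring symbol. The sketch below is an illustration under assumptions, with made-up prices and a hypothetical second symbol; it restricts the frame to one symbol with xs(..., drop_level=False) before computing the position.

```python
import pandas as pd

# Made-up two-symbol frame; the prices are illustrative, not real data.
rows = [
    ("AAL", "05-Jun-2017 15:00", 49.60),
    ("AAL", "05-Jun-2017 16:00", 49.70),
    ("AAL", "05-Jun-2017 17:00", 49.80),
    ("AAPL", "05-Jun-2017 15:00", 153.90),
    ("AAPL", "05-Jun-2017 16:00", 154.00),
    ("AAPL", "05-Jun-2017 17:00", 154.10),
]
frame = pd.DataFrame(rows, columns=["Symbol", "Date", "Close"])
frame = frame.set_index(["Symbol", "Date"])

s = "AAL"
d = "05-Jun-2017 16:00"

# Window on the whole frame: the rows after the last AAL bar belong to AAPL.
pos = frame.index.get_loc((s, d))
spill = frame.iloc[max(pos - 1, 0):min(pos + 11, len(frame.index))]

# Keep only the rows for one symbol (both index levels are retained),
# then compute the position and the window inside that subset.
one_symbol = frame.xs(s, level="Symbol", drop_level=False)
pos = one_symbol.index.get_loc((s, d))
window = one_symbol.iloc[max(pos - 1, 0):min(pos + 11, len(one_symbol.index))]

print(spill)
print(window)
```

Whether a window should stop at the symbol boundary depends on the use case; the restriction is shown here only as one option.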