Multiple filter conditions on a DataFrame

Time: 2020-01-15 21:37:51

Tags: python arrays dataframe filter index-error

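Thanks to @James, I have a working filter condition. Now I am trying to write a script that lets me upload a data file containing a list of stock tickers and issue dates, so that I can run the filter function over every entry in that list. However, I keep getting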

IndexError: index 0 is out of bounds for axis 0 with size 0

How can I fix this? See the script and the function below:

Filter function:

import pandas as pd

def get_data(issue_date, stock_ticker):
    df = pd.read_csv(r'D:\Project\Data\Short_Interest\mergedshort.csv')
    df['Date'] = pd.to_datetime(df['Date'], format="%Y%m%d")
    short = df.loc[df.Symbol.eq(stock_ticker)]
    # get the index of the row of interest
    ix = short[short.Date.eq(issue_date)].index[0]
    # get the item row for that row's index
    iloc_ix = short.index.get_loc(ix)
    # get the +/-10 iloc rows (the stop is +11 because slice ends are exclusive), i.e. 10 trading days before and after
    short_data = short.iloc[iloc_ix-10: iloc_ix+11]
    return [short_data]

And the script that loads the list (containing the "issue_dates" and "stock_tickers") and iterates over it:

import shortdatafilterfinal
import csv
import tkinter as tk
from tkinter import filedialog

# SHORT DATA trading day time series
# Open File Dialog
# iterates the stock tickers and respective dates over the filter function

root = tk.Tk()
root.withdraw()

file_path = filedialog.askopenfilename()

# Load Spreadsheet data
f = open(file_path)

csv_f = csv.reader(f)
next(csv_f)

result_data = []

# Iterate
for row in csv_f:
    try:
        return_data = shortdatafilterfinal.get_data(row[1], row[0])
        if len(return_data) != 0:
            # print(return_data)
            result_data_loc = [row[1], row[0]]
            result_data_loc.extend(return_data)
            result_data.append(result_data_loc)
    except AttributeError:
        print(row[0])
        print('\n\n')
        print(row[1])
        continue

if result_data is not None:
    with open('resultsshort.csv', mode='w', newline='') as result_file:
        csv_writer = csv.writer(result_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        for result in result_data:
            # print(result)
            csv_writer.writerow(result)
else:
    print("No results found!")

2 answers:

Answer 0 (score: 2)

If I understand correctly, you want to select the +/- 1 rows around the chosen date:

short = df.loc[df['Symbol'] == 'ARAY']

def get_date(df, d):
    # mark the matching row, then also mark the row after it and the row before it
    v = df['Date'] == d
    return df[v | v.shift(fill_value=False) | v.shift(-1, fill_value=False)]

print(get_date(short, '2011-01-08'))

Prints:

         Date Symbol
3  2011-01-06   ARAY
6  2011-01-08   ARAY
9  2011-01-12   ARAY
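
The same shifted-mask idea can be widened to +/- n rows. The following is only a sketch: the name `rows_around` and its `n` parameter are made up for illustration, and it assumes the frame has already been filtered down to a single symbol. A side effect worth noting is that a date with no match simply yields an empty frame instead of raising IndexError.

```python
import pandas as pd

def rows_around(df, date, n=1):
    """Return the rows within n positions of every row whose Date equals `date`.

    Hypothetical helper, assuming `df` holds one symbol only.
    """
    hit = df['Date'] == date
    mask = hit.copy()
    for k in range(1, n + 1):
        # shift(k) marks the k-th row after a hit, shift(-k) the k-th row before it
        mask |= hit.shift(k, fill_value=False) | hit.shift(-k, fill_value=False)
    return df[mask]

sample = pd.DataFrame({
    'Date': ['2011-01-03', '2011-01-06', '2011-01-08', '2011-01-12'],
    'Symbol': ['ARAY'] * 4,
})

around = rows_around(sample, '2011-01-08', n=1)   # rows for 01-06, 01-08, 01-12
missing = rows_around(sample, '2011-02-01', n=1)  # no match: empty frame, no error
```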

Answer 1 (score: 1)

You can use .iloc to work with the positional index of the filtered result.

import pandas as pd

d = {'Date':['2011-01-03', '2011-01-03', '2011-01-03','2011-01-06', '2011-01-06', 
             '2011-01-06', '2011-01-08', '2011-01-08','2011-01-08', '2011-01-12', 
             '2011-01-12', '2011-01-12'], 
     'Symbol':['ARAY', 'POLA', 'AMRI', 'ARAY', 'POLA', 'AMRI', 'ARAY', 'POLA', 
               'AMRI', 'ARAY', 'POLA', 'AMRI']}
df = pd.DataFrame(d)

def look_around(df, symbol, date):
    short = df.loc[df.Symbol.eq(symbol)]
    # get the index of the row of interest
    ix = short[short.Date.eq(date)].index[0]
    # get the item row for that row's index
    iloc_ix = short.index.get_loc(ix)
    # get the +/-1 iloc rows (you have to use +2 because that is how slices work)
    return short.iloc[iloc_ix-1: iloc_ix+2]

look_around(df, 'ARAY', '2011-01-08')
# returns:
#         Date Symbol
# 3 2011-01-06   ARAY
# 6 2011-01-08   ARAY
# 9 2011-01-12   ARAY
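
Regarding the IndexError in the question: `.index[0]` fails with exactly that message whenever the date filter matches no rows, which can happen if a ticker/date pair from the uploaded list is not in the data or if the date strings are formatted differently from the Date column. A possible guarded variant is sketched below; `look_around_safe` and its `n` parameter are hypothetical names, and the sketch assumes the DataFrame index is unique so that `get_loc` returns a single integer. It also clamps the slice start at 0, since a negative start would otherwise silently count from the end of the frame.

```python
import pandas as pd

def look_around_safe(df, symbol, date, n=1):
    """Like look_around, but returns an empty frame when nothing matches.

    Hypothetical helper; assumes a unique index.
    """
    short = df.loc[df['Symbol'].eq(symbol)]
    matches = short.index[short['Date'].eq(date)]
    if len(matches) == 0:
        # no row for this symbol/date: empty result instead of IndexError
        return short.iloc[0:0]
    pos = short.index.get_loc(matches[0])
    start = max(pos - n, 0)  # avoid a negative start near the top of the frame
    return short.iloc[start: pos + n + 1]

dates = ['2011-01-03', '2011-01-06', '2011-01-08', '2011-01-12']
symbols = ['ARAY', 'POLA', 'AMRI']
sample = pd.DataFrame({
    'Date': [d for d in dates for _ in symbols],
    'Symbol': symbols * len(dates),
})

found = look_around_safe(sample, 'ARAY', '2011-01-08')
at_edge = look_around_safe(sample, 'ARAY', '2011-01-03')
not_found = look_around_safe(sample, 'ARAY', '2011-02-01')
```

In the calling loop, an empty result can then be skipped with a check such as `if not result.empty:`. Note that the loop in the question catches AttributeError, which would not intercept an IndexError.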