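I have two CSV files with financial data that I need to combine into a third CSV file. The financial data has to line up by date, meaning I need the price of each financial instrument on a given date.
csv1 data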
Open High Low Last Change Settle Volume Open Interest
Date
1974-12-31 191.0 191.5 182.7 183.9 NaN 183.9 512.0 237.0
1975-01-02 184.0 184.8 173.9 175.1 NaN 175.1 294.0 209.0
1975-01-03 173.0 175.5 170.5 174.7 NaN 174.7 174.0 216.0
1975-01-06 172.0 174.5 167.5 174.4 NaN 174.4 197.0 225.0
1975-01-07 171.0 174.0 168.5 173.4 NaN 173.4 98.0 240.0
csv2 data
Open High Low Last Change Settle Volume Open Interest
Date
1997-09-09 934.0 942.0 933.0 934.0 NaN 934.0 7034.0 1109.0
1997-09-10 934.0 935.0 915.0 915.0 NaN 915.0 11387.0 2325.0
1997-09-11 916.0 918.0 900.0 908.0 NaN 908.0 2523.0 2549.0
1997-09-12 908.0 926.0 904.0 924.0 NaN 924.0 928.0 2163.0
1997-09-15 925.0 930.0 920.0 922.0 NaN 922.0 208.0 2107.0
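The first problem I ran into is that the dates in csv1 start in 1975, while csv2 starts in 1997, so I need to drop the extra dates from csv1.
The second problem is that the dates in the two files do not match exactly.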
csv1
Open High Low Last Change Settle Volume Open Interest
Date
1997-09-08 191.0 191.5 182.7 183.9 NaN 183.9 512.0 237.0
1997-09-09 184.0 184.8 173.9 175.1 NaN 175.1 294.0 209.0
1997-09-10 173.0 175.5 170.5 174.7 NaN 174.7 174.0 216.0*******
1997-09-11 172.0 174.5 167.5 174.4 NaN 174.4 197.0 225.0
1997-09-12 171.0 174.0 168.5 173.4 NaN 173.4 98.0 240.0
The date 1997-09-10 does not exist in the csv2 file, so the 1997-09-10 row should be removed from csv1.
csv2
Open High Low Last Change Settle Volume Open Interest
Date
1997-09-08 934.0 942.0 933.0 934.0 NaN 934.0 7034.0 1109.0
1997-09-09 934.0 935.0 915.0 915.0 NaN 915.0 11387.0 2325.0
1997-09-11 916.0 918.0 900.0 908.0 NaN 908.0 2523.0 2549.0
1997-09-12 908.0 926.0 904.0 924.0 NaN 924.0 928.0 2163.0
1997-09-13 925.0 930.0 920.0 922.0 NaN 922.0 208.0 2107.0
The output should look like this (I removed the Change, Settle, Volume and Open Interest columns here only so the table fits on the page; the code should not drop them).
csv3
Open High Low Last Open High Low Last
Date
1997-09-08 191.0 191.5 182.7 183.9 934.0 942.0 933.0 934.0
1997-09-09 184.0 184.8 173.9 175.1 934.0 935.0 915.0 915.0
1997-09-11 172.0 174.5 167.5 174.4 916.0 918.0 900.0 908.0
1997-09-12 171.0 174.0 168.5 173.4 908.0 926.0 904.0 924.0
Here is my code so far, along with the data sources.
P.S. This is for Python :)
from pandas import ExcelWriter
import pandas as pd
import quandl
import unicodecsv
import datetime as dt

# reading in the csv files
def read_csv(filename):
    with open(filename, 'rb') as f:
        reader = unicodecsv.DictReader(f)
        return list(reader)

# data for the SnP https://www.quandl.com/data/CHRIS/CME_ES1-E-mini-S-P-500-Futures-Continuous-Contract-1-ES1-Front-Month
# data for the Gld https://www.quandl.com/data/CHRIS/CME_GC1-Gold-Futures-Continuous-Contract-1-GC1-Front-Month
SnP = read_csv('C:/Users/L/Desktop/python/CHRIS-CME_ES1.csv')
Gld = read_csv('C:/Users/L/Desktop/python/CHRIS-CME_GC1.csv')
financialInstruments = [SnP, Gld]

# parsing the date into datetime
def parse_date(date):
    if date == '':
        return None
    else:
        return dt.datetime.strptime(date, '%Y-%m-%d')

# converting strings(numbers) into floats
def stock_data(data):
    if data == '' or data == 'NaN':
        return None
    else:
        return float(data)

# looping through financial data for parsing (rows are modified in place)
def define_data(finInst):
    for data in finInst:
        data['Date'] = parse_date(data['Date'])
        data['Volume'] = stock_data(data['Volume'])
        data['Open'] = stock_data(data['Open'])
        data['High'] = stock_data(data['High'])
        data['Low'] = stock_data(data['Low'])
        data['Last'] = stock_data(data['Last'])
        data['Change'] = stock_data(data['Change'])
        data['Settle'] = stock_data(data['Settle'])
        data['Open Interest'] = stock_data(data['Open Interest'])

# looping through financial instruments and forwarding to define_data function
for symbol in financialInstruments:
    define_data(symbol)

print(SnP[0])
print(Gld[0])
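Answer 0 (score: 1)
This is a classic match-merge. You can use pd.merge from pandas. In this case, you would do an inner join on the Date column. An inner join means the result table contains only the dates that appear in both input tables.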
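As a rough illustration, here is a minimal sketch of that inner join using the sample rows from the question. The inline CSV strings, the `merge_on_date` helper, and the `_csv1` / `_csv2` suffixes are assumptions made for this example. With the real files you would presumably call `pd.read_csv` on their paths instead, assuming they have a `Date` column.

```python
import io

import pandas as pd

# Sample rows copied from the question. With real files you would call
# pd.read_csv(path, parse_dates=["Date"]) on each file instead.
CSV1 = """Date,Open,High,Low,Last
1997-09-08,191.0,191.5,182.7,183.9
1997-09-09,184.0,184.8,173.9,175.1
1997-09-10,173.0,175.5,170.5,174.7
1997-09-11,172.0,174.5,167.5,174.4
1997-09-12,171.0,174.0,168.5,173.4
"""
CSV2 = """Date,Open,High,Low,Last
1997-09-08,934.0,942.0,933.0,934.0
1997-09-09,934.0,935.0,915.0,915.0
1997-09-11,916.0,918.0,900.0,908.0
1997-09-12,908.0,926.0,904.0,924.0
1997-09-13,925.0,930.0,920.0,922.0
"""


def merge_on_date(left, right, suffixes=("_csv1", "_csv2")):
    """Inner-join two price tables on Date (hypothetical helper).

    Only dates present in both tables survive; columns that share a
    name are disambiguated with the given suffixes.
    """
    return pd.merge(left, right, on="Date", how="inner", suffixes=suffixes)


csv1 = pd.read_csv(io.StringIO(CSV1), parse_dates=["Date"])
csv2 = pd.read_csv(io.StringIO(CSV2), parse_dates=["Date"])

csv3 = merge_on_date(csv1, csv2)
print(csv3)

# Writing the result out; the file name here is only a placeholder.
# csv3.to_csv("csv3.csv", index=False)
```

Because the join is inner, 1997-09-10 (only in csv1) and 1997-09-13 (only in csv2) drop out, and the 1970s rows of the full csv1 file would disappear the same way without any separate filtering step. All non-key columns from both files are kept, so nothing needs to be dropped by hand.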