Match data from 2 CSV files and save the newly generated data to a new CSV

Time: 2016-11-16 21:17:54

Tags: python csv

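I have two CSV files with financial data that I need to combine into a third CSV file. The financial data has to be aligned by date, meaning I need the price of each financial instrument on a given date.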

csv1 data

             Open   High    Low   Last  Change  Settle  Volume  Open Interest
Date                                                                         
1974-12-31  191.0  191.5  182.7  183.9     NaN   183.9   512.0          237.0
1975-01-02  184.0  184.8  173.9  175.1     NaN   175.1   294.0          209.0
1975-01-03  173.0  175.5  170.5  174.7     NaN   174.7   174.0          216.0
1975-01-06  172.0  174.5  167.5  174.4     NaN   174.4   197.0          225.0
1975-01-07  171.0  174.0  168.5  173.4     NaN   173.4    98.0          240.0

csv2 data

             Open   High    Low   Last  Change  Settle   Volume  Open Interest
Date                                                                          
1997-09-09  934.0  942.0  933.0  934.0     NaN   934.0   7034.0         1109.0
1997-09-10  934.0  935.0  915.0  915.0     NaN   915.0  11387.0         2325.0
1997-09-11  916.0  918.0  900.0  908.0     NaN   908.0   2523.0         2549.0
1997-09-12  908.0  926.0  904.0  924.0     NaN   924.0    928.0         2163.0
1997-09-15  925.0  930.0  920.0  922.0     NaN   922.0    208.0         2107.0

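The first problem I run into is that the dates in csv1 start at the end of 1974 while those in csv2 start in 1997, so I need to drop the extra dates from csv1.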

The second problem is that the dates in the two files do not match exactly.

csv1

             Open   High    Low   Last  Change  Settle  Volume  Open Interest
Date                                                                         
1997-09-08  191.0  191.5  182.7  183.9     NaN   183.9   512.0          237.0
1997-09-09  184.0  184.8  173.9  175.1     NaN   175.1   294.0          209.0
1997-09-10  173.0  175.5  170.5  174.7     NaN   174.7   174.0          216.0*******
1997-09-11  172.0  174.5  167.5  174.4     NaN   174.4   197.0          225.0
1997-09-12  171.0  174.0  168.5  173.4     NaN   173.4    98.0          240.0

The date 1997-09-10 does not exist in csv2, so the 1997-09-10 row should be removed from csv1.

csv2

             Open   High    Low   Last  Change  Settle   Volume  Open Interest
Date                                                                          
1997-09-08  934.0  942.0  933.0  934.0     NaN   934.0   7034.0         1109.0
1997-09-09  934.0  935.0  915.0  915.0     NaN   915.0  11387.0         2325.0
1997-09-11  916.0  918.0  900.0  908.0     NaN   908.0   2523.0         2549.0
1997-09-12  908.0  926.0  904.0  924.0     NaN   924.0    928.0         2163.0
1997-09-13  925.0  930.0  920.0  922.0     NaN   922.0    208.0         2107.0

The output should look like this (I removed the Change, Settle, Volume and Open Interest columns so that the table fits here; the code should not drop them).

csv3

             Open   High    Low   Last   Open   High    Low   Last
Date                                                                         
1997-09-08  191.0  191.5  182.7  183.9   934.0  942.0  933.0  934.0   
1997-09-09  184.0  184.8  173.9  175.1   934.0  935.0  915.0  915.0   
1997-09-11  172.0  174.5  167.5  174.4   916.0  918.0  900.0  908.0  
1997-09-12  171.0  174.0  168.5  173.4   908.0  926.0  904.0  924.0  

Here is my code so far, along with the data sources.

PS: this is for Python :)

from pandas import ExcelWriter 
import pandas as pd
import quandl
import unicodecsv
import datetime as dt


#reading in the csv files
def read_csv(filename):
    with open(filename, 'rb') as f:
        reader = unicodecsv.DictReader(f)
        return list(reader)

#data for the SnP https://www.quandl.com/data/CHRIS/CME_ES1-E-mini-S-P-500-Futures-Continuous-Contract-1-ES1-Front-Month
#data for the Gld https://www.quandl.com/data/CHRIS/CME_GC1-Gold-Futures-Continuous-Contract-1-GC1-Front-Month
SnP = read_csv('C:/Users/L/Desktop/python/CHRIS-CME_ES1.csv')
Gld = read_csv('C:/Users/L/Desktop/python/CHRIS-CME_GC1.csv')
financialInstruments = [SnP, Gld]

#parsing the date into datetime
def parse_date(date):
    if date == '':
        return None
    else:
        return dt.datetime.strptime(date, '%Y-%m-%d')

#converting strings(numbers) into floats
def stock_data(data):
    if data == '' or data == 'NaN':
        return None
    else:
        return float(data)

#looping through financial data for parsing
def define_data(finInst):
    for data in finInst:
        data['Date'] = parse_date(data['Date'])
        data['Volume'] = stock_data(data['Volume'])
        data['Open'] = stock_data(data['Open'])
        data['High'] = stock_data(data['High'])
        data['Low'] = stock_data(data['Low'])
        data['Last'] = stock_data(data['Last'])
        data['Change'] = stock_data(data['Change'])
        data['Settle'] = stock_data(data['Settle'])
        data['Open Interest'] = stock_data(data['Open Interest'])

#looping through financial instruments and forwarding to define_data function
#define_data converts each row in place and returns None, so its result is not assigned
for symbol in financialInstruments:
    define_data(symbol)

print (SnP[0])    
print (Gld[0])

1 Answer:

Answer 0: (score: 1)

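This is a classic match-merge. You can use pd.merge from pandas. In this case you would do an inner join on the Date column. An inner join means the result table contains only the dates that appear in both input tables.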
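A minimal sketch of that inner join, assuming both files have a Date column and using the sample rows from the question (trimmed to the four price columns for brevity). The helper name merge_on_date, the _csv1/_csv2 suffixes and the in-memory buffers are illustrative choices, not part of the original question; with real files you would pass the file paths to pd.read_csv and to_csv instead.

```python
import io

import pandas as pd

# Sample rows copied from the question; StringIO stands in for the real CSV files.
CSV1 = """Date,Open,High,Low,Last
1997-09-08,191.0,191.5,182.7,183.9
1997-09-09,184.0,184.8,173.9,175.1
1997-09-10,173.0,175.5,170.5,174.7
1997-09-11,172.0,174.5,167.5,174.4
1997-09-12,171.0,174.0,168.5,173.4
"""
CSV2 = """Date,Open,High,Low,Last
1997-09-08,934.0,942.0,933.0,934.0
1997-09-09,934.0,935.0,915.0,915.0
1997-09-11,916.0,918.0,900.0,908.0
1997-09-12,908.0,926.0,904.0,924.0
1997-09-13,925.0,930.0,920.0,922.0
"""


def merge_on_date(left, right, suffixes=("_csv1", "_csv2")):
    """Inner-join two price tables on Date (hypothetical helper name).

    Only dates present in both tables survive; the suffixes keep the
    identically named price columns apart in the result.
    """
    return pd.merge(left, right, on="Date", how="inner", suffixes=suffixes)


csv1 = pd.read_csv(io.StringIO(CSV1), parse_dates=["Date"])
csv2 = pd.read_csv(io.StringIO(CSV2), parse_dates=["Date"])
csv3 = merge_on_date(csv1, csv2)

# Write the merged table; replace the buffer with a path such as "csv3.csv".
out = io.StringIO()
csv3.to_csv(out, index=False)
print(out.getvalue())
```

Because the join is an inner join, it handles both problems from the question at once: the early csv1 dates that predate csv2 and the individual dates missing from either file are dropped without any explicit filtering.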