Multiplying pandas DataFrames

Time: 2020-03-31 19:31:00

Tags: python pandas dataframe

I have two CSV files with the same structure but different lengths: a.csv

2000-01-14,50.94,51.04,49.83,49.94,2.18M,-1.61%
2000-01-18,49.18,49.32,48.39,48.53,3.03M,-2.81%
2000-01-19,48.63,49.37,47.7,47.89,2.49M,-1.33%
2000-01-20,47.98,48.03,46.31,47.43,2.46M,-0.96%

and b.csv

2000-01-14,1.0261,1.0273,1.0111,1.0128,-1.23%,
2000-01-17,1.0128,1.0149,1.0069,1.0118,-0.10%,
2000-01-18,1.0123,1.0143,1.0072,1.0131,0.13%,
2000-01-19,1.0139,1.0166,1.0086,1.0122,-0.09%,
2000-01-20,1.0137,1.0189,1.0072,1.0175,0.52%,

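I want to create a new file in which the values of columns 1-4 of the two files are multiplied with each other. Column 0 (the date) should match the dates in the file with fewer entries. Columns 5 and 6 can be dropped.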

I read the files with the following code:

a = pd.read_csv("a.csv", index_col=[0], parse_dates=[0], infer_datetime_format=True, header=None, skiprows=1, delimiter=',', names=['Date', 'Close','Open', 'High','Low', 'Vol.','Change'])
b = pd.read_csv("b.csv", index_col=[0], parse_dates=[0], infer_datetime_format=True, header=None, skiprows=1, delimiter=',', names=['Date', 'Close','Open', 'High','Low', 'Vol.','Change'])

I thought I could now multiply them with the mul method, but c = a.mul(b, axis = 0) raises the error TypeError: can't multiply sequence by non-int of type str
I read in this SO reply that I should run
a.Close = (a.Close.values / np.timedelta64(1, 'D')).astype(int) before multiplying, but that does not work either: TypeError: ufunc divide cannot use operands with types dtype('float64') and dtype('<m8[D]')
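The TypeError from mul most likely comes from the columns that are not numeric: 'Vol.' holds strings such as 2.18M and 'Change' holds strings such as -1.61%, and Python cannot multiply a string by a string or by a float. The following is a minimal sketch of that diagnosis, not a definitive fix. It inlines two rows from each file through io.StringIO so that it runs on its own; the variable names a_text, b_text and price_cols are made up for illustration.

```python
import io
import pandas as pd

names = ['Date', 'Close', 'Open', 'High', 'Low', 'Vol.', 'Change']

# Two rows from each file in the question, inlined so the sketch is self-contained.
a_text = (
    "2000-01-18,49.18,49.32,48.39,48.53,3.03M,-2.81%\n"
    "2000-01-19,48.63,49.37,47.7,47.89,2.49M,-1.33%\n"
)
b_text = (
    "2000-01-18,1.0123,1.0143,1.0072,1.0131,0.13%,\n"
    "2000-01-19,1.0139,1.0166,1.0086,1.0122,-0.09%,\n"
)
a = pd.read_csv(io.StringIO(a_text), header=None, names=names,
                index_col=0, parse_dates=[0])
b = pd.read_csv(io.StringIO(b_text), header=None, names=names,
                index_col=0, parse_dates=[0])

# Inspect the column types: 'Vol.' (and 'Change' in a) are read as text.
print(a.dtypes)

# Restricting both frames to the numeric price columns lets mul work.
price_cols = ['Close', 'Open', 'High', 'Low']
c = a[price_cols].mul(b[price_cols])
print(c)
```

The timedelta conversion from the other reply addresses a different problem (columns holding time differences), which would explain why it fails on plain float columns here.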
A simple c = (a.Close * b.Close) does seem to work, at least partly:

Date
2000-01-17          NaN
2000-01-18    49.784914
2000-01-19    49.305957
2000-01-20    48.637326
Name: Close, dtype: float64

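But I am not sure whether the result is still a pandas DataFrame, or how to add the other columns. I am fairly sure this should be easy, though. Could you point me in the right direction? Thanks!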
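On the question of the result type: multiplying two columns gives a pandas Series, not a DataFrame, and pandas aligns the two operands on their index first, which is where the NaN for 2000-01-17 comes from (that date exists only in b). A small sketch, using the Close values shown above and hypothetical variable names, illustrates this and one way to get back to a DataFrame:

```python
import pandas as pd

# Close values taken from the question; a has no entry for 2000-01-17.
idx_a = pd.to_datetime(['2000-01-18', '2000-01-19', '2000-01-20'])
idx_b = pd.to_datetime(['2000-01-17', '2000-01-18', '2000-01-19', '2000-01-20'])
close_a = pd.Series([49.18, 48.63, 47.98], index=idx_a, name='Close')
close_b = pd.Series([1.0128, 1.0123, 1.0139, 1.0137], index=idx_b, name='Close')

# The product is aligned on the union of both date indexes;
# dates present in only one operand become NaN.
prod = close_a * close_b
print(type(prod).__name__)

# Drop the unmatched dates and wrap the Series in a one-column DataFrame.
frame = prod.dropna().to_frame()
print(frame)
```

Further columns can then be added to frame by assignment, provided they share the same date index.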

1 Answer:

Answer 0: (score: 0)

Thanks to the comment from 微生物, and after some googling (I am not familiar with Python), I came up with the following solution:

import pandas as pd
a = pd.read_csv("a.csv", index_col=[0], parse_dates=[0], infer_datetime_format=True, header=None, skiprows=1, delimiter=',', names=['Date', 'Close','Open', 'High','Low', 'Vol.','Change'])
b = pd.read_csv("b.csv", index_col=[0], parse_dates=[0], infer_datetime_format=True, header=None, skiprows=1, delimiter=',', names=['Date', 'Close','Open', 'High','Low', 'Vol.','Change'])

c = a.join(b, lsuffix='_a', rsuffix='_b')

# create new columns with multiplied values
c['Close'] = c.Close_a * c.Close_b
c['Open'] = c.Open_a * c.Open_b
c['High'] = c.High_a * c.High_b
c['Low'] = c.Low_a * c.Low_b

# create subset with new data and round
d = c[['Close','Open', 'High', 'Low']]
d = d.round({'Close':2,'Open':2, 'High':2, 'Low':2})
d.to_csv("d.csv", sep=',', header=True)

which gives d.csv:

Date,Close,Open,High,Low
2000-01-18,49.78,50.03,48.74,49.17
2000-01-19,49.31,50.19,48.11,48.47
2000-01-20,48.64,48.94,46.64,48.26
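Note that a.join(b) is a left join by default, so it keeps every date in a; that matches the requirement here only because all dates in a also occur in b. As a hedged alternative, the sketch below restricts both frames to their shared dates with Index.intersection and multiplies the four price columns in one step instead of one assignment per column. The helper read_prices and the inlined data are assumptions made for illustration; with real files the StringIO objects would be replaced by the file paths.

```python
import io
import pandas as pd

names = ['Date', 'Close', 'Open', 'High', 'Low', 'Vol.', 'Change']
price_cols = ['Close', 'Open', 'High', 'Low']

# Rows from the question, inlined so the sketch is self-contained.
a_csv = io.StringIO(
    "2000-01-18,49.18,49.32,48.39,48.53,3.03M,-2.81%\n"
    "2000-01-19,48.63,49.37,47.7,47.89,2.49M,-1.33%\n"
    "2000-01-20,47.98,48.03,46.31,47.43,2.46M,-0.96%\n"
)
b_csv = io.StringIO(
    "2000-01-17,1.0128,1.0149,1.0069,1.0118,-0.10%,\n"
    "2000-01-18,1.0123,1.0143,1.0072,1.0131,0.13%,\n"
    "2000-01-19,1.0139,1.0166,1.0086,1.0122,-0.09%,\n"
    "2000-01-20,1.0137,1.0189,1.0072,1.0175,0.52%,\n"
)

def read_prices(source):
    """Hypothetical helper: read one file and keep only the price columns."""
    frame = pd.read_csv(source, header=None, names=names,
                        index_col=0, parse_dates=[0])
    return frame[price_cols]

a = read_prices(a_csv)
b = read_prices(b_csv)

# Keep only the dates present in both files, then multiply column by column.
common = a.index.intersection(b.index)
d = (a.loc[common] * b.loc[common]).round(2)

# Write to an in-memory buffer here; a file name works the same way.
out = io.StringIO()
d.to_csv(out)
print(out.getvalue())
```

Because both frames are cut down to the same dates before multiplying, no NaN rows appear even if a contains dates that b lacks.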

Neat :) Thanks a lot!