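Python: Add a column to a dataframe that depends on another column

Posted: 2019-01-29 06:21:54

Tags: python pandas dataframe

I imported stock data from Yahoo into a dataframe using pandas_datareader. There are two columns: the date and the stock's adjusted closing price.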

Date        Adj Close          
2017-08-31  168.851196
2017-09-01  169.867691
2017-09-05  165.333496
2017-09-06  165.233810
2017-09-07  166.001160
2017-09-08  163.121201
2017-09-11  168.412735
2017-09-12  169.020630
2017-09-13  169.777969
2017-09-14  168.811356
2017-09-15  179.484131
2017-09-18  186.898300
2017-09-19  186.698990
2017-09-20  185.194214
2017-09-21  180.131882
2017-09-22  178.377991
2017-09-25  170.405807
2017-09-26  171.362473
2017-09-27  175.119354
2017-09-28  175.069534
2017-09-29  178.148788
2017-10-02  178.377991
2017-10-03  178.746704
2017-10-04  180.241486
2017-10-05  180.141861
2017-10-06  180.670013
2017-10-09  184.745804
2017-10-10  188.273499
2017-10-11  190.276505
2017-10-12  190.366211

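I'd like to insert another column named "Log Return" that takes the Adj Close for the current date (the dates are not all one day apart, because only trading days are listed), divides it by the previous trading day's Adj Close, and then takes the natural log of that quotient.

That is, ln(A(today) / A(yesterday)), where A is just the Adj Close.

By the way, my dataframe variable is called df.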
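As a quick sanity check of the formula, the first log return can be computed by hand from the first two rows of the table above (2017-08-31 and 2017-09-01). This is plain arithmetic with the standard-library `math.log`, nothing pandas-specific, and it also shows the simple return (ratio minus 1) that the question explicitly does not want:

```python
import math

# Adj Close for the first two trading days in the sample table
a_yesterday = 168.851196  # 2017-08-31
a_today = 169.867691      # 2017-09-01

# Log return: natural log of today's price divided by yesterday's
log_return = math.log(a_today / a_yesterday)

# Simple return, for comparison: the ratio minus 1
simple_return = a_today / a_yesterday - 1

print(f"log return:    {log_return:.6f}")
print(f"simple return: {simple_return:.6f}")
```

For small daily moves the two values are close (both round to about 0.0060 here), but they are different quantities, and the log return is always the smaller of the two.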

import datetime

import pandas as pd
import pandas_datareader as web

# datetime is a standard-library module; datetime.datetime is
# the date/time type defined inside it
start = datetime.datetime(2015, 9, 1)
end = datetime.datetime(2018, 12, 31)

# The DataReader function name is case sensitive
df = web.DataReader("nvda", 'yahoo', start, end)

# Trim the frame to the Adj Close column (Date is the index, not a column)
df = df[['Adj Close']]

That's my code so far. EDIT: I don't want A(today)/A(yesterday) - 1; I actually need ln(A(today)/A(yesterday)) (the natural log).
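Yahoo's API has changed since this question was posted, so the `'yahoo'` source in pandas_datareader may no longer return data. For experimenting offline, a frame with the same shape can be typed in from the sample rows above. This is a stand-in, not verbatim DataReader output; it assumes, as in the output shown, that `Date` is the index and `Adj Close` is the only column:

```python
import pandas as pd

# First five rows of the sample output, with Date as the index
# (DataReader returns the dates as the index, not as a regular column)
df = pd.DataFrame(
    {"Adj Close": [168.851196, 169.867691, 165.333496, 165.233810, 166.001160]},
    index=pd.DatetimeIndex(
        ["2017-08-31", "2017-09-01", "2017-09-05", "2017-09-06", "2017-09-07"],
        name="Date",
    ),
)
print(df)
```

Any of the answers can then be tried against this `df` without a network connection.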

3 Answers

Answer 0 (score: 3)

Try this:

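import numpy as np

# Previous trading day's price, aligned with the current row
df['Adj Yesterday'] = df['Adj Close'].shift()
# ln(A(today) / A(yesterday)); the first row is NaN
df['Log Return'] = np.log(df['Adj Close'] / df['Adj Yesterday'])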

If this isn't quite what you want but is close, see the pandas documentation for shift.
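To illustrate what `shift` does here: it moves each value down one row, so each row of the shifted series holds the previous row's value and the first row becomes NaN. Because it works by row position rather than by calendar date, "yesterday" is automatically the previous trading day, even across a weekend. A minimal sketch using three rows from the question's table:

```python
import numpy as np
import pandas as pd

prices = pd.Series(
    [163.121201, 168.412735, 169.020630],
    index=pd.DatetimeIndex(["2017-09-08", "2017-09-11", "2017-09-12"], name="Date"),
    name="Adj Close",
)

# shift() moves values down one row; the first row has no predecessor
previous = prices.shift()

# 2017-09-11 (a Monday) is compared with 2017-09-08 (the previous Friday)
log_return = np.log(prices / previous)
print(pd.DataFrame({"Adj Close": prices, "Previous": previous, "Log Return": log_return}))
```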

If you have gaps in the dates, you can also use resample, or set_index together with date_range.
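A sketch of the date_range approach, assuming a daily calendar: reindexing onto every calendar day makes the gaps explicit as NaN rows, but then `shift()` compares with the previous calendar day, so Monday's return becomes NaN. Forward-filling the prices (an assumption about how gaps should be treated, not something the answer specifies) gives zero returns on the filled days and restores the trading-day return on Monday:

```python
import numpy as np
import pandas as pd

prices = pd.Series(
    [163.121201, 168.412735],
    index=pd.DatetimeIndex(["2017-09-08", "2017-09-11"], name="Date"),
    name="Adj Close",
)

# Reindex onto every calendar day; Saturday and Sunday appear as NaN rows
full_range = pd.date_range(prices.index.min(), prices.index.max(), freq="D")
daily = prices.reindex(full_range)

# Without filling, Monday's return is NaN because Sunday has no price
unfilled = np.log(daily / daily.shift())

# Forward-fill carries Friday's price over the weekend (assumed gap policy):
# weekend returns are 0 and Monday's return equals the trading-day return
filled = daily.ffill()
filled_returns = np.log(filled / filled.shift())
print(pd.DataFrame({"Adj Close": daily, "unfilled": unfilled, "ffilled": filled_returns}))
```

For log returns on trading days only, plain `shift()` on the original rows is usually what you want; reindexing mainly helps when you need explicit rows for missing dates.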

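Answer 1 (score: 2)

You need Series.pct_change, which gives the simple return A(today)/A(yesterday) - 1:

df['Return'] = df['Adj Close'].pct_change()

If you need the natural log, add 1 back before taking the log (np.log1p(x) computes ln(1 + x)):

import numpy as np

df['Log Return'] = np.log1p(df['Adj Close'].pct_change())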
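Because `pct_change` returns A(today)/A(yesterday) - 1, the log return is ln(1 + pct_change), not ln(pct_change). Taking `np.log` of `pct_change` directly gives the log of the simple return, which is NaN on any day the price fell. A quick check on four rows from the question's table comparing the three expressions:

```python
import numpy as np
import pandas as pd

adj_close = pd.Series([168.851196, 169.867691, 165.333496, 165.233810], name="Adj Close")

# log1p(x) computes log(1 + x), turning the simple return back into a log return
via_pct_change = np.log1p(adj_close.pct_change())

# Direct definition: ln(A(today) / A(yesterday))
direct = np.log(adj_close / adj_close.shift())

# Log of the simple return itself: days with a price drop give NaN
# (NumPy emits a RuntimeWarning for the negative inputs)
wrong = np.log(adj_close.pct_change())

print(pd.DataFrame({"log1p(pct_change)": via_pct_change, "direct": direct, "log(pct_change)": wrong}))
```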

Answer 2 (score: 1)

You can try the following:

import numpy as np

# First ensure dates are in order (Date is the index, not a column)
df = df.sort_index()
# Divide each row by the previous one and take the log; .values stops
# pandas from aligning the two slices on their dates
diff = np.log(df['Adj Close'].values[1:] / df['Adj Close'].values[:-1])
# Add the new column; the first row is NaN as it has no previous day
df['Log Return'] = np.concatenate([[np.nan], diff])
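Since ln(A(today) / A(yesterday)) = ln A(today) - ln A(yesterday), the same column can also be computed as the first difference of the log prices, which avoids the manual slicing entirely. A sketch on a frame built from the first rows of the question's table (Date as the index is assumed, matching the output shown):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Adj Close": [168.851196, 169.867691, 165.333496, 165.233810, 166.001160]},
    index=pd.DatetimeIndex(
        ["2017-08-31", "2017-09-01", "2017-09-05", "2017-09-06", "2017-09-07"],
        name="Date",
    ),
)

# Sort by the Date index so "previous row" means "previous trading day"
df = df.sort_index()

# ln(A_t / A_{t-1}) == ln(A_t) - ln(A_{t-1}), so diff() of the log prices
# gives the log return; the first row is NaN because it has no predecessor
df["Log Return"] = np.log(df["Adj Close"]).diff()
print(df)
```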