Computing log returns for each stock with a DataFrame groupby

Time: 2018-04-03 11:48:24

Tags: python python-3.x pandas dataframe

For example, I created a DataFrame like the following:

         date  price ticker  volume
0  2018-01-01  1.323     AI    2000
1  2018-01-02  1.525     AI    1500
2  2018-01-03  1.045     AI     500
3  2018-01-01  2.110    BOC    3201
4  2018-01-02  2.150    BOC    5200
5  2018-01-03  2.810    BOC    1980
6  2018-01-01  5.199    CAT    2000
7  2018-01-02  4.980    CAT     450
8  2018-01-03  4.990    CAT    3000

So there are 3 stocks spanning three days. I want to compute the daily log return of each stock between 2018-01-01 and 2018-01-03.

My current code is:

df["logret"] = df.groupby("ticker").apply(np.log(df.price) - np.log(df.price.shift(1)))

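But it gives me an error message saying that Series objects are mutable, thus they cannot be hashed.

Can someone explain what this error is pointing to? How can I fix it so that the log return is computed separately for each ticker?

4 answers:

Answer 0 (score: 7)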

groupby, then diff:

df.assign(logret=np.log(df.price).groupby(df.ticker).diff())

         date  price ticker  volume    logret
0  2018-01-01  1.323     AI    2000       NaN
1  2018-01-02  1.525     AI    1500  0.142093
2  2018-01-03  1.045     AI     500 -0.377978
3  2018-01-01  2.110    BOC    3201       NaN
4  2018-01-02  2.150    BOC    5200  0.018780
5  2018-01-03  2.810    BOC    1980  0.267717
6  2018-01-01  5.199    CAT    2000       NaN
7  2018-01-02  4.980    CAT     450 -0.043036
8  2018-01-03  4.990    CAT    3000  0.002006

Answer 1 (score: 5)

I think you need a lambda function:

df["logret"] = df.groupby("ticker")['price'].apply(lambda x: np.log(x) - np.log(x.shift()))
print (df)
         date  price ticker  volume    logret
0  2018-01-01  1.323     AI    2000       NaN
1  2018-01-02  1.525     AI    1500  0.142093
2  2018-01-03  1.045     AI     500 -0.377978
3  2018-01-01  2.110    BOC    3201       NaN
4  2018-01-02  2.150    BOC    5200  0.018780
5  2018-01-03  2.810    BOC    1980  0.267717
6  2018-01-01  5.199    CAT    2000       NaN
7  2018-01-02  4.980    CAT     450 -0.043036
8  2018-01-03  4.990    CAT    3000  0.002006
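
The error in the question most likely arises because apply was handed an already-computed Series rather than a function, and pandas then tries to look that object up internally, which requires hashing it. Below is a minimal sketch, assuming the sample data from the question, that passes a callable instead. It uses transform rather than apply (a substitution, not what the answer above wrote), on the assumption that an index-aligned result is wanted regardless of pandas version.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-02", "2018-01-03"] * 3,
    "price": [1.323, 1.525, 1.045, 2.110, 2.150, 2.810, 5.199, 4.980, 4.990],
    "ticker": ["AI"] * 3 + ["BOC"] * 3 + ["CAT"] * 3,
    "volume": [2000, 1500, 500, 3201, 5200, 1980, 2000, 450, 3000],
})

# transform calls the lambda once per ticker with that ticker's prices
# and returns a Series indexed like df, so it can be assigned directly.
df["logret"] = df.groupby("ticker")["price"].transform(
    lambda x: np.log(x) - np.log(x.shift())
)
print(df)
```

The first row of each ticker has no previous price, so its log return is NaN.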

Answer 2 (score: 4)

You can perform this calculation with a vectorized approach:

res = df.sort_values(['ticker', 'date'])

res.loc[res['ticker'] == res['ticker'].shift(), 'logret'] = \
    np.log(res['price']) - np.log(res['price'].shift())

Result

         date  price ticker  volume    logret
0  2018-01-01  1.323     AI    2000       NaN
1  2018-01-02  1.525     AI    1500  0.142093
2  2018-01-03  1.045     AI     500 -0.377978
3  2018-01-01  2.110    BOC    3201       NaN
4  2018-01-02  2.150    BOC    5200  0.018780
5  2018-01-03  2.810    BOC    1980  0.267717
6  2018-01-01  5.199    CAT    2000       NaN
7  2018-01-02  4.980    CAT     450 -0.043036
8  2018-01-03  4.990    CAT    3000  0.002006

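Explanation

  • First sort your DataFrame by ticker and date.
  • Then apply your calculation wherever consecutive rows have the same ticker.
  • Vectorising is generally more efficient than computing the result group by group through a lambda.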
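
The steps above can be sketched end to end as follows. This is an illustrative variant, not the answer's exact code: it assumes the question's sample data, shuffles the rows on purpose to show why the sort matters, and uses where with a boolean mask (the names log_price and same_ticker are made up for this example).

```python
import numpy as np
import pandas as pd

# Sample data from the question, deliberately shuffled.
df = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-02", "2018-01-03"] * 3,
    "price": [1.323, 1.525, 1.045, 2.110, 2.150, 2.810, 5.199, 4.980, 4.990],
    "ticker": ["AI"] * 3 + ["BOC"] * 3 + ["CAT"] * 3,
    "volume": [2000, 1500, 500, 3201, 5200, 1980, 2000, 450, 3000],
}).sample(frac=1, random_state=0)

# Step 1: sort so each ticker's rows are contiguous and in date order.
res = df.sort_values(["ticker", "date"])

# Step 2: difference of log prices between consecutive rows.
log_price = np.log(res["price"])
diff = log_price - log_price.shift()

# Step 3: keep the difference only where the previous row is the same ticker;
# the first row of each ticker becomes NaN.
same_ticker = res["ticker"] == res["ticker"].shift()
res["logret"] = diff.where(same_ticker)

print(res.sort_index())
```

Both the shift and the mask are computed on the sorted frame, so the result does not depend on the original row order.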

Answer 3 (score: 4)

I would use pct_change, because log(a) - log(b) = log(a / b):

np.log(df.groupby('ticker').price.pct_change().add(1))
Out[729]: 
0         NaN
1    0.142093
2   -0.377978
3         NaN
4    0.018780
5    0.267717
6         NaN
7   -0.043036
8    0.002006
Name: price, dtype: float64
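
As a quick sanity check of that identity: pct_change returns a / b - 1, so adding 1 and taking the log gives log(a / b). The sketch below, assuming the question's prices, compares this route with the log-then-diff route. It uses np.log1p in place of np.log(... + 1), which is a substitution made here rather than part of the answer above.

```python
import numpy as np
import pandas as pd

# Prices and tickers from the question.
df = pd.DataFrame({
    "price": [1.323, 1.525, 1.045, 2.110, 2.150, 2.810, 5.199, 4.980, 4.990],
    "ticker": ["AI"] * 3 + ["BOC"] * 3 + ["CAT"] * 3,
})

# Route 1: log(1 + pct_change) = log(a / b)
via_pct = np.log1p(df.groupby("ticker")["price"].pct_change())

# Route 2: log(a) - log(b), differenced within each ticker
via_diff = np.log(df["price"]).groupby(df["ticker"]).diff()

# The two routes should agree up to floating-point rounding.
print(np.allclose(via_pct, via_diff, equal_nan=True))
```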