For example, I created a data frame like the one below:
date price ticker volume
0 2018-01-01 1.323 AI 2000
1 2018-01-02 1.525 AI 1500
2 2018-01-03 1.045 AI 500
3 2018-01-01 2.110 BOC 3201
4 2018-01-02 2.150 BOC 5200
5 2018-01-03 2.810 BOC 1980
6 2018-01-01 5.199 CAT 2000
7 2018-01-02 4.980 CAT 450
8 2018-01-03 4.990 CAT 3000
So there are 3 stocks, each spanning three days. I want to compute the daily log return of each stock between 2018-01-01 and 2018-01-03.
My current code is:
df["logret"] = df.groupby("ticker").apply(np.log(df.price) - np.log(df.price.shift(1)))
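But it gives me an error message saying that Series objects are mutable and therefore cannot be hashed.
Can someone explain what this error is pointing to? And how can I fix it so that the log return is computed per stock, grouped by ticker?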
Answer 0 (score: 7)
Take the log of the price, then use groupby followed by diff:
df.assign(logret=np.log(df.price).groupby(df.ticker).diff())
date price ticker volume logret
0 2018-01-01 1.323 AI 2000 NaN
1 2018-01-02 1.525 AI 1500 0.142093
2 2018-01-03 1.045 AI 500 -0.377978
3 2018-01-01 2.110 BOC 3201 NaN
4 2018-01-02 2.150 BOC 5200 0.018780
5 2018-01-03 2.810 BOC 1980 0.267717
6 2018-01-01 5.199 CAT 2000 NaN
7 2018-01-02 4.980 CAT 450 -0.043036
8 2018-01-03 4.990 CAT 3000 0.002006
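For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same groupby + diff approach. The frame is rebuilt by hand from the table in the question, and the variable name out is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"] * 3),
    "price": [1.323, 1.525, 1.045, 2.110, 2.150, 2.810, 5.199, 4.980, 4.990],
    "ticker": ["AI"] * 3 + ["BOC"] * 3 + ["CAT"] * 3,
    "volume": [2000, 1500, 500, 3201, 5200, 1980, 2000, 450, 3000],
})

# Take the log of every price, then the first difference within each ticker.
# The first row of each ticker has no previous price, so its log return is NaN.
out = df.assign(logret=np.log(df["price"]).groupby(df["ticker"]).diff())
print(out)
```

Note that diff works on row order within each group, so the rows are assumed to be in date order for every ticker, as they are here.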
Answer 1 (score: 5)
I think you need a lambda function:
df["logret"] = df.groupby("ticker")['price'].apply(lambda x: np.log(x) - np.log(x.shift()))
print (df)
date price ticker volume logret
0 2018-01-01 1.323 AI 2000 NaN
1 2018-01-02 1.525 AI 1500 0.142093
2 2018-01-03 1.045 AI 500 -0.377978
3 2018-01-01 2.110 BOC 3201 NaN
4 2018-01-02 2.150 BOC 5200 0.018780
5 2018-01-03 2.810 BOC 1980 0.267717
6 2018-01-01 5.199 CAT 2000 NaN
7 2018-01-02 4.980 CAT 450 -0.043036
8 2018-01-03 4.990 CAT 3000 0.002006
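The key difference from the code in the question is that apply receives a callable, which pandas invokes once per group, rather than a Series that was already computed on the whole frame. As a hedged variation (not part of the original answer), the sketch below swaps apply for transform and uses a named function; log_return is a hypothetical helper name. Depending on the pandas version, apply may prepend the group keys to the index of its result, whereas transform is documented to return a result indexed like the input, which makes the column assignment safer.

```python
import numpy as np
import pandas as pd

# Small subset of the question's data, enough to show two groups.
df = pd.DataFrame({
    "ticker": ["AI", "AI", "AI", "BOC", "BOC", "BOC"],
    "price": [1.323, 1.525, 1.045, 2.110, 2.150, 2.810],
})

def log_return(prices):
    """Log return of one ticker's price Series (called once per group)."""
    return np.log(prices) - np.log(prices.shift())

# transform() returns a result aligned with df's original index,
# so it can be assigned straight back as a new column.
df["logret"] = df.groupby("ticker")["price"].transform(log_return)
print(df)
```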
Answer 2 (score: 4)
You can perform this calculation with a vectorized approach:
res = df.sort_values(['ticker', 'date'])
res.loc[res['ticker'] == res['ticker'].shift(), 'logret'] = \
    np.log(res['price']) - np.log(res['price'].shift())
Result
date price ticker volume logret
0 2018-01-01 1.323 AI 2000 NaN
1 2018-01-02 1.525 AI 1500 0.142093
2 2018-01-03 1.045 AI 500 -0.377978
3 2018-01-01 2.110 BOC 3201 NaN
4 2018-01-02 2.150 BOC 5200 0.018780
5 2018-01-03 2.810 BOC 1980 0.267717
6 2018-01-01 5.199 CAT 2000 NaN
7 2018-01-02 4.980 CAT 450 -0.043036
8 2018-01-03 4.990 CAT 3000 0.002006
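Explanation
Sort your data frame by ticker and date.
Compute the difference in log prices only for rows whose ticker matches the ticker of the previous row.
Computing the result in one pass is more efficient than using a lambda.
Answer 3 (score: 4)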
I would use pct_change, because log(a) - log(b) = log(a / b):
np.log(df.groupby('ticker').price.pct_change().add(1))
Out[729]:
0 NaN
1 0.142093
2 -0.377978
3 NaN
4 0.018780
5 0.267717
6 NaN
7 -0.043036
8 0.002006
Name: price, dtype: float64
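As a quick sanity check of that identity, the sketch below computes the log return both ways on a subset of the question's data and compares them. The names via_pct and via_diff are illustrative only.

```python
import numpy as np
import pandas as pd

# Two tickers taken from the question's table.
df = pd.DataFrame({
    "ticker": ["AI", "AI", "AI", "CAT", "CAT", "CAT"],
    "price": [1.323, 1.525, 1.045, 5.199, 4.980, 4.990],
})

grouped = df.groupby("ticker")["price"]

# pct_change gives p_t / p_{t-1} - 1, so adding 1 recovers the price ratio.
via_pct = np.log(grouped.pct_change().add(1))

# Direct difference of log prices within each ticker.
via_diff = np.log(df["price"]).groupby(df["ticker"]).diff()

# Compare the two routes up to floating-point error, treating NaNs as equal.
print(np.allclose(via_pct, via_diff, equal_nan=True))
```

Both routes leave the first row of each ticker as NaN, since there is no previous price to compare against.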