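I have a large CSV file of timestamped data, with timestamps in the format 2015-04-01 10:26:41. The data spans several months, and the gap between entries ranges from 30 seconds to several hours. Its columns are id, time, speed.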
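Ultimately, I want to group the data into 15-minute intervals and calculate the average speed of however many entries fall within each 15-minute period.
I'm trying to use pandas because it seems to have solid time series tools and this looked like it would be easy to do, but I'm falling at the first hurdle.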
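So far I've imported the CSV as a DataFrame, and all of the columns have a dtype of object. I've sorted the data by date, and I'm now trying to group the entries by time interval, which is where I'm struggling. Based on some googling, I tried resampling the data with this code:
df.resample('5min', how=sum)
but I get the error
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex
I was also thinking about trying the groupby method, possibly with a lambda:
df.groupby(lambda x:x.minutes + 5)
but that produces the error
AttributeError: 'str' object has no attribute 'minutes'
Basically, I'm confused about a) whether pandas is recognising my data as time series data at all, since its dtype is object, and b) if it can be recognised, how to get the time intervals working, which I can't seem to manage.
Keen to learn if anyone can point me in the right direction.
The DataFrame looks like this:
   0       1        2                    3
0  id      boat_id  time                 speed
1  386226  32       2015-01-15 05:14:32  4.2343243
2  386285  32       2015-01-15 05:44:57  3.45234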
Answer 0 (score: 2)
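First, it looks like you read in a blank row, which is why the real column names ended up as the first row of data. You probably want to skip the first row of the file:
pd.read_csv(filename, skiprows=1)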
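You should use pd.to_datetime() to convert the text representation of the time into datetimes and set the result as the index, which gives you a DatetimeIndex:
df.set_index(pd.to_datetime(df['time']), inplace=True)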
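Then you should be able to resample:
df.resample('15min').mean()
(Older pandas versions wrote this as df.resample('15min', how=np.mean); the how argument has since been removed.)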
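Putting the steps of this answer together, here is a minimal sketch rather than a definitive implementation. The sample rows are made up for illustration (the first two reuse values from the question), the CSV is read from an in-memory string instead of the real file, and it assumes a recent pandas version where resample is followed by an aggregation method.

```python
import io

import pandas as pd

# Hypothetical sample data laid out like the DataFrame in the question.
csv_text = """id,boat_id,time,speed
386226,32,2015-01-15 05:14:32,4.2343243
386285,32,2015-01-15 05:44:57,3.45234
386300,32,2015-01-15 05:50:10,2.0
386310,32,2015-01-15 05:55:00,4.0
"""

df = pd.read_csv(io.StringIO(csv_text))

# The time column is read as text (dtype object), so convert it first.
df["time"] = pd.to_datetime(df["time"])

# resample needs a DatetimeIndex, so make the parsed column the index.
df = df.set_index("time").sort_index()

# Average speed per 15-minute bin; bins with no rows come out as NaN.
avg_speed = df["speed"].resample("15min").mean()
print(avg_speed)
```

Bins are labelled by their start time, so the two rows at 05:50:10 and 05:55:00 are averaged together in the 05:45 bin. If the empty bins are not wanted, calling dropna() on the result removes them.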
Answer 1 (score: 0)
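df = pd.read_csv('myfile.csv', parse_dates=['time'])
If the format is sane, your date column should then have a datetime type. Note that parse_dates=True on its own only tries to parse the index, so the column is named explicitly here. After that you can set the index and resample as described above.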
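As a sketch of parsing the dates at read time, the following assumes the timestamp column is named time as in the question, uses made-up sample rows, and reads from an in-memory string rather than the real file. Passing index_col together with parse_dates makes the parsed column the index in one step.

```python
import io

import pandas as pd

# Hypothetical sample data; the real file would be passed by path instead.
csv_text = """id,boat_id,time,speed
1,32,2015-01-15 05:05:00,6.0
2,32,2015-01-15 05:14:32,4.0
3,32,2015-01-15 05:44:57,3.0
"""

# Parse the time column while reading and use it as the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"], index_col="time")

# Confirm that pandas now treats the index as datetimes, not strings.
print(type(df.index).__name__)

# Mean speed per 15-minute bin, dropping bins that contain no rows.
result = df["speed"].resample("15min").mean().dropna()
print(result)
```

Checking the index type before resampling is a quick way to tell whether the TypeError from the question will come back.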