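I am trying to downsample minute-level data, and my index is a datetime. But when I call pandas resample, it returns only one column, even though my data has six columns.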
import pandas as pd
from matplotlib import pyplot
dataset = pd.read_csv('household_power_consumption.txt', sep=';', header=0,
                      low_memory=False, infer_datetime_format=True,
                      parse_dates={'datetime': [0, 1]},
                      index_col=['datetime'])  # Date and Time columns have been combined
dataset.head()
dataset = dataset.resample('H', how='mean', label='left')
a = dataset.head()
print(a)
dataset.to_csv('Downsampled_House_data.csv')
dataset.resample returns only one column.
Answer 0 (score: 1)
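If the data file comes from the link, the problem is that some missing values are marked with '?'. A column that contains '?' is read as text rather than as numbers, which would explain why resample returns a mean only for the one column that was parsed as numeric. The parameter na_values='?' is therefore necessary.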
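The effect of na_values can be seen on a tiny made-up sample. This is only a sketch: the column names a and b and the values are invented for illustration and are not taken from the real file.

```python
import io
import pandas as pd

# Made-up ';'-separated sample; '?' marks a missing value, as in the question's file.
raw = "a;b\n1.5;2.0\n?;3.0\n2.5;4.0\n"

# Without na_values, column 'a' contains the text '?' and is not parsed as numeric.
plain = pd.read_csv(io.StringIO(raw), sep=';')

# With na_values='?', the '?' becomes NaN and column 'a' is parsed as float64.
fixed = pd.read_csv(io.StringIO(raw), sep=';', na_values='?')

print(plain.dtypes)
print(fixed.dtypes)
```

Only columns with a numeric dtype can be averaged, so a quick look at dtypes right after loading usually shows whether this is the cause.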
dataset = pd.read_csv('household_power_consumption.txt',
                      sep=';',
                      header=0,
                      low_memory=False,
                      infer_datetime_format=True,
                      parse_dates={'datetime': [0, 1]},  # Date and Time columns have been combined
                      index_col=['datetime'],
                      na_values='?')
print(dataset.head())
                     Global_active_power  Global_reactive_power  Voltage  \
datetime
2006-12-16 17:24:00                4.216                  0.418   234.84
2006-12-16 17:25:00                5.360                  0.436   233.63
2006-12-16 17:26:00                5.374                  0.498   233.29
2006-12-16 17:27:00                5.388                  0.502   233.74
2006-12-16 17:28:00                3.666                  0.528   235.68

                     Global_intensity  Sub_metering_1  Sub_metering_2  \
datetime
2006-12-16 17:24:00              18.4             0.0             1.0
2006-12-16 17:25:00              23.0             0.0             1.0
2006-12-16 17:26:00              23.0             0.0             2.0
2006-12-16 17:27:00              23.0             0.0             1.0
2006-12-16 17:28:00              15.8             0.0             1.0

                     Sub_metering_3
datetime
2006-12-16 17:24:00            17.0
2006-12-16 17:25:00            16.0
2006-12-16 17:26:00            17.0
2006-12-16 17:27:00            17.0
2006-12-16 17:28:00            17.0
print(dataset.info())
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2075259 entries, 2006-12-16 17:24:00 to 2010-11-26 21:02:00
Data columns (total 7 columns):
Global_active_power      float64
Global_reactive_power    float64
Voltage                  float64
Global_intensity         float64
Sub_metering_1           float64
Sub_metering_2           float64
Sub_metering_3           float64
dtypes: float64(7)
memory usage: 126.7 MB
None
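The info() output above lists every column as float64 once '?' is treated as missing. If a frame has already been loaded without na_values, one possible alternative (not part of the original answer) is to coerce the text columns afterwards with pd.to_numeric. The sketch below uses a made-up frame, and the column names a and b are hypothetical.

```python
import pandas as pd

# Made-up frame imitating a column that was read as text because of '?'.
df = pd.DataFrame({'a': ['1.5', '?', '2.5'], 'b': [2.0, 3.0, 4.0]})

# Convert every column to numbers; values that cannot be parsed become NaN.
numeric = df.apply(pd.to_numeric, errors='coerce')

print(numeric.dtypes)
print(numeric['a'].mean())  # NaN is skipped by mean() by default
```

Passing na_values='?' at read time is the more direct fix. Coercing afterwards silently turns any unparseable value into NaN, not only '?'.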
dataset = dataset.resample('H', label='left').mean()
print(dataset.head())
                     Global_active_power  Global_reactive_power     Voltage  \
datetime
2006-12-16 17:00:00             4.222889               0.229000  234.643889
2006-12-16 18:00:00             3.632200               0.080033  234.580167
2006-12-16 19:00:00             3.400233               0.085233  233.232500
2006-12-16 20:00:00             3.268567               0.075100  234.071500
2006-12-16 21:00:00             3.056467               0.076667  237.158667

                     Global_intensity  Sub_metering_1  Sub_metering_2  \
datetime
2006-12-16 17:00:00         18.100000             0.0        0.527778
2006-12-16 18:00:00         15.600000             0.0        6.716667
2006-12-16 19:00:00         14.503333             0.0        1.433333
2006-12-16 20:00:00         13.916667             0.0        0.000000
2006-12-16 21:00:00         13.046667             0.0        0.416667

                     Sub_metering_3
datetime
2006-12-16 17:00:00       16.861111
2006-12-16 18:00:00       16.866667
2006-12-16 19:00:00       16.683333
2006-12-16 20:00:00       16.783333
2006-12-16 21:00:00       17.216667
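The resampling step itself can be checked on a small synthetic frame. This is a minimal sketch with invented column names x and y. It uses the frequency string '60min' instead of 'H', since newer pandas releases deprecate 'H' in favor of 'h' and '60min' avoids depending on the version.

```python
import numpy as np
import pandas as pd

# Two hours of made-up minute data with two numeric columns.
idx = pd.date_range('2006-12-16 17:00', periods=120, freq='min')
df = pd.DataFrame({'x': np.arange(120, dtype=float), 'y': 1.0}, index=idx)

# Downsample to hourly means; each bin is labelled with its left edge.
hourly = df.resample('60min', label='left').mean()

print(hourly)
```

Both columns appear in the result because both are numeric, which matches the behavior of the full dataset once '?' is read as NaN.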