Question

I am using the following code:

import matplotlib.pyplot as pyplot
import pandas as pandas
from datetime import datetime

dataset = pandas.read_csv("HugLog_17.01.11.csv", sep=",", header=0)

print('filter data for SrcAddr')
dataset_filtered = dataset[dataset['SrcAddr']=='0x1FD3']

print('get Values')
varY = dataset_filtered.Battery_Millivolt.values

varX = dataset_filtered.Timestamp.values

print('Convert the date-strings in date-objects.')
dates_list = [datetime.strptime(date, '%y-%m-%d %H:%M:%S') for date in varX]

fig = pyplot.figure()
ax1 = fig.add_subplot(1,1,1)
ax1.set_xlabel('Time')
ax1.set_ylabel('Millivolt')
ax1.bar(dates_list, varY)

pyplot.locator_params(axis='x',nbins=10)

pyplot.show()

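My problem is that this is a large data collection with 180k data points.

pyplot displays all of the points, which makes the graph slow, and the bars overlap. Is there a way to set a maximum limit on the number of data points shown in a "view"?

What I mean is that once the graph is rendered there are only 50 data points, and when I zoom in I again get at most 50 data points.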

Answer 1

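Resampling can be done with the resample function from pandas.

Note that the resample syntax changed between pandas versions 0.17 and 0.19. The example below uses the old style. For the new style, see e.g. this tutorial.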
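
For reference, ahead of the old-style example further down, here is a minimal sketch of what the newer method-chaining syntax might look like. It assumes pandas 0.18 or later; the synthetic data and the names `series` and `per_minute` are illustrative and not part of the original question.

```python
import numpy as np
import pandas as pd

# Synthetic data: one value per second for one hour (illustrative only)
times = pd.date_range(start="2017-01-11", periods=3600, freq="s")
series = pd.Series(np.arange(3600, dtype=float), index=times)

# New style: resample() returns a Resampler object,
# and the aggregation is called on it as a method
per_minute = series.resample("min").mean()

print(len(per_minute))
```

The old `how="mean"` keyword and the chained `.mean()` call are meant to produce the same aggregation; only the spelling differs.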


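Automatically adapting the resampling when zooming does require some manual work. There is a resampling example on the matplotlib event handling page; it does not work out of the box, but it can be adapted accordingly.

This is what it would look like: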

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# generate some data for every second over a whole day
times = pd.date_range(start='2017-01-11',periods=86400, freq='1S')
df = pd.DataFrame(index = times)
df['data'] = np.sort(np.random.randint(low=1300, high=1600, size=len(df.index)) )[::-1] + \
             np.random.rand(len(df.index))*100

# resample the data, taking the mean over 1 hour ("H")
t = "H" # for hours, try "T" for minutes as well
width=1./24 # bar widths on a date axis are measured in days, so this is 1 hour
                 # try width=1./(24*60) for minutes
df_resampled = pd.DataFrame()
# old-style call; on pandas 0.18 and later use df.data.resample(t).mean() instead
df_resampled['data'] = df.data.resample(t, how="mean")

fig, ax = plt.subplots()

#ax.bar(df.index, df['data'], width=1./(24*60*60)) # original data, takes too long to plot
ax.bar(df_resampled.index, df_resampled['data'], width=width)
ax.xaxis_date()

plt.show()
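
The zoom-dependent behaviour asked for in the question could be approximated by re-running the resampling whenever the x-limits change. The following is only a rough sketch under several assumptions, not the matplotlib example mentioned above: it assumes pandas 1.1 or later (for `origin="start"`), a Series with a sorted DatetimeIndex, and the names `MAX_BARS`, `downsample` and `plot_adaptive` are hypothetical helpers introduced here.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

MAX_BARS = 50  # hypothetical cap on the number of bars per view


def downsample(series, max_points=MAX_BARS):
    """Average the series into at most max_points equally wide time bins."""
    if len(series) <= max_points:
        return series
    span = series.index[-1] - series.index[0]
    # pick a whole number of seconds strictly larger than span / max_points
    seconds = int(span.total_seconds() // max_points) + 1
    return series.resample(f"{seconds}s", origin="start").mean().dropna()


def plot_adaptive(series, max_points=MAX_BARS):
    """Bar plot that is re-binned each time the x-limits change."""
    fig, ax = plt.subplots()
    state = {"bars": None, "busy": False}

    def draw(data):
        if state["bars"] is not None:
            state["bars"].remove()  # drop the previous set of bars
        if len(data) > 1:
            # typical spacing between bins, expressed in days
            width = data.index.to_series().diff().median() / pd.Timedelta(days=1)
        else:
            width = 1.0 / 24
        state["bars"] = ax.bar(data.index, data.to_numpy(),
                               width=width, align="edge")

    def on_xlim_changed(axes):
        if state["busy"]:  # ignore limit changes caused by our own redraw
            return
        state["busy"] = True
        try:
            lo, hi = axes.get_xlim()
            # x-limits are matplotlib date numbers; convert to naive timestamps
            start = pd.Timestamp(mdates.num2date(lo)).tz_localize(None)
            end = pd.Timestamp(mdates.num2date(hi)).tz_localize(None)
            visible = series.loc[start:end]
            if len(visible):
                draw(downsample(visible, max_points))
            axes.set_xlim(lo, hi)  # keep the view the user asked for
            axes.figure.canvas.draw_idle()
        finally:
            state["busy"] = False

    draw(downsample(series, max_points))
    ax.callbacks.connect("xlim_changed", on_xlim_changed)
    return fig, ax
```

To try it, build a Series such as `df['data']` from the example above, call `plot_adaptive(df['data'])` and then `plt.show()`. Zooming or panning should then trigger a fresh resampling of only the visible range.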

Answer 2

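One thing you can do is plot a random subset of the data by using the sample method on your pandas DataFrame. Use the frac parameter to determine the fraction of points to use. It ranges from 0 to 1.

Once you have your dataset_filtered DataFrame, take a sample of it like this: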

dataset_filtered_sample = dataset_filtered.sample(frac=.001)
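
If a fixed cap is preferred over a fraction, `sample` also accepts an absolute row count through its `n` parameter. Below is a minimal sketch, assuming a DataFrame with the column names from the question; the data is synthetic and `MAX_POINTS` is a hypothetical name. Sorting by index afterwards restores the original row order, which matters when the rows are plotted against time.

```python
import numpy as np
import pandas as pd

MAX_POINTS = 50  # hypothetical cap on the number of plotted points

# Synthetic stand-in for the filtered data from the question
dataset_filtered = pd.DataFrame({
    "Timestamp": pd.date_range("2017-01-11", periods=180_000, freq="s"),
    "Battery_Millivolt": np.linspace(1600, 1300, 180_000),
})

# Never ask for more rows than exist
n = min(MAX_POINTS, len(dataset_filtered))

# random_state makes the draw repeatable; sort_index restores time order
dataset_filtered_sample = dataset_filtered.sample(n=n, random_state=0).sort_index()

print(dataset_filtered_sample.shape)
```

Note that a random sample can miss short spikes in the data, whereas the resampling approach in the other answer averages over every point.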

Set the maximum number of data points per plot
