Question

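I have hourly data with many columns. The first column is the date (date_log), and the remaining columns hold different sample points. The trouble is that the sample points are logged at different times even within each hour, so every column has at least a couple of NaNs. If I plot with the first code below, it works nicely, but I want a gap wherever there is no logger data for a day or so, and I don't want those points to be joined. If I plot with the second code, I can see the gaps, but because of the NaN points the data points are not joined at all. In the example below, I'm plotting just the first three columns.

When there is a big gap like the one in the blue points (01/06-01/07/2015), I want a gap there and then the points joined again afterwards. The second example doesn't join the points. I like the first chart, but I want gaps like those in the second method wherever there are no sample data points within a 24-hour date range, so that data missing over a longer period is left as a gap.

Is there any workaround? Thanks

Method 1: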

Log_1a_mask = np.isfinite(Log_1a) # Log_1a is column 2 data points
Log_1b_mask = np.isfinite(Log_1b) # Log_1b is column 3 data points

plt.plot_date(date_log[Log_1a_mask], Log_1a[Log_1a_mask], linestyle='-', marker='',color='r',)
plt.plot_date(date_log[Log_1b_mask], Log_1b[Log_1b_mask], linestyle='-', marker='', color='b')
plt.show()

Method 2:

plt.plot_date(date_log, Log_1a, '-r*', markersize=2, markeredgewidth=0, color='r') # Log_1a contains raw data with NaN
plt.plot_date(date_log, Log_1b, '-r*', markersize=2, markeredgewidth=0, color='r') # Log_1b contains raw data with NaN
plt.show()

Method 1 output:

Method 2 output:

Answer 1

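If I'm understanding you correctly, you have a dataset with lots of small gaps (single NaNs) that you want filled, and larger gaps that you don't.

Using `pandas` to "forward-fill" gaps

One option is to use pandas' fillna with a limited number of fill values.

As a quick example of how this works: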

In [1]: import pandas as pd; import numpy as np

In [2]: x = pd.Series([1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4])

In [3]: x.fillna(method='ffill', limit=1)
Out[3]:
0     1
1     1
2     2
3     2
4   NaN
5     3
6     3
7   NaN
8   NaN
9     4
dtype: float64

In [4]: x.fillna(method='ffill', limit=2)
Out[4]:
0     1
1     1
2     2
3     2
4     2
5     3
6     3
7     3
8   NaN
9     4
dtype: float64
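
As a side note, newer pandas releases deprecate passing `method=` to `fillna`, as far as I know, and point to the dedicated `ffill` method instead. A minimal sketch of the same limited forward fill written that way (the variable names `filled_1` and `filled_2` are just for illustration):

```python
import numpy as np
import pandas as pd

x = pd.Series([1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4])

# Same limited forward fill as above, using the dedicated method.
filled_1 = x.ffill(limit=1)  # fill at most 1 consecutive NaN
filled_2 = x.ffill(limit=2)  # fill at most 2 consecutive NaNs

print(filled_1.tolist())
print(filled_2.tolist())
```

The filled and unfilled positions should match the `fillna(method='ffill', limit=...)` output shown above.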

As an example of using this for something similar to your case:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)

x = np.random.normal(0, 1, 1000).cumsum()

# Set every third value to NaN
x[::3] = np.nan

# Set a few bigger gaps...
x[20:100], x[200:300], x[400:450] = np.nan, np.nan, np.nan

# Use pandas with a limited forward fill
# You may want to adjust the `limit` here. This will fill 2 nan gaps.
filled = pd.Series(x).fillna(limit=2, method='ffill')

# Let's plot the results
fig, axes = plt.subplots(nrows=2, sharex=True)
axes[0].plot(x, color='lightblue')
axes[1].plot(filled, color='lightblue')

axes[0].set(ylabel='Original Data')
axes[1].set(ylabel='Filled Data')

plt.show()

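Using `numpy` to interpolate gaps

Alternatively, we can do this using only numpy. It's possible (and more efficient) to do a "forward fill" identical to the pandas method above, but I'll show another method to give you more options than just repeating values.

Instead of repeating the last value through the "gap", we can linearly interpolate the values in the gap. This is less efficient computationally (and I'm going to make it even less efficient by interpolating everywhere), but for most datasets you won't notice a major difference.

As an example, let's define an interpolate_gaps function: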

def interpolate_gaps(values, limit=None):
    """
    Fill gaps using linear interpolation, optionally only fill gaps up to a
    size of `limit`.
    """
    values = np.asarray(values)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    filled = np.interp(i, i[valid], values[valid])

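    if limit is not None:
        # Flag a NaN only if the next `limit` points are NaN as well. Compare
        # against the untouched mask so that the shifts do not compound.
        is_nan = ~valid
        invalid = is_nan.copy()
        for n in range(1, limit+1):
            invalid[:-n] &= is_nan[n:]
        filled[invalid] = np.nan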

    return filled

Note that we'll get interpolated values, unlike the previous pandas version:

In [11]: values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]

In [12]: interpolate_gaps(values, limit=1)
Out[12]:
array([ 1.        ,  1.5       ,  2.        ,         nan,  2.66666667,
        3.        ,         nan,         nan,  3.75      ,  4.        ])
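
To see which points end up as NaN after the `limit` step, it can help to look at the mask on its own. The sketch below uses a hypothetical helper, `long_gap_mask` (my own name, not part of the answer's code), that flags a NaN only when the following `limit` samples are NaN too. Those are the points that get reset to NaN after interpolating.

```python
import numpy as np

def long_gap_mask(values, limit):
    """Flag NaNs that are followed by at least `limit` more NaNs (hypothetical helper)."""
    is_nan = ~np.isfinite(np.asarray(values, dtype=float))
    flagged = is_nan.copy()
    for n in range(1, limit + 1):
        # A point stays flagged only if the sample n steps ahead is NaN too.
        flagged[:-n] &= is_nan[n:]
    return flagged

values = [1, np.nan, 2, np.nan, np.nan, 3, np.nan, np.nan, np.nan, 4]
mask_1 = long_gap_mask(values, limit=1)
mask_2 = long_gap_mask(values, limit=2)
print(mask_1.astype(int))  # [0 0 0 1 0 0 1 1 0 0]
print(mask_2.astype(int))  # [0 0 0 0 0 0 1 0 0 0]
```

With `limit=1` the flagged positions (3, 6 and 7) are exactly the NaNs in the output above. The tail of a long gap is still interpolated; only the leading part stays empty, which is enough to break the plotted line.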

In the plotting example, if we replace the line:

filled = pd.Series(x).fillna(limit=2, method='ffill')

with:

filled = interpolate_gaps(x, limit=2)

We'll get a visually identical plot:

As a complete, stand-alone example:

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1977)

def interpolate_gaps(values, limit=None):
    """
    Fill gaps using linear interpolation, optionally only fill gaps up to a
    size of `limit`.
    """
    values = np.asarray(values)
    i = np.arange(values.size)
    valid = np.isfinite(values)
    filled = np.interp(i, i[valid], values[valid])

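    if limit is not None:
        # Flag a NaN only if the next `limit` points are NaN as well. Compare
        # against the untouched mask so that the shifts do not compound.
        is_nan = ~valid
        invalid = is_nan.copy()
        for n in range(1, limit+1):
            invalid[:-n] &= is_nan[n:]
        filled[invalid] = np.nan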

    return filled

x = np.random.normal(0, 1, 1000).cumsum()

# Set every third value to NaN
x[::3] = np.nan

# Set a few bigger gaps...
x[20:100], x[200:300], x[400:450] = np.nan, np.nan, np.nan

# Interpolate small gaps using numpy
filled = interpolate_gaps(x, limit=2)

# Let's plot the results
fig, axes = plt.subplots(nrows=2, sharex=True)
axes[0].plot(x, color='lightblue')
axes[1].plot(filled, color='lightblue')

axes[0].set(ylabel='Original Data')
axes[1].set(ylabel='Filled Data')

plt.show()
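
Since the question talks about gaps in terms of time (no samples within 24 hours) rather than a count of NaNs, here is a rough sketch of a time-based variant. It drops the NaNs so that nearby points are joined, then puts a single NaN back wherever two consecutive valid samples are more than `max_gap` apart, so the line breaks there. The function name `break_long_gaps`, the `max_gap` parameter and the demo data are all assumptions of mine, not something from the question.

```python
import numpy as np

def break_long_gaps(times, values, max_gap=np.timedelta64(24, "h")):
    """Keep valid samples; add one NaN after each sample followed by a long silence."""
    times = np.asarray(times, dtype="datetime64[s]")
    values = np.asarray(values, dtype=float)
    keep = np.isfinite(values)
    t, v = times[keep], values[keep]

    # Positions i where the step from t[i] to t[i + 1] exceeds max_gap.
    jumps = np.flatnonzero(np.diff(t) > max_gap)

    # Insert a NaN sample inside each long step; its timestamp only has to
    # fall between the two neighbours to keep the array sorted.
    t_out = np.insert(t, jumps + 1, t[jumps] + max_gap)
    v_out = np.insert(v, jumps + 1, np.nan)
    return t_out, v_out

# Made-up hourly data: scattered NaNs plus one outage longer than a day.
times = np.datetime64("2015-01-05T00") + np.arange(96) * np.timedelta64(1, "h")
values = np.sin(np.arange(96) / 5.0)
values[::3] = np.nan     # isolated missing samples
values[30:62] = np.nan   # logger offline for more than 24 hours

t_out, v_out = break_long_gaps(times, values)
print(len(t_out), int(np.isnan(v_out).sum()))
```

Plotting `t_out` against `v_out` with a plain line style (for example `plt.plot(t_out, v_out, '-')`) should then join the points across the short gaps and leave the long outage empty.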

Note: I originally completely misread the question. See the edit history for my original answer.

Answer 2

I just use this loop:

import math
for i in range(1,len(data)):
  if math.isnan(data[i]):
    data[i] = data[i-1]
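
Note that this loop fills every gap, however long it is, so the big gaps get filled too. For large arrays, the same unlimited forward fill can be sketched without a Python loop. The helper name `forward_fill` is my own, and like the loop it leaves a leading NaN untouched:

```python
import numpy as np

def forward_fill(data):
    """Replace each NaN with the most recent non-NaN value before it."""
    data = np.asarray(data, dtype=float)
    # Each valid sample keeps its own index; NaN positions get 0 for now.
    idx = np.where(np.isnan(data), 0, np.arange(data.size))
    # The running maximum turns that into "index of the last valid sample so far".
    idx = np.maximum.accumulate(idx)
    return data[idx]

result = forward_fill([1.0, np.nan, np.nan, 4.0, np.nan])
print(result)  # [1. 1. 1. 4. 4.]
```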
