我有一组数据,我想将其绘制为线图。对于每个系列,缺少一些数据(但每个系列都不同)。目前,matplotlib不会绘制跳过缺失数据的行:例如
import matplotlib.pyplot as plt
xs = range(8)
series1 = [1, 3, 3, None, None, 5, 8, 9]
series2 = [2, None, 5, None, 4, None, 3, 2]
plt.plot(xs, series1, linestyle='-', marker='o')
plt.plot(xs, series2, linestyle='-', marker='o')
plt.show()
导致线条中有间隙的图。如何告诉matplotlib在间隙中绘制线条? (我宁愿不必插入数据)。
答案 0 :(得分:68)
您可以通过以下方式屏蔽NaN值:
import numpy as np
import matplotlib.pyplot as plt
xs = np.arange(8)
series1 = np.array([1, 3, 3, None, None, 5, 8, 9]).astype(np.double)
s1mask = np.isfinite(series1)
series2 = np.array([2, None, 5, None, 4, None, 3, 2]).astype(np.double)
s2mask = np.isfinite(series2)
plt.plot(xs[s1mask], series1[s1mask], linestyle='-', marker='o')
plt.plot(xs[s2mask], series2[s2mask], linestyle='-', marker='o')
plt.show()
这导致
答案 1 :(得分:5)
Qouting @Rutger Kassies(link):
Matplotlib仅在连续(有效)数据点之间绘制一条线, 并且在NaN值上留下间隙。
如果您使用 Pandas ,
,则为解决方案#pd.Series
s.dropna().plot() #masking (as @Thorsten Kranz suggestion)
#pd.DataFrame
df['a_col_ffill'] = df['a_col'].ffill(method='ffill')
df['b_col_ffill'] = df['b_col'].ffill(method='ffill') # changed from a to b
df[['a_col_ffill','b_col_ffill']].plot()
答案 2 :(得分:2)
如果没有插值,您需要从数据中删除无。这也意味着您需要删除系列中与“无”相对应的X值。这是一个(丑陋的)单行代码:
x1Clean,series1Clean = zip(* filter( lambda x: x[1] is not None , zip(xs,series1) ))
对于None值,lambda函数返回False,从列表中过滤x,系列对,然后将数据重新压缩回原始形式。
答案 3 :(得分:2)
使用熊猫的解决方案:
import matplotlib.pyplot as plt
import pandas as pd
def splitSerToArr(ser):
return [ser.index, ser.as_matrix()]
xs = range(8)
series1 = [1, 3, 3, None, None, 5, 8, 9]
series2 = [2, None, 5, None, 4, None, 3, 2]
s1 = pd.Series(series1, index=xs)
s2 = pd.Series(series2, index=xs)
plt.plot( *splitSerToArr(s1.dropna()), linestyle='-', marker='o')
plt.plot( *splitSerToArr(s2.dropna()), linestyle='-', marker='o')
plt.show()
答案 4 :(得分:1)
对于它可能值得的东西,经过一些试验和错误后,我想对Thorsten的解决方案添加一个澄清。希望为尝试这种方法后在其他地方寻找的用户节省时间。
使用
时,我无法获得相同问题的成功from pyplot import *
并尝试用
绘图plot(abscissa[mask],ordinate[mask])
似乎需要使用import matplotlib.pyplot as plt
来获得正确的NaN处理,但我不能说明原因。
答案 5 :(得分:0)
熊猫数据框的另一种解决方案:
plot = df.plot(style='o-') # draw the lines so they appears in the legend
colors = [line.get_color() for line in plot.lines] # get the colors of the markers
df = df.interpolate(limit_area='inside') # interpolate
lines = plot.plot(df.index, df.values) # add more lines (with a new set of colors)
for color, line in zip(colors, lines):
line.set_color(color) # overwrite the new lines colors with the same colors as the old lines
答案 6 :(得分:0)
我遇到了同样的问题,但是掩码消除了中间的点,并且线被以任何方式切割(我们在图片中看到的粉红色线是唯一连续的非 NaN 数据,这就是该线的原因) .以下是屏蔽数据的结果(仍有差距):
xs = df['time'].to_numpy()
series1 = np.array(df['zz'].to_numpy()).astype(np.double)
s1mask = np.isfinite(series1)
fplt.plot(xs[s1mask], series1[s1mask], ax=ax_candle, color='#FF00FF', width = 1, legend='ZZ')
也许是因为我使用的是 finplot(绘制蜡烛图),所以我决定用线性公式 y2-y1=m(x2-x1) 制作缺少的 Y 轴点,然后制定生成的函数缺失点之间的 Y 值。
def fillYLine(y):
#Line Formula
fi=0
first = None
next = None
for i in range(0,len(y),1):
ne = not(isnan(y[i]))
next = y[i] if ne else next
if not(next is None):
if not(first is None):
m = (first-next)/(i-fi) #m = y1 - y2 / x1 - x2
cant_points = np.abs(i-fi)-1
if (cant_points)>0:
points = createLine(next,first,i,fi,cant_points)#Create the line with the values of the difference to generate the points x that we need
x = 1
for p in points:
y[fi+x] = p
x = x + 1
first = next
fi = i
next = None
return y
def createLine(y2,y1,x2,x1,cant_points):
m = (y2-y1)/(x2-x1) #Pendiente
points = []
x = x1 + 1#first point to assign
for i in range(0,cant_points,1):
y = ((m*(x2-x))-y2)*-1
points.append(y)
x = x + 1#The values of the line are numeric we don´t use the time to assign them, but we will do it at the same order
return points
然后我使用简单的调用函数来填补像 y = fillYLine(y)
之间的空白,我的 finplot 就像:
x = df['time'].to_numpy()
y = df['zz'].to_numpy()
y = fillYLine(y)
fplt.plot(x, y, ax=ax_candle, color='#FF00FF', width = 1, legend='ZZ')
您需要认为 Y 变量中的数据仅用于绘图,我需要操作之间的 NaN 值(或将它们从列表中删除),这就是我从 Pandas 数据集创建 Y 变量的原因df['zz']
。
注意:我注意到在我的例子中数据被消除了,因为如果我不屏蔽 X (xs) 值在图表中向左滑动,在这种情况下,它们会变成连续的而不是 NaN 值,它会绘制连续的线,但是向左缩小:
fplt.plot(xs, series1[s1mask], ax=ax_candle, color='#FF00FF', width = 1, legend='ZZ') #No xs masking (xs[masking])
这让我觉得有些人使用蒙版的原因是因为他们只是在绘制那条线,或者非蒙版和蒙版数据之间没有太大区别(很少有差距,不像我的数据那样有很多)。
答案 7 :(得分:-1)
也许我错过了这一点,但我现在相信熊猫does this automatically。下面的例子有点涉及,需要互联网接入,但中国的线路在早期有很多空白,因此是直线段。
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# read data from Maddison project
url = 'http://www.ggdc.net/maddison/maddison-project/data/mpd_2013-01.xlsx'
mpd = pd.read_excel(url, skiprows=2, index_col=0, na_values=[' '])
mpd.columns = map(str.rstrip, mpd.columns)
# select countries
countries = ['England/GB/UK', 'USA', 'Japan', 'China', 'India', 'Argentina']
mpd = mpd[countries].dropna()
mpd = mpd.rename(columns={'England/GB/UK': 'UK'})
mpd = np.log(mpd)/np.log(2) # convert to log2
# plots
ax = mpd.plot(lw=2)
ax.set_title('GDP per person', fontsize=14, loc='left')
ax.set_ylabel('GDP Per Capita (1990 USD, log2 scale)')
ax.legend(loc='upper left', fontsize=10, handlelength=2, labelspacing=0.15)
fig = ax.get_figure()
fig.show()