Question

我有一些问题，将pandas数据框中的第二列绘制到双x轴上。我想这可能是因为第二个有问题的列包含NaN值。 NaN值存在，因为每10年只有数据可用，尽管第一列每年都有数据。它们是使用np.nan生成的，为了清楚起见，我将其包含在最后。

这里的直觉是将两个系列绘制在同一个x轴上，以显示它们随时间变化的趋势。

这是我的代码和数据框：

import pandas as pd
import numpy as np
import matplotlib as plt
import matplotlib.pyplot as plt

list1 = ['1297606', '1300760', '1303980', '1268987', '1333521', '1328570', 
         '1328112', '1353671', '1371285', '1396658', '1429247', '1388937', 
         '1359145', '1330414', '1267415', '1210883', '1221585', '1186039', 
         '884273', '861789', '857475', '853485', '854122', '848163', '839226', 
         '820151', '852385', '827609', '825564', '789217', '765651']

list1a = [1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 
          1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 
          2004, 2005, 2006, 2007, 2008, 2009, 2010]

list3b = [121800016.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 
          145279588.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 
          160515434.5, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 
          168140487.0]

d = {'Year': list1a,'Abortions per Year': list1, 
     'Affiliation with Religious Institutions': list3b}
newdf = pd.DataFrame(data=d)

newdf.set_index('Year',inplace=True)

fig, ax1 = plt.subplots(figsize=(20,5))

y2min = min(newdf['Affiliation with Religious Institutions'])
y2max = max(newdf['Affiliation with Religious Institutions'])
ax1.plot(newdf['Abortions per Year'])
#ax1.set_xticks(newdf.index)
ax1b = ax1.twinx()
ax1b.set_ylim(y2min*0.8,y2max*1.2)
ax1b.plot(newdf['Affiliation with Religious Institutions'])
plt.show()

我最终得到的图表没有显示第二个情节。（当我将第二个图改为每年的数值时，它会绘制它）。这是第二个图（带有NaN值） - 被忽略：

感谢任何建议。

*如何为第二列生成np.nan值：我通过索引列循环，并且每年没有数据，将np.nan返回到列表，然后将其作为列。

for i in range(len(list1a)):
    if list1a[i] in list3a:
        var = list2[j]
        list3b.append(var)

        j+=1
    else:
        var = np.nan
        list3b.append(var)

Answer 1

两件事。您需要将Abortions per Year列转换为数字类型以进行绘图，至少对于您提供的str格式的数据;第二，您可以在绘图前删除Affiliation with Religious Institutions值，将nan绘制为一条线。

ax1.plot(newdf['Abortions per Year'].astype(int))

...

ax1b.plot(newdf['Affiliation with Religious Institutions'].dropna())

Answer 2

对于您正在执行的大多数操作，您可以使用pandas DataFrame方法。这两行将解决您的所有问题：

newdf = newdf.astype(float)
newdf = newdf.interpolate(method='linear')

因此，您的绘图代码将如下所示：

fig, ax1 = plt.subplots(figsize=(20,5))

newdf = newdf.astype(float)
newdf = newdf.interpolate(method='linear')
y2min = newdf['Affiliation with Religious Institutions'].min()
y2max = newdf['Affiliation with Religious Institutions'].max()
newdf['Abortions per Year'].plot.line(ax=ax1)
#ax1.set_xticks(newdf.index)
ax1b = ax1.twinx()
ax1b.set_ylim(y2min*0.8,y2max*1.2)
newdf['Affiliation with Religious Institutions'].plot.line(ax=ax1b)
plt.show()

使用pandas方法绘制DataFrame只是一个建议。但您也可以使用matplotlib代码，因为pandas使用matplotlib作为绘图后端

我添加的两行执行以下操作：
您的专栏Abortions per Year属于dtype object。您需要将其转换为数字类型：

newdf = newdf.astype(float)

实际上NaN - 值不会被忽略，但由于它们是单个值而未显示。因此，您可以在第二个图中添加marker。如果要为第二个图显示一条线，则需要使用以下内容插值：

newdf = newdf.interpolate(method='linear')

如果插值完成，可以删除标记。

Answer 3

我现在明白了。要使用现有代码实现这一目标，您只需使用Pandas forwardfill。

在

之后

newdf.set_index('Year',inplace=True)

刚刚放

newdf.fillna(method='ffill', inplace=True)

Answer 4

这里出现的一个基本问题是你正在绘制一个点作为一条线。

list3b = [121800016.0，nan，nan ....... 从一点到无。

如果您将第二个nan更改为一个值： list3b = [121800016.0,121800016.0，nan，..... 然后你会看到一个结果。

也许您应该将这些值绘制为条形或散点。

绘制包含NaN值的pandas数据帧列

4 个答案: