Question

I have a dataset of used cars from eBay, and I am trying to plot it after editing the dataset as follows:

import pandas as pd

df = pd.read_csv("./autos.csv.bz2", encoding = "iso8859-1")
df = df.drop(["dateCrawled", "abtest", "dateCreated", "nrOfPictures", "lastSeen", "postalCode", "seller", "offerType"], axis = 1)

import numpy as np

df["monthOfRegistration"] = np.where(df["monthOfRegistration"] == 0, 6, df["monthOfRegistration"])


df["registration"] = df["yearOfRegistration"] + (df["monthOfRegistration"] - 1) / 12

df = df.drop(["yearOfRegistration", "monthOfRegistration"], axis = 1)


df = df.drop(df[df["price"] == 0].index)
df = df.drop(df[df["powerPS"] == 0].index)


print(df["notRepairedDamage"].unique())
print(df["notRepairedDamage"])

df["notRepairedDamage"] = np.where(df["notRepairedDamage"] == "ja", 1, df["notRepairedDamage"])
df["notRepairedDamage"] = np.where(df["notRepairedDamage"] == "nein", 0, df["notRepairedDamage"])


df = df[df["notRepairedDamage"].notnull()]

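I tried to plot the data with seaborn.pairplot (on top of matplotlib), but I get the following error:

ValueError: color kwarg must have one color per dataset

I only get the relative-frequency plots for the first 3 rows; all other plots are empty, including the relative-frequency plots for rows 4 and 5.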

[Image: Matplotlib seaborn, example image]

df = df[(df["price"] < 100000) & (df["powerPS"] < 2000)]

from IPython import get_ipython
get_ipython().run_line_magic('matplotlib', 'inline')

import seaborn as sns

g = sns.pairplot(df)

I think I made a mistake while editing the dataset. Can anyone help me? That would be great. Thank you very much!

Answer 1

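Following your comment, here is an example snippet that I hope helps. Maybe the problem lies with IPython? Unfortunately, I don't know. Having the dataset would definitely help.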

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

a = pd.DataFrame()
a['One'] = [1, 3, 3, 2, 1]
a['Two'] = ['ja', 'ja', 'nein', 'ja', 'nein']
a['Two'] = np.where(a['Two'] == 'ja', 1, a['Two'])
a['Two'] = np.where(a['Two'] == 'nein', 0, a['Two'])
a = a[a['Two'].notnull()]
print(a)
sns.pairplot(a)
plt.show()

This prints the DataFrame and displays the pair plot.
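One way to check whether the column dtype is involved, as a minimal sketch that assumes only pandas and numpy: after the two np.where calls, the Two column holds the numbers 1 and 0, yet pandas may still store it with a non-numeric (object) dtype. The variable name two_is_numeric is introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Same toy data as in the snippet above
a = pd.DataFrame({"One": [1, 3, 3, 2, 1],
                  "Two": ["ja", "ja", "nein", "ja", "nein"]})
a["Two"] = np.where(a["Two"] == "ja", 1, a["Two"])
a["Two"] = np.where(a["Two"] == "nein", 0, a["Two"])

# The values look numeric, but the column dtype tells a different story
two_is_numeric = pd.api.types.is_numeric_dtype(a["Two"])
print(a.dtypes)
print("Two is numeric:", two_is_numeric)
```

If the check reports a non-numeric dtype, an explicit cast before calling sns.pairplot is worth trying.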

Answer 2

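The problem arises from the fact that pairplot, via PairGrid, accepts only certain pandas dtypes: float or int, but not Object or Int64 (at least for some versions of matplotlib and seaborn: 3.0.3 and 0.9.0, respectively, produce this error).

Casting the relevant series with .astype('float') (or .astype('int'), as in the code below) before plotting solves the problem in the following example, where a.One is set to Int64 and a.Two is originally of Object type:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

a = pd.DataFrame()
a['One'] = [1, 3, 3, 2, 1]
a['One'] = a['One'].astype('Int64')
a['Two'] = ['yes', 'yes', 'no', 'yes', 'no']
a['Two'] = np.where(a['Two'] == 'yes', 1, a['Two'])
a['Two'] = np.where(a['Two'] == 'no', 0, a['Two'])
# Cast both columns to a plain numeric dtype before plotting
a['One'] = a['One'].astype('int')
a['Two'] = a['Two'].astype('int')
sns.pairplot(a)
plt.show()

Note that if the dataframe contains NaN values, float is the only option: Int64 accepts missing values but is not accepted by pairplot, and the int type cannot hold missing values.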
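Applied to data shaped like the question's, the cast could look like the sketch below. It is only an illustration: the three-column frame is a made-up miniature of the eBay dataset, and Series.map is swapped in for the question's two np.where calls because it yields a float column directly. The pairplot call is left as a comment so the snippet runs without a display.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data, including a missing value
df = pd.DataFrame({
    "price": [1500, 3600, 650, 2200],
    "powerPS": [75, 102, 69, 109],
    "notRepairedDamage": ["nein", "ja", np.nan, "nein"],
})

# Map the German labels to numbers; missing entries stay NaN
df["notRepairedDamage"] = df["notRepairedDamage"].map({"ja": 1, "nein": 0})

# Drop rows without a damage flag, as the question does, then cast explicitly
df = df[df["notRepairedDamage"].notnull()].copy()
df["notRepairedDamage"] = df["notRepairedDamage"].astype("float")

print(df.dtypes)
# sns.pairplot(df) could be called here once every column is int or float
```

Casting to float rather than int keeps the code working even if the rows with missing values are not dropped first.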

ValueError: "color kwarg must have one color per dataset" matplotlib seaborn
