Say I have the following DataFrame, where each month has a batch of data stored as arrays in three variables:
ID  Y            X1         X2           month
0   [2,4,6,8]    [2,4,6,8]  [2,4,6,8]    01
1   [NaN,4,6,8]  [1,3,5,4]  [4,3,3,3]    02
2   [3,4,5,6]    [1,9,7,7]  [2,2,6,NaN]  03
3   [1,2,3,4]    [5,6,7,8]  [9,9,NaN,6]  04
4   [2,4,6,8]    [2,4,6,8]  [2,4,6,8]    05
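What I ultimately want is a scatter plot of Y against X1 in which month 01 is drawn in dark blue, month 02 in light blue, and so on. I might also want the scatter plot of Y against X2 in the same figure, drawn in different shades of red.
I tried this: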
df.iloc[0:1].plot.scatter(x = 'X1', y='Y')
but I get a message saying there are no numeric objects to plot.
Are the NaN values the problem?
Any ideas? Thanks a lot for your help!
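Answer 0 (score: 1)
You need to restructure the DataFrame so that each row holds a single observation instead of whole arrays. plot.scatter only accepts numeric columns, and columns whose cells are arrays have object dtype: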
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
data = {"ID": [0, 1, 2, 3, 4],
        "Y": [np.array([2, 4, 6, 8]),
              np.array([np.nan, 4, 6, 8]),
              np.array([3, 4, 5, 6]),
              np.array([1, 2, 3, 4]),
              np.array([2, 4, 6, 8])],
        "X1": [np.array([2, 4, 6, 8]),
               np.array([1, 2, 5, 4]),
               np.array([1, 9, 7, 7]),
               np.array([5, 6, 7, 8]),
               np.array([2, 4, 6, 8])],
        "X2": [np.array([2, 4, 6, 8]),
               np.array([4, 3, 3, 3]),
               np.array([2, 2, 6, np.nan]),
               np.array([9, 9, np.nan, 6]),
               np.array([2, 4, 6, 8])],
        "month": [1, 2, 3, 4, 5]
        }
df = pd.DataFrame(data)
# Expand each row's arrays into one row per observation.
frames = []
for v in range(len(df)):
    helper_dat = {"ID": df["ID"][v],
                  "Y": list(df["Y"][v]),
                  "X1": list(df["X1"][v]),
                  "X2": list(df["X2"][v]),
                  "month": df["month"][v]}
    # ID and month are scalars, so pandas repeats them for every array element.
    frames.append(pd.DataFrame(helper_dat))
# DataFrame.append was removed in pandas 2.0; concatenate the pieces instead.
new_df = pd.concat(frames, ignore_index=True)
new_df now looks like this:
    ID    Y  X1   X2  month
0    0  2.0   2  2.0      1
1    0  4.0   4  4.0      1
2    0  6.0   6  6.0      1
3    0  8.0   8  8.0      1
4    1  NaN   1  4.0      2
5    1  4.0   2  3.0      2
6    1  6.0   5  3.0      2
7    1  8.0   4  3.0      2
8    2  3.0   1  2.0      3
9    2  4.0   9  2.0      3
10   2  5.0   7  6.0      3
11   2  6.0   7  NaN      3
12   3  1.0   5  9.0      4
13   3  2.0   6  9.0      4
14   3  3.0   7  NaN      4
15   3  4.0   8  6.0      4
16   4  2.0   2  2.0      5
17   4  4.0   4  4.0      5
18   4  6.0   6  6.0      5
19   4  8.0   8  8.0      5
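As an alternative to the explicit loop, the same reshaping can probably be done with DataFrame.explode. This is only a sketch: it assumes pandas 1.3 or newer (needed for exploding several columns at once), uses a two-row stand-in for the data, and the names value_cols and long_df are mine.

```python
import numpy as np
import pandas as pd

# Two-row stand-in with the same nested layout: one array per cell.
df = pd.DataFrame({
    "ID": [0, 1],
    "Y": [np.array([2, 4, 6, 8]), np.array([np.nan, 4, 6, 8])],
    "X1": [np.array([2, 4, 6, 8]), np.array([1, 2, 5, 4])],
    "X2": [np.array([2, 4, 6, 8]), np.array([4, 3, 3, 3])],
    "month": [1, 2],
})

value_cols = ["Y", "X1", "X2"]

# explode() turns every array element into its own row and repeats the
# scalar columns (ID, month). The arrays in one row must have equal length.
# The exploded columns come back with object dtype, so cast them to float.
long_df = (
    df.explode(value_cols, ignore_index=True)
      .astype({col: float for col in value_cols})
)

print(long_df)
```

The result has one row per observation, so it can be passed to plt.scatter in the same way as the frame built by the loop.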
Now the values are easy to plot:
plt.scatter(new_df["X1"],new_df["Y"],c=new_df["month"], marker='^',label="X1")
plt.scatter(new_df["X2"],new_df["Y"],c=new_df["month"], marker='o',label="X2")
plt.legend()
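To get the blue and red shades asked for in the question, one option is to give each scatter call its own colormap. The sketch below is an assumption on my part rather than the only way to do it: it uses a small stand-in long_df, the reversed colormaps "Blues_r" and "Reds_r" so that the first month is darkest, and a hypothetical helper plot_by_month.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop it for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Small long-format stand-in: one row per observation.
long_df = pd.DataFrame({
    "Y":  [2.0, 4.0, 6.0, 8.0, 3.0, 4.0, 5.0, 6.0],
    "X1": [2.0, 4.0, 6.0, 8.0, 1.0, 9.0, 7.0, 7.0],
    "X2": [2.0, 4.0, 6.0, 8.0, 2.0, 2.0, 6.0, 5.0],
    "month": [1, 1, 1, 1, 2, 2, 2, 2],
})


def plot_by_month(data):
    """Scatter Y against X1 (blue shades) and X2 (red shades), shaded by month."""
    fig, ax = plt.subplots()
    months = data["month"]
    # Pad vmax by one so the last month does not fade to near-white.
    limits = dict(vmin=months.min(), vmax=months.max() + 1)
    blue = ax.scatter(data["X1"], data["Y"], c=months, cmap="Blues_r",
                      marker="^", label="X1", **limits)
    red = ax.scatter(data["X2"], data["Y"], c=months, cmap="Reds_r",
                     marker="o", label="X2", **limits)
    ax.set_xlabel("X1 / X2")
    ax.set_ylabel("Y")
    ax.legend()
    return fig, ax, blue, red


fig, ax, blue, red = plot_by_month(long_df)
```

Because both calls share the same vmin and vmax, a given month gets a comparable shade in the blue and the red series.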
Edit: if you only want to plot one specific month:
plt.scatter(new_df[new_df["month"]==4]["X1"],new_df[new_df["month"]==4]["Y"], marker='^',label="X1")
plt.scatter(new_df[new_df["month"]==4]["X2"],new_df[new_df["month"]==4]["Y"], marker='o',label="X2")
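The month filter can also be wrapped in a small helper so the boolean mask is written only once. This is a sketch with a stand-in long_df, and month_points is a hypothetical name. As far as I know, plt.scatter simply skips points whose coordinates are NaN, so the dropna() here is optional and only makes the selection explicit.

```python
import numpy as np
import pandas as pd

# Stand-in long-format data for months 3 and 4, including the NaN entries.
long_df = pd.DataFrame({
    "Y":  [3.0, 4.0, 5.0, 6.0, 1.0, 2.0, 3.0, 4.0],
    "X1": [1.0, 9.0, 7.0, 7.0, 5.0, 6.0, 7.0, 8.0],
    "X2": [2.0, 2.0, 6.0, np.nan, 9.0, 9.0, np.nan, 6.0],
    "month": [3, 3, 3, 3, 4, 4, 4, 4],
})


def month_points(data, month, x_col, y_col="Y"):
    """Return the rows of one month for (x_col, y_col), without NaN pairs."""
    subset = data.loc[data["month"] == month, [x_col, y_col]]
    return subset.dropna()


april_x2 = month_points(long_df, 4, "X2")
print(april_x2)
```

The returned frame can then be plotted with plt.scatter(april_x2["X2"], april_x2["Y"]).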
Based on this answer, I found a way to add a legend entry for each month:
sc = plt.scatter(new_df["X1"],new_df["Y"],c=new_df["month"], marker='^',label="X1")
plt.scatter(new_df["X2"],new_df["Y"],c=new_df["month"], marker='o',label="X2")
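# Draw one empty proxy line per month, colored like that month's scatter points.
lp = lambda i: plt.plot([], color=sc.cmap(sc.norm(i)),
                        label="Month {:g}".format(i))[0]
handles = [lp(i) for i in np.unique(new_df["month"])]
# A second plt.legend() call replaces any earlier legend, so call it once.
# Without explicit handles it lists every labeled artist: X1, X2 and the month proxies.
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)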
plt.show()
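Another way to get per-month legend entries is PathCollection.legend_elements(), which builds the handles from the scatter itself. The sketch below assumes matplotlib 3.1 or newer and a small stand-in long_df. It keeps the marker legend and the month legend as two separate boxes by re-adding the first one with ax.add_artist, since a later ax.legend() call would otherwise replace it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop it for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Small long-format stand-in with three months.
long_df = pd.DataFrame({
    "Y":  [2.0, 4.0, 4.0, 6.0, 3.0, 5.0],
    "X1": [2.0, 4.0, 2.0, 5.0, 1.0, 7.0],
    "X2": [2.0, 4.0, 3.0, 3.0, 2.0, 6.0],
    "month": [1, 1, 2, 2, 3, 3],
})

fig, ax = plt.subplots()
sc1 = ax.scatter(long_df["X1"], long_df["Y"], c=long_df["month"],
                 marker="^", label="X1")
sc2 = ax.scatter(long_df["X2"], long_df["Y"], c=long_df["month"],
                 marker="o", label="X2")

# First legend: which marker belongs to which variable.
marker_legend = ax.legend(loc="lower right")
# Keep it alive as a plain artist before creating the second legend.
ax.add_artist(marker_legend)

# Second legend: one colored handle per distinct month value.
handles, labels = sc1.legend_elements()
month_legend = ax.legend(handles, labels, title="month",
                         bbox_to_anchor=(1.05, 1), loc="upper left",
                         borderaxespad=0.0)
```

Both scatter calls use the same default colormap and the same month values, so the handles taken from sc1 describe the colors of sc2 as well.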