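I have a dataset with several categories, and I want to plot all of them in one figure to see how they differ. I have a list of the categories present in the dataset, and I'd like to see every one of them plotted in the same figure: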
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm

sample = [
['For business', 0.7616104043587437],
['For home and cottages', 0.6890139579274699],
['Consumer electronics', 0.039868871866136635],
['Personal things', 0.7487893699793786],
['Services', 0.747226678171249],
['Services', 0.23463661173977313],
['Animals', 0.6504301798258314],
['For home and cottages', 0.49567857024037665],
['For home and cottages', 0.9852681814098107],
['Transportation', 0.8134867587477912],
['Animals', 0.49988690699674654],
['Consumer electronics', 0.15086800344617235],
['For business', 0.9485494576819328],
['Hobbies and Leisure', 0.25766871111905243],
['For home and cottages', 0.31704508627659533],
['Animals', 0.6192114570078333],
['Personal things', 0.5755788287287359],
['Hobbies and Leisure', 0.10106922056341394],
['Animals', 0.16834618003738577],
['Consumer electronics', 0.7570803588496894]
]
train = pd.DataFrame(data=sample, columns=['parent_category_name','deal_probability'])
parent_categories = train['parent_category_name'].unique()
parent_categories_size = len(parent_categories)
fig, ax = plt.subplots(figsize=(12,10))
colors = iter(cm.rainbow(np.linspace(0, 1, parent_categories_size)))
for parent_category_n in range(parent_categories_size):
    parent_1 = train[train['parent_category_name'] == parent_categories[parent_category_name]]
    ax.scatter(
        range(parent_1.shape[0]),
        np.sort(parent_1.deal_probability.values),
        color = next(colors)
    )
plt.ylabel('likelihood that an ad actually sold something', fontsize=12)
plt.title('Distribution of likelihood that an ad actually sold something')
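I don't understand why I only see the last plot instead of all of them. Alternatively, several scatter plots in one figure would also work for me, but I'm struggling to produce that.
I'm working with 10 categories at the moment, but I'm trying to make this dynamic so it works for any number of categories.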
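A likely reason only one category shows up, judging from the code alone: the loop variable is `parent_category_n`, but the row filter indexes `parent_categories[parent_category_name]`. Run standalone, that raises `NameError`; if `parent_category_name` happens to be left over from an earlier cell, every iteration selects the same category, so the scatters are drawn on top of each other in different colors and only the last one stays visible. The sketch below is one way to write the loop so each iteration selects its own category. It iterates over the category names directly instead of over indexes, and it uses a six-row subset of the question's `sample` to stay short; both are my choices, not the asker's code.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm

# A six-row subset of the question's `sample` list (values copied verbatim)
sample = [
    ['For business', 0.7616104043587437],
    ['Services', 0.747226678171249],
    ['Services', 0.23463661173977313],
    ['Animals', 0.6504301798258314],
    ['Animals', 0.49988690699674654],
    ['For business', 0.9485494576819328],
]
train = pd.DataFrame(data=sample, columns=['parent_category_name', 'deal_probability'])
parent_categories = train['parent_category_name'].unique()
colors = iter(cm.rainbow(np.linspace(0, 1, len(parent_categories))))

fig, ax = plt.subplots(figsize=(12, 10))
for parent_category in parent_categories:
    # Select the rows of the category handled in *this* iteration
    parent_1 = train[train['parent_category_name'] == parent_category]
    ax.scatter(
        np.arange(len(parent_1)),                   # rank within the category
        np.sort(parent_1.deal_probability.values),  # sorted probabilities
        color=next(colors),
        label=parent_category,                      # enables a legend entry
    )
ax.set_ylabel('likelihood that an ad actually sold something', fontsize=12)
ax.set_title('Distribution of likelihood that an ad actually sold something')
ax.legend(loc='best')
# Call plt.show() when running this as a script rather than in a notebook
```

Passing `label=` to each `scatter` call is what lets `ax.legend` tell the categories apart; without it the colored groups are still drawn but unlabeled.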
Answer:
If you want to follow how the values progress, a line plot with markers may give you a clearer view of the changes in each category:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import matplotlib.cm as cm
sample = [ ['For business', 0.7616104043587437],
['For home and cottages', 0.6890139579274699],
['Consumer electronics', 0.039868871866136635],
['Personal things', 0.7487893699793786],
['Services', 0.747226678171249],
['Services', 0.23463661173977313],
['Animals', 0.6504301798258314],
['For home and cottages', 0.49567857024037665],
['For home and cottages', 0.9852681814098107],
['Transportation', 0.8134867587477912],
['Animals', 0.49988690699674654],
['Consumer electronics', 0.15086800344617235],
['For business', 0.9485494576819328],
['Hobbies and Leisure', 0.25766871111905243],
['For home and cottages', 0.31704508627659533],
['Animals', 0.6192114570078333],
['Personal things', 0.5755788287287359],
['Hobbies and Leisure', 0.10106922056341394],
['Animals', 0.16834618003738577],
['Consumer electronics', 0.7570803588496894] ]
train = pd.DataFrame(data=sample, columns=['parent_category_name','deal_probability'])
parent_categories = train['parent_category_name'].unique()
fig, ax = plt.subplots(figsize=(10,8))
colors = iter(cm.rainbow(np.linspace(0, 1, len(parent_categories))))
for parent_category in parent_categories:
    ax.plot(range(len(train[train["parent_category_name"] == parent_category])),
            sorted(train[train["parent_category_name"] == parent_category].deal_probability.values),
            color = next(colors),
            marker = "o",
            label = parent_category)
plt.ylabel('likelihood that an ad actually sold something', fontsize=12)
plt.title('Distribution of likelihood that an ad actually sold something')
plt.legend(loc = "best")
plt.show()
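The loop above filters `train` twice for every category. The sketch below draws the same marker line plot but groups the data once with `groupby` and iterates over the `(name, values)` pairs. It assumes a six-row subset of the question's data for brevity. One behavioral difference to be aware of: `groupby` sorts the group keys alphabetically by default, so the legend order follows the category names rather than the order returned by `unique()`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm

# A six-row subset of the question's `sample` list (values copied verbatim)
sample = [
    ['For business', 0.7616104043587437],
    ['Services', 0.747226678171249],
    ['Services', 0.23463661173977313],
    ['Animals', 0.6504301798258314],
    ['Animals', 0.49988690699674654],
    ['For business', 0.9485494576819328],
]
train = pd.DataFrame(data=sample, columns=['parent_category_name', 'deal_probability'])

# Group once; keys are sorted alphabetically by default
groups = train.groupby('parent_category_name')['deal_probability']
colors = cm.rainbow(np.linspace(0, 1, groups.ngroups))

fig, ax = plt.subplots(figsize=(10, 8))
for color, (parent_category, probs) in zip(colors, groups):
    # Sort each category's probabilities once; x is just the rank within the category
    values = np.sort(probs.to_numpy())
    ax.plot(np.arange(len(values)), values, color=color, marker='o', label=parent_category)
ax.set_ylabel('likelihood that an ad actually sold something', fontsize=12)
ax.set_title('Distribution of likelihood that an ad actually sold something')
ax.legend(loc='best')
# Call plt.show() when running this as a script rather than in a notebook
```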
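But since the x axis is only an arbitrary scale (you sort the data and plot it against its position), I think the spread is even easier to see in a categorical plot: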
# Continues from the previous snippet (same imports and `sample` list)
train = pd.DataFrame(data=sample, columns=['parent_category_name','deal_probability'])
parent_categories = train['parent_category_name'].unique()
fig, ax = plt.subplots(figsize=(18,9))
colors = iter(cm.rainbow(np.linspace(0, 1, len(parent_categories))))
for parent_category in parent_categories:
    ax.scatter(
        train[train["parent_category_name"] == parent_category].parent_category_name.values,
        train[train["parent_category_name"] == parent_category].deal_probability.values,
        color = next(colors),
        label = parent_category
    )
plt.ylabel('likelihood that an ad actually sold something', fontsize=12)
plt.title('Distribution of likelihood that an ad actually sold something')
plt.legend(loc = "best")
plt.show()
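In the categorical scatter above, all points of a category sit on one vertical line, so ads with similar probabilities overlap. A common variation is a jittered strip plot: each category gets an integer x position plus a small random horizontal offset, and the category names are restored as tick labels. This is a sketch, not part of the original answer; the jitter half-width of 0.1, the fixed seed, and the six-row data subset are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm

# A six-row subset of the question's `sample` list (values copied verbatim)
sample = [
    ['For business', 0.7616104043587437],
    ['Services', 0.747226678171249],
    ['Services', 0.23463661173977313],
    ['Animals', 0.6504301798258314],
    ['Animals', 0.49988690699674654],
    ['For business', 0.9485494576819328],
]
train = pd.DataFrame(data=sample, columns=['parent_category_name', 'deal_probability'])
parent_categories = train['parent_category_name'].unique()
colors = cm.rainbow(np.linspace(0, 1, len(parent_categories)))
rng = np.random.default_rng(0)  # fixed seed so the jitter is reproducible

fig, ax = plt.subplots(figsize=(18, 9))
for x, (parent_category, color) in enumerate(zip(parent_categories, colors)):
    probs = train.loc[train['parent_category_name'] == parent_category, 'deal_probability'].to_numpy()
    # Spread the points horizontally within +/-0.1 of integer position x
    jitter = rng.uniform(-0.1, 0.1, size=len(probs))
    ax.scatter(x + jitter, probs, color=color, label=parent_category)

# Put the category names back on the numeric x positions
ax.set_xticks(range(len(parent_categories)))
ax.set_xticklabels(parent_categories)
ax.set_ylabel('likelihood that an ad actually sold something', fontsize=12)
ax.set_title('Distribution of likelihood that an ad actually sold something')
# Call plt.show() when running this as a script rather than in a notebook
```

Because the x positions are numeric here, matplotlib's automatic string-category axis is not involved; the tick labels carry the names instead of a legend.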