Question

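I have a DataFrame in pandas where the rows are observations taken at different times and each column is a size bin; the values are the number of particles observed for that size bin. It looks like this: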

         bin1    bin2    bin3    bin4    bin5
Time1    50      200     30      40      5
Time2    60      60      40      420     700
Time3    34      200     30      67      43

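I want to use plotly/cufflinks to create a scatter plot in which the x-axis is the size bins and the y-axis is the value in each size bin. There would be three colors, one for each observation.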

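As I have more experience with Matlab, I tried selecting the values by position with iloc (note that the example below only tries to plot a single observation):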

df.iplot(kind="scatter",theme="white",x=df.columns, y=df.iloc[1,:])

But I just get a KeyError: 0 message.

Is it possible to use indexing when selecting the x and y values in pandas?

Answer 1

Indexing aside, I think you need a better understanding of how pandas and matplotlib interact.

Let's go through your case step by step:

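As the pandas.DataFrame.plot documentation says, each plotted series is a column. Your series are in the rows, so the dataframe needs to be transposed.
To create a scatter plot you need both the x and the y coordinates, each in its own column. The x column is missing, so you also need to create a column of x values in the transposed dataframe.
Apparently pandas does not change the color by default across consecutive calls to plot (matplotlib does), so you need to pick a colormap and pass a color argument; otherwise all points will have the same color.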
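
To make the first two steps concrete, here is a minimal sketch of what they do to the data. It builds the frame from your post inline instead of reading a file, and the column name xval is just a name I picked; both are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Data copied from the question; building it inline is an assumption
# made so that this snippet runs without an external file.
df = pd.DataFrame(
    [[50, 200, 30, 40, 5],
     [60, 60, 40, 420, 700],
     [34, 200, 30, 67, 43]],
    index=["Time1", "Time2", "Time3"],
    columns=["bin1", "bin2", "bin3", "bin4", "bin5"],
)

# Step 1: transpose so that each observation becomes a column.
tdf = df.transpose()

# Step 2: add a numeric x column (1..5), one value per size bin.
# 'xval' is a hypothetical name chosen for this sketch.
tdf["xval"] = np.arange(1, len(tdf) + 1)

print(tdf)
```

After these two steps the frame has one row per size bin, one column per observation, and one extra column holding the x positions.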

Here is a working example:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Here I copied your data into a text file, data.txt, and read it into pandas as whitespace-separated values.
# You may have a different way to get your data.
df = pd.read_csv('data.txt', sep=r'\s+', engine='python')

# I assume there is a column named 'time', which becomes the index, as you show in your post.
# set_index returns a new dataframe, so the result has to be assigned back.
df = df.set_index('time')

tdf = df.transpose()  # transpose the dataframe: each observation is now a column

#Creating x values, I go for 1 to 5 but they can be different.
tdf['xval'] = np.arange(1, len(tdf)+1)

# Choose a colormap and make a list of colors, one per data column (every column except 'xval').
colormap = plt.cm.rainbow
colors = [colormap(i) for i in np.linspace(0, 1, len(tdf.columns) - 1)]

#Make an empty plot, the columns will be added to the axes in the loop.
fig, axes = plt.subplots(1, 1)
for i, cl in enumerate([datacol for datacol in tdf.columns if datacol != 'xval']):
    tdf.plot(x='xval', y=cl, kind="scatter", ax=axes, color=colors[i])

plt.show()

This produces the following plot:

Here is a tutorial on choosing colors in matplotlib.
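
On the original question of whether positional indexing can be used for the x and y values: when calling matplotlib directly, it can. The following is only a sketch under the assumption that the frame is built inline from the data in your post; it bypasses cufflinks entirely, so it does not explain the KeyError raised by iplot. The output file name bins_scatter.png is a placeholder.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Data copied from the question; building it inline is an assumption.
df = pd.DataFrame(
    [[50, 200, 30, 40, 5],
     [60, 60, 40, 420, 700],
     [34, 200, 30, 67, 43]],
    index=["Time1", "Time2", "Time3"],
    columns=["bin1", "bin2", "bin3", "bin4", "bin5"],
)

fig, ax = plt.subplots()
for i in range(len(df)):
    # Select one observation by position, as in the iloc attempt.
    row = df.iloc[i, :]
    # Bin names go on the x-axis as categories; matplotlib moves to the
    # next color of its default cycle on each scatter call.
    ax.scatter(list(df.columns), row.values, label=df.index[i])

ax.set_xlabel("size bin")
ax.set_ylabel("particle count")
ax.legend()
fig.savefig("bins_scatter.png")  # placeholder file name
```

No transposition is needed here because each row is handed to matplotlib as a plain array of y values.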
