Question

我有一个非常尴尬的数据框，如下所示：

+----+------+-------+-------+--------+----+--------+
|    |      | hour1 | hour2 | hour 3 | …  | hour24 |
+----+------+-------+-------+--------+----+--------+
| id | date |       |       |        |    |        |
| 1  | 3    |     4 |     0 |     96 | 88 |     35 |
|    | 4    |    10 |     2 |     54 | 42 |     37 |
|    | 5    |     9 |    32 |      8 | 70 |     34 |
|    | 6    |    36 |    89 |     69 | 46 |     78 |
| 2  | 5    |    17 |    41 |     48 | 45 |     71 |
|    | 6    |    50 |    66 |     82 | 72 |     59 |
|    | 7    |    14 |    24 |     55 | 20 |     89 |
|    | 8    |    76 |    36 |     13 | 14 |     21 |
| 3  | 5    |    97 |    19 |     41 | 61 |     72 |
|    | 6    |    22 |     4 |     56 | 82 |     15 |
|    | 7    |    17 |    57 |     30 | 63 |     88 |
|    | 8    |    83 |    43 |     35 |  8 |      4 |
+----+------+-------+-------+--------+----+--------+

对于每个id，都有一个dates列表，对于每个date，小时列表示整整24小时按小时分解的整天数据。

我想做的是绘制（使用matplotlib）每个ids的完整每小时数据，但我想不出这样做的方法。我正在研究创建numpy矩阵的可能性，但我不确定这是否是正确的道路。

澄清：基本上，对于每个id，我想按顺序将所有小时数据连接在一起并绘制出来。我已经按照正确的顺序度过了这些日子，所以我想只想找到一种方法将每个id的所有每小时数据放入一个对象

有关如何最好地完成此任务的任何想法？

以下是csv格式的一些示例数据：http://www.sharecsv.com/s/e56364930ddb3d04dec6994904b05cc6/test1.csv

Answer 1

这是一种方法：

for groupID, data in d.groupby(level='id'):
    fig = pyplot.figure()
    ax = fig.gca()
    ax.plot(data.values.ravel())
    ax.set_xticks(np.arange(len(data))*24)
    ax.set_xticklabels(data.index.get_level_values('date'))

ravel是一个numpy方法，它将多行排成一个长1D数组。

请注意在大型数据集上以交互方式运行此操作，因为它会为每一行创建单独的绘图。如果要保存图表等，请设置非交互式matplotlib后端并使用savefig保存每个图形，然后在创建下一个图形之前将其关闭。

Answer 2

堆叠数据框可能也很有意义，以便在同一索引中将日期和时间放在一起。例如，做

df = df.stack().unstack(0)

将日期和时间放在索引中，将id作为列名称。调用df.plot()将为您提供相同轴上每个时间序列的线图。所以你可以这样做

ax = df.stack().unstack(0).plot()

并通过将参数传递给plot方法或通过调用ax上的方法来格式化轴。

Answer 3

我对这个解决方案并不完全满意，但也许它可以作为起点。由于您的数据是循环的，我选择了极坐标图。不幸的是，y方向的分辨率很差。因此，我手动缩放到情节中：

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

df = pd.read_csv('test1.csv')
df_new = df.set_index(['id','date'])
n = len(df_new.columns)

# convert from hours to rad
angle = np.linspace(0,2*np.pi,n)


# color palete to cycle through
n_data = len(df_new.T.columns)
color = plt.cm.Paired(np.linspace(0,1,n_data/2)) # divided by two since you have 'red', and 'blue'
from itertools import cycle
c_iter = cycle(color)

fig = plt.figure()
ax = fig.add_subplot(111, polar=True)

# looping through the columns and manually select one category
for ind, i in enumerate(df_new.T.columns):
    if i[0] == 'red':
        ax.plot(angle,df_new.T[i].values,color=c_iter.next(),label=i,linewidth=2)


# set the labels
ax.set_xticks(np.linspace(0, 2*np.pi, 24, endpoint=False))
ax.set_xticklabels(range(24))

# make the legend
ax.legend(loc='upper left', bbox_to_anchor = (1.2,1.1))
plt.show()

放大0：

enter image description here

放大1： enter image description here

放大2： enter image description here

绘制一个尴尬的熊猫多索引数据帧

3 个答案: