Question

I have a dataset consisting of financial stock IDs [0, 1400] and timestamps [0, 1800]. For a given ID, it either has or does not have data for a given timestamp.

I have created a dictionary in which each key is an ID and each value is a list of all the timestamps for which that ID has data.

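I now want to plot a graph in which each row corresponds to an ID and each column corresponds to a timestamp. Each cell [i, j] of the graph will be coloured green if ID i has data for timestamp j (if j in dict[i]), and red if not.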

Here is an example that I generated manually in Excel:

Can this be done with matplotlib or some other library?

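Since the graph will be 1400x1800 in size, the cells will probably be very small. I am trying to reorder the data so that the number of green cells that intersect between adjacent IDs is maximised, so this graph would let me visualise how well I have achieved these overlaps/intersections in the dataset.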
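One way to put a number on the overlap described above is to count, for each pair of neighbouring rows, the timestamps that both IDs share. The following is a minimal sketch, assuming the dictionary layout from the question; `presence_matrix` and `adjacent_overlap` are hypothetical helper names, and the small `id_ts` dictionary is made-up illustration data, not the real dataset.

```python
import numpy as np


def presence_matrix(id_ts, n_timestamps):
    """Return the row order and a boolean matrix: one row per ID, one column per timestamp."""
    ids = list(id_ts)
    mat = np.zeros((len(ids), n_timestamps), dtype=bool)
    for row, key in enumerate(ids):
        mat[row, id_ts[key]] = True  # mark the timestamps this ID has data for
    return ids, mat


def adjacent_overlap(mat):
    """Count cells that are True in both rows of each pair of neighbouring rows."""
    return int(np.logical_and(mat[:-1], mat[1:]).sum())


# Made-up illustration data (not from the question's dataset).
id_ts = {0: [1, 2, 3], 1: [2, 3, 4, 5], 2: [0, 1]}
ids, mat = presence_matrix(id_ts, n_timestamps=6)

score_before = adjacent_overlap(mat)             # rows ordered 0, 1, 2
score_after = adjacent_overlap(mat[[2, 0, 1]])   # rows ordered 2, 0, 1
```

Comparing the score before and after a reordering gives a single figure to go alongside the picture; here the second ordering scores higher because ID 2 shares a timestamp with ID 0 but none with ID 1.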

To provide some data, I simply iterated over the first 20 IDs in the dictionary and printed each ID with its list of timestamps. Each line is in the form: ID [list of that ID's timestamps]

Edit

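This is my first attempt, on a small-scale data example. While it does achieve my goal, it is a very brute-force solution, so any suggestions for improvement would be appreciated.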

import matplotlib.pyplot as plt
import pandas as pd

TSs = [0, 1, 2, 3, 4, 5]
ID_TS = {0: [1, 2, 3], 1: [2, 3, 4, 5]}

df = pd.DataFrame(index=ID_TS.keys(), columns=TSs)

for ID, TS in ID_TS.items():
    bools = []
    for i in TSs:
        if i in TS:
            bools.append(True)
        else:
            bools.append(False)
    df.loc[ID] = bools

plt.imshow(df, cmap='hot', interpolation='nearest')
plt.show()

Answer 1

The code that generates the dataframe doesn't work, so I took some liberties with it...

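import numpy
import pandas
from matplotlib import pyplot
from matplotlib import ticker

TSs = [0, 1, 2, 3, 4, 5]
ID_TS = {0: [1, 2, 3], 1: [2, 3, 4, 5]}

fig, ax = pyplot.subplots()

img = (
    # columns are IDs and rows are timestamps; True where the ID has data
    pandas.DataFrame({ID: numpy.isin(TSs, TS) for ID, TS in ID_TS.items()}, index=TSs)
        .T            # transpose: one row per ID, one column per timestamp
        .astype(int)  # under 'RdYlGn', 0 maps to red and 1 to green
        .pipe(ax.pcolor, cmap='RdYlGn', vmin=0, vmax=1, edgecolors='k')
)

ax.set_xlabel('Time')
ax.set_ylabel('ID')
# each axis needs its own locator instance; unit ticks only suit small grids
ax.yaxis.set_major_locator(ticker.MultipleLocator(1))
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))

pyplot.show()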
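
For the full grid described in the question, cell edges and unit ticks would likely be too dense to read. A possible alternative, sketched here under assumptions, is to fill a NumPy boolean array directly and draw it with `imshow` and a two-colour `ListedColormap`. The helper name `plot_presence`, the figure size, the `Agg` backend, and the random stand-in data are illustrative choices, not part of the original question or answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap


def plot_presence(id_ts, n_ids, n_timestamps):
    """Draw a red/green grid: row = ID, column = timestamp, green = data present."""
    mat = np.zeros((n_ids, n_timestamps), dtype=bool)
    for id_, timestamps in id_ts.items():
        mat[id_, timestamps] = True  # mark every timestamp this ID has data for

    fig, ax = plt.subplots(figsize=(9, 7))
    ax.imshow(
        mat.astype(int),
        cmap=ListedColormap(["red", "green"]),  # 0 -> red, 1 -> green
        vmin=0, vmax=1,
        aspect="auto",            # let cells stretch to fill the axes
        interpolation="nearest",  # no blending between neighbouring cells
    )
    ax.set_xlabel("Time")
    ax.set_ylabel("ID")
    return fig, mat


# Random stand-in data covering IDs 0-1400 and timestamps 0-1800 (not real data).
rng = np.random.default_rng(0)
n_ids, n_timestamps = 1401, 1801
id_ts = {
    i: np.flatnonzero(rng.random(n_timestamps) < 0.5).tolist()
    for i in range(n_ids)
}
fig, mat = plot_presence(id_ts, n_ids, n_timestamps)
```

Calling `fig.savefig(...)` with a high `dpi`, or `plt.show()` with an interactive backend, then renders the grid. Filling the array with one indexed assignment per ID avoids the per-cell `if i in TS` test used in the question's loop.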

Plotting a graph that shows which values in a range are present in a set of lists