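I have a data frame with many columns, each representing a security, and an index of times running from 00:00 to 23:55 (each row is a 5-minute interval). Every cell holds a 1 or a 0. I would like to draw some kind of bar-style plot that visualizes the data, like the sketch I drew here: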
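For anyone who wants to reproduce the setup, a frame of the shape described above could be built roughly like this. The ticker names, the date, and the random 0/1 values are placeholders and not the asker's data.

```python
import numpy as np
import pandas as pd

# 288 five-minute slots covering 00:00 through 23:55 of one (arbitrary) day
index = pd.date_range("2016-05-05 00:00", "2016-05-05 23:55", freq="5min", name="time")

# Hypothetical tickers; random 0/1 flags stand in for the real data
rng = np.random.default_rng(0)
tickers = ["A", "B", "C", "D", "E"]
df = pd.DataFrame(
    rng.integers(0, 2, size=(len(index), len(tickers))),
    index=index,
    columns=tickers,
)
```

Each column is then a 0/1 series sampled every five minutes, which is the input both answers below work from.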
Answer 0 (score: 1)
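One approach is to follow the link I posted in the comments above, even though your initial dataset is different. The idea is to replace the zeros with NaN and then multiply each column by its position, so that every column is drawn as a thick line at its own height and the NaN values leave gaps: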
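import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# parse_dates=True parses the index column as datetimes
df = pd.read_csv("testdata.txt", parse_dates=True, index_col=0)
# Replace zeros with NaN so that they are not drawn
df = df.where(df != 0)
# Put each column on its own horizontal level (0, 1, 2, ...)
for n, col in enumerate(df.columns):
    df[col] = df[col] * n
df.plot(lw=10, legend=False)
plt.yticks(np.arange(len(df.columns)), df.columns)
plt.tight_layout()
plt.show()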
The resulting data frame is:
                     A   B   C    D  E
time
2016-05-05 00:00:00  0 NaN NaN  NaN  4
2016-05-05 00:05:00  0 NaN NaN  3.0  4
2016-05-05 00:10:00  0 NaN NaN  3.0  4
2016-05-05 00:15:00  0 NaN NaN  3.0  4
And the plot:
Regards.
Answer 1 (score: 0)
I think what you can use is the broken bar chart (broken_barh) in matplotlib. The documentation is here.
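For reference, broken_barh takes a list of (start, width) pairs along x and a single (bottom, height) pair along y. A minimal sketch with made-up numbers and placeholder row labels:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Each tuple is (start, width) along x; the second argument is (bottom, height) along y
ax.broken_barh([(0, 3), (5, 1), (15, 4)], (10, 8))
ax.broken_barh([(2, 2), (7, 3)], (20, 8))

# Centre one tick on each row of bars
ax.set_yticks([14, 24])
ax.set_yticklabels(["row 1", "row 2"])
ax.set_xlabel("sample number")
```

Call plt.show() or fig.savefig(...) afterwards to see the figure. The remaining work is turning each 0/1 column into such (start, width) pairs.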
Here is a simple version that I tested. Unfortunately, I could not find a way to vectorize the operations, so it loops over the columns.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame()
df['qqq'] = [1,1,1,0,0,1,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,1,0,0,0]
df['dia'] = [0,0,1,1,0,1,0,1,1,1,0,0,0,1,1,0,0,1,1,1,1,1,1,1,1]

# Start of every run of 1s: rows where the value changes and the new value is 1
ones = []
for col in df.columns:
    one = df[df[col].diff() != 0][col]
    one = one[one == 1]
    ones.append(one)

# Start ('hval') and length ('width') of every run of 1s
hranges = []
for j, col in enumerate(df.columns):
    changes = df[df[col].diff() != 0][col]              # value at the start of every run
    starts = changes.index.to_numpy()
    ends = np.append(starts[1:], len(df))               # a run ends where the next one starts
    widths = (ends - starts)[changes.to_numpy() == 1]   # keep only the runs of 1s
    hranges.append(pd.DataFrame({'hval': ones[j].index.tolist(), 'width': widths}))

# broken_barh expects a list of (start, width) tuples per row
vals = []
for j in range(len(hranges)):
    val = [(hranges[j]['hval'][i], hranges[j]['width'][i]) for i in hranges[j].index]
    vals.append(val)

fig, ax = plt.subplots()
for j, col in enumerate(df.columns):
    ax.broken_barh(vals[j], ((j + 1) * 10, 10))
ax.set_yticks([((k + 1) * 10) + 5 for k in range(len(df.columns))])
ax.set_yticklabels(df.columns)
plt.show()
The result looks like this:
Obviously your example would have time values on the x-axis, but I imagine you can figure that out.
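On the two loose ends above (vectorizing the run detection and putting time on the x-axis), here is a rough sketch of one way it could be done. The helper name runs_of_ones is made up for this example, the 5-minute index and its date are assumptions, and the bar widths are only correct if the index is evenly spaced.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates


def runs_of_ones(values):
    """Return (starts, lengths) of every consecutive run of 1s in a 0/1 sequence."""
    # Pad with a 0 on both sides so runs touching either end are detected too
    padded = np.concatenate(([0], np.asarray(values, dtype=int), [0]))
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)    # positions where 0 -> 1
    ends = np.flatnonzero(edges == -1)     # positions where 1 -> 0 (exclusive end)
    return starts, ends - starts


qqq = [1,1,1,0,0,1,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,1,0,0,0]
dia = [0,0,1,1,0,1,0,1,1,1,0,0,0,1,1,0,0,1,1,1,1,1,1,1,1]

# Assumed index: evenly spaced 5-minute timestamps on an arbitrary day
index = pd.date_range("2016-05-05 00:00", periods=len(qqq), freq="5min")
df = pd.DataFrame({"qqq": qqq, "dia": dia}, index=index)

# Matplotlib measures dates in days, so convert the index and one step to that unit
x = mdates.date2num(index.to_pydatetime())
step = x[1] - x[0]

fig, ax = plt.subplots()
for row, col in enumerate(df.columns):
    starts, lengths = runs_of_ones(df[col].to_numpy())
    bars = list(zip(x[starts], lengths * step))   # (start, width) pairs in days
    ax.broken_barh(bars, (row * 10, 8))

ax.set_yticks([row * 10 + 4 for row in range(len(df.columns))])
ax.set_yticklabels(df.columns)
ax.xaxis_date()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
```

The run detection uses no Python-level loop over rows; only the loop over columns remains, since broken_barh has to be called once per row of bars. Add plt.show() at the end to display the figure.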