Following the plotly directions, I would like to plot something similar to the following code:
import plotly.plotly as py
import plotly.figure_factory as ff
import numpy as np
# Add histogram data
x1 = np.random.randn(200) - 2
x2 = np.random.randn(200)
x3 = np.random.randn(200) + 2
x4 = np.random.randn(200) + 4
# Group data together
hist_data = [x1, x2, x3, x4]
group_labels = ['Group 1', 'Group 2', 'Group 3', 'Group 4']
# Create distplot with custom bin_size
fig = ff.create_distplot(hist_data, group_labels, bin_size = [.1, .25, .5, 1])
# Plot!
py.iplot(fig, filename = 'Distplot with Multiple Bin Sizes')
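However, I have a real-world dataset with uneven sample sizes (i.e. the count in group 1 differs from the count in group 2, and so on). It is also in name-value pair format.
Here is some dummy data to illustrate: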
import pandas as pd
# Add histogram data
x1 = pd.DataFrame(np.random.randn(100))
x1['name'] = 'x1'
x2 = pd.DataFrame(np.random.randn(200) + 1)
x2['name'] = 'x2'
x3 = pd.DataFrame(np.random.randn(300) - 1)
x3['name'] = 'x3'
df = pd.concat([x1, x2, x3])
df = df.reset_index(drop = True)
df.columns = ['value', 'names']
df
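As you can see, each name (x1, x2, x3) has a different count, and the "names" column is also what I want to use for the color.
Does anyone know how to plot this in plotly?
FYI, in R this is very simple: I just call ggplot and put fill = names inside aes().
Any help would be greatly appreciated, thanks!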
Answer 0 (score: 2)
You can try slicing the dataframe by name and passing the slices to Plotly.
fig = ff.create_distplot([df[df.names == a].value for a in df.names.unique()], df.names.unique(), bin_size=[.1, .25, .5, 1])
import numpy as np
import pandas as pd
import plotly
import plotly.figure_factory as ff
plotly.offline.init_notebook_mode()
x1 = pd.DataFrame(np.random.randn(100))
x1['name']='x1'
x2 = pd.DataFrame(np.random.randn(200)+1)
x2['name']='x2'
x3 = pd.DataFrame(np.random.randn(300)-1)
x3['name']='x3'
df=pd.concat([x1,x2,x3])
df=df.reset_index(drop=True)
df.columns = ['value','names']
fig = ff.create_distplot([df[df.names == a].value for a in df.names.unique()], df.names.unique(), bin_size=[.1, .25, .5, 1])
plotly.offline.iplot(fig, filename='Distplot with Multiple Bin Sizes')
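The slicing in this answer can also be written with a pandas groupby, which walks the frame once instead of filtering it once per name. The following is only a sketch under stated assumptions: the helper names `split_by_name` and `make_distplot` are hypothetical (they are not part of plotly or pandas), the column names `value` and `names` are taken from the question's dummy data, and the single `bin_size` of 0.25 is an arbitrary choice. `create_distplot` needs scipy for the density curves, so plotly is imported lazily inside the plotting helper.

```python
import numpy as np
import pandas as pd


def split_by_name(df, value_col="value", name_col="names"):
    """Split a long-format (name, value) frame into one array per group.

    Hypothetical helper: returns (hist_data, labels) in the shape that
    plotly.figure_factory.create_distplot expects.
    """
    labels, hist_data = [], []
    # sort=False keeps the groups in order of first appearance
    for name, group in df.groupby(name_col, sort=False):
        labels.append(str(name))
        hist_data.append(group[value_col].to_numpy())
    return hist_data, labels


def make_distplot(df, bin_size=0.25):
    """Hypothetical wrapper: build the distplot figure from a long-format frame."""
    # Imported here so split_by_name works even without plotly installed
    import plotly.figure_factory as ff

    hist_data, labels = split_by_name(df)
    return ff.create_distplot(hist_data, labels, bin_size=bin_size)


# Dummy data with uneven group sizes, mirroring the question
rng = np.random.default_rng(0)
df = pd.concat(
    [
        pd.DataFrame({"value": rng.standard_normal(100), "names": "x1"}),
        pd.DataFrame({"value": rng.standard_normal(200) + 1, "names": "x2"}),
        pd.DataFrame({"value": rng.standard_normal(300) - 1, "names": "x3"}),
    ],
    ignore_index=True,
)

hist_data, labels = split_by_name(df)
print(labels, [len(h) for h in hist_data])
```

Calling `make_distplot(df)` should then return a figure that can be passed to `plotly.offline.iplot` as in the answer above. The groups do not need equal lengths, because each group is passed as its own array.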
Answer 1 (score: 1)
The example from the plotly documentation works out of the box, including with uneven sample sizes:
#!/usr/bin/env python
import plotly
import plotly.figure_factory as ff
plotly.offline.init_notebook_mode()
import numpy as np
# data with different sizes
x1 = np.random.randn(300)-2
x2 = np.random.randn(200)
x3 = np.random.randn(4000)+2
x4 = np.random.randn(50)+4
# Group data together
hist_data = [x1, x2, x3, x4]
# use custom names
group_labels = ['x1', 'x2', 'x3', 'x4']
# Create distplot with custom bin_size
fig = ff.create_distplot(hist_data, group_labels, bin_size=.2)
# change that if you don't want to plot offline
plotly.offline.plot(fig, filename='Distplot with Multiple Datasets')
The script above produces the following result:
[Image: the resulting distplot]