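I have some paired continuous data and would like to partition it into "bins", i.e. categories of equal size, and then use Python and Matplotlib to create a plot similar to the attached image. The plot combines a parallel-line plot, showing the difference between the "before" and "after" values for each data point, with two centre-aligned vertical histograms showing the distribution of the data in each group.
The example above was apparently produced with XLSTAT-PRO, but I have not been able to find any similar example online that does this with Matplotlib or Pandas.
I was going to try writing a Python / Matplotlib routine myself, but wondered whether anyone has already done something similar?
I would greatly appreciate any links, help and suggestions. Thanks in advance.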
Answer 0 (score: 4)
You can start by reading this tutorial.
You can adapt the following to your own problem:
import matplotlib.pyplot as plt
import numpy as np
# your input data:
befores = np.random.rand(10)
afters = np.random.rand(10)
# plotting the points
plt.scatter(np.zeros(len(befores)), befores)
plt.scatter(np.ones(len(afters)), afters)
# plotting the lines
for before, after in zip(befores, afters):
    plt.plot([0, 1], [before, after], c='k')
plt.xticks([0,1], ['before', 'after'])
plt.show()
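The snippet above plots the raw values. If you also want the equal-sized bins from the question, one possible approach (a rough sketch of my own, not taken from the tutorial) is to snap each value to the centre of its bin before plotting. The helper name `snap_to_bin_centres` and the parameter `n_bins` are hypothetical; `np.linspace` and `np.digitize` are standard NumPy calls. It assumes the data are not all identical.

```python
import numpy as np

def snap_to_bin_centres(values, n_bins):
    """Replace each value with the centre of its equal-width bin (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # n_bins equal-width bins spanning the full data range
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Using only the interior edges keeps the indices in 0..n_bins-1,
    # so the maximum value lands in the last bin instead of overflowing
    idx = np.digitize(values, edges[1:-1])
    centres = (edges[:-1] + edges[1:]) / 2
    return centres[idx]
```

For paired data, stacking both groups first, e.g. `np.column_stack([befores, afters])`, and passing the result in one call keeps the bin edges common to "before" and "after"; the two columns of the output can then be used in place of `befores` and `afters` in the plotting code.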
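Answer 1 (score: 2)
I could not find any Matplotlib or Pandas example that does the above, so I wrote something myself. A first attempt is attached, with dummy data and explanatory comments. I am not a professional programmer, so apologies for the inelegant style (I am always grateful for tips and constructive feedback from the experts on this forum, and I am sharing the code so that anyone can improve it); but the example works, and hopefully it, or parts of it, will be useful to someone somewhere.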
import numpy as np
import matplotlib.pyplot as plt
#--------------------------------
def points(x, y, n):  # plot n dots at height y, arranged symmetrically about x
    dx = 0.03  # half the spacing between neighbouring dots
    m = 1 - (n % 2)  # start at 0 for odd n and 1 for even n, so the row stays centred
    while m < n:
        plt.scatter(x + dx * m, y, color='k', marker='o', s=50, zorder=1)
        plt.scatter(x - dx * m, y, color='k', marker='o', s=50, zorder=1)
        m += 2
#--------------------------------
def histogram(b):  # count the data points in each bin and draw them as rows of dots
    for col in range(2):
        values, counts = np.unique(b[:, col], return_counts=True)
        for value, count in zip(values, counts):
            points(col, value, count)
#-------------------------------
def partition(a, bins):  # partition continuous data into equal sized bins for plotting
    lo = np.min(a)
    hi = np.max(a)
    step = (hi - lo) / float(bins - 1)  # distance between neighbouring bin centres
    b = np.array(a, dtype=float)  # work on a copy so the input array is left untouched
    for col in range(2):
        for row in range(a.shape[0]):
            for n in range(bins):
                # pick the first bin whose upper limit is at or above the value
                if a[row, col] <= lo + step / 2 + n * step:
                    b[row, col] = lo + n * step
                    break
    return b
#--------------------------------
def lines(b):  # draw 'before' and 'after' lines between paired data points + median line
    for row in range(b.shape[0]):
        plt.plot([0, 1], [b[row, 0], b[row, 1]], c='k', zorder=0, lw=1, alpha=0.3)
    # red line joining the medians of the binned 'before' and 'after' values
    plt.plot([0, 1], [np.median(b[:, 0]), np.median(b[:, 1])], c='r', zorder=2, lw=2, alpha=1)
#================================
# MAIN
# Dummy paired continuous data (...or import from spreadsheet as a numpy array)
a = np.array([
[1.62,1.53,1.42,1.39,1.11,1.20,0.99,0.88,0.60,0.65,0.52,0.49,0.43,0.41,0.31], # before
[0.8,0.7,0.52,0.61,0.44,0.43,0.49,0.33,0.44,0.39,0.20,0.29,0.37,0.19,0.00] ]) # after
bins = 10 # choose total number of bins to categorise data into
ax=plt.axes()
a = a.transpose()
b = a.copy() # make a copy of the input data matrix to write categorised data to
b = partition(a,bins) # partition continuous data into bins
lines(b) # draw lines between mid points of each bin and draw median line
histogram(b) # draw histograms centered at mid points of each bin
# Make general tweaks to plot appearance here:
plt.xticks([0,1], ['OUT', 'IN'], fontsize=14)
plt.ylabel('stimulation threshold (mA)',fontsize=13)
plt.text(0.8,1.3,'All patients',fontsize=13)
ax.patch.set_facecolor('white') # set background colour for plot area (default = white)
ax.spines['top'].set_visible(False) # remove default upper axis
ax.spines['right'].set_visible(False) # remove default right axis
plt.tick_params(axis='both', which='both', direction='out', top=False, right=False, labeltop=False) # remove tick marks from top & right axes
plt.xlim(-0.6,1.6)
plt.ylim(0,1.7)
plt.show()
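If the nested loops become slow on larger data sets, the binning and the dot placement can probably be vectorised with NumPy. The following is only a sketch under my own assumptions: the names `snap_to_centres` and `dot_offsets` are hypothetical, it needs `bins >= 2` and data that are not all identical, and values lying exactly on a bin boundary may round differently from the loop version because of floating-point arithmetic.

```python
import numpy as np

def snap_to_centres(a, bins):
    """Vectorised counterpart of partition(): snap values to equally spaced bin centres."""
    a = np.asarray(a, dtype=float)
    lo, hi = a.min(), a.max()
    step = (hi - lo) / (bins - 1)  # distance between neighbouring bin centres
    # Smallest n with value <= lo + step/2 + n*step, i.e. n = ceil((value - lo)/step - 0.5)
    n = np.ceil((a - lo) / step - 0.5)
    n = np.clip(n, 0, bins - 1)  # guard against rounding just outside the valid range
    return lo + n * step

def dot_offsets(n, dx=0.03):
    """Horizontal offsets for n dots spaced 2*dx apart and centred on zero."""
    return (np.arange(n) - (n - 1) / 2) * 2 * dx
```

With these, one row of the dot histogram could be drawn in a single call such as `plt.scatter(col + dot_offsets(count), np.full(count, value), color='k', s=50)` instead of looping over individual dots.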