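I'm fairly new to coding (entirely self-taught) and have started using it in my job as a research assistant in a cancer lab. I need some help setting up some line graphs in matplotlib.
I have a dataset containing next-generation sequencing data for about 80 patients. For each patient we have several analysis timepoints, the genes detected (40 in total), and the associated % mutation for each gene.
My goal is to write two scripts. The first will generate a "by patient" plot: a line graph with % mutation on the y-axis and time of measurement on the x-axis, with a differently colored line for each of that patient's genes. The second will be a "by gene" plot, where a single figure contains differently colored lines representing the x/y values of each patient for that particular gene.
Here is an example dataframe for one patient number, as used by the scripts above: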
gene yaxis xaxis pt# gene#
ASXL1-3 34 1 3 1
ASXL1-3 0 98 3 1
IDH1-3 24 1 3 11
IDH1-3 0 98 3 11
RUNX1-3 38 1 3 21
RUNX1-3 0 98 3 21
U2AF1-3 33 1 3 26
U2AF1-3 0 98 3 26
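I have set up a groupby script that, when I iterate over it, gives me a dataframe with every gene timepoint for each patient.
grouped = df.groupby('pt #')
for groupObject in grouped:
    group = groupObject[1]
For patient 1, this gives the following output: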
y x gene patientnumber patientgene genenumber dxtotransplant \
0 40.0 1712 ASXL1 1 ASXL1-1 1 1857
1 26.0 1835 ASXL1 1 ASXL1-1 1 1857
302 7.0 1835 RUNX1 1 RUNX1-1 21 1857
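To illustrate the grouping step on its own, here is a minimal sketch. It assumes the column names shown in the output above and uses a hypothetical three-row frame in place of the real data. Grouping on both patient and gene yields one sub-frame per line to be drawn, and iterating a groupby gives (key, sub-frame) pairs that can be unpacked directly:

```python
import pandas as pd

# Hypothetical stand-in for the patient 1 rows shown above.
df = pd.DataFrame({
    "y": [40.0, 26.0, 7.0],
    "x": [1712, 1835, 1835],
    "gene": ["ASXL1", "ASXL1", "RUNX1"],
    "patientnumber": [1, 1, 1],
})

# Each iteration yields a ((patient, gene), sub-frame) pair.
series_by_key = {}
for (patient, gene), group in df.groupby(["patientnumber", "gene"]):
    points = list(zip(group["x"].tolist(), group["y"].tolist()))
    series_by_key[(int(patient), gene)] = points

for key, points in series_by_key.items():
    print(key, points)
```

Each entry in `series_by_key` holds the points of exactly one patient/gene line, which is the unit that needs to be passed to a single plot call.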
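I need help writing a script to create either of the plots described above. Taking the by-patient example, my general idea is that I need to create a separate subplot for each of the patient's genes, where each subplot is a line graph for that gene.
Using matplotlib, this is as far as I have gotten: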
plt.figure()
grouped = df.groupby('patient number')
for groupObject in grouped:
    group = groupObject[1]
    df = group #may need to remove this
    for element in range(len(group)):
        xs = np.array(df[df.columns[1]]) #"x" column
        ys = np.array(df[df.columns[0]]) #"y" column
        gene = np.array(df[df.columns[2]])[element] #"gene" column
        plt.subplot(1,1,1)
        plt.scatter(xs,ys, label=gene)
        plt.plot(xs,ys, label=gene)
        plt.legend()
    plt.show()
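This produces the following output:
In this output, the circled line should not be connected to the other 2 points. In this case, this is patient 1, who has the following data points: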
x y gene
1712 40 ASXL1
1835 26 ASXL1
1835 7 RUNX1
Using seaborn, I have gotten close to the graph I want with this code:
grouped = df.groupby(['patientnumber'])
for groupObject in grouped:
    group = groupObject[1]
    g = sns.FacetGrid(group, col="patientgene", col_wrap=4, size=4, ylim=(0,100))
    g = g.map(plt.scatter, "x", "y", alpha=0.5)
    g = g.map(plt.plot, "x", "y", alpha=0.5)
    plt.title= "gene:%s"%element
With this code I get the following:
If I adjust the line to:
g = sns.FacetGrid(group, col="patientnumber", col_wrap=4, size=4, ylim=(0,100))
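I get the following result:
As you can see in the second example, the plot treats every point as if it belonged to the same line (when in fact they belong to 4 separate lines).
How can I adjust my iteration so that each patient gene is treated as a separate line on the same graph?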
Answer 0 (score: 1)
I wrote a subplot function that should help you out. I modified the data to help illustrate the plotting functionality.
gene,yaxis,xaxis,pt #,gene #
ASXL1-3,34,1,3,1
ASXL1-3,3,98,3,1
IDH1-3,24,1,3,11
IDH1-3,7,98,3,11
RUNX1-3,38,1,3,21
RUNX1-3,2,98,3,21
U2AF1-3,33,1,3,26
U2AF1-3,0,98,3,26
ASXL1-3,39,1,4,1
ASXL1-3,8,62,4,1
ASXL1-3,0,119,4,1
IDH1-3,27,1,4,11
IDH1-3,12,62,4,11
IDH1-3,1,119,4,11
RUNX1-3,42,1,4,21
RUNX1-3,3,62,4,21
RUNX1-3,1,119,4,21
U2AF1-3,16,1,4,26
U2AF1-3,1,62,4,26
U2AF1-3,0,119,4,26
Here is the subplotting function, with a few extra bells and whistles :)
def plotByGroup(df, group, xCol, yCol, title = "", xLabel = "", yLabel = "", lineColors = ["red", "orange", "yellow", "green", "blue", "purple"], lineWidth = 2, lineOpacity = 0.7, plotStyle = 'ggplot', showLegend = False):
    """
    Plot multiple lines from a Pandas Data Frame for each group using DataFrame.groupby() and MatPlotLib PyPlot.
    @params
    df - Required - Data Frame - Pandas Data Frame
    group - Required - String - Column name to group on
    xCol - Required - String - Column name for X axis data
    yCol - Required - String - Column name for y axis data
    title - Optional - String - Plot Title
    xLabel - Optional - String - X axis label
    yLabel - Optional - String - Y axis label
    lineColors - Optional - List - Colors to plot multiple lines
    lineWidth - Optional - Integer - Width of lines to plot
    lineOpacity - Optional - Float - Alpha of lines to plot
    plotStyle - Optional - String - MatPlotLib plot style
    showLegend - Optional - Boolean - Show legend
    @return
    The matplotlib.pyplot module, whose current figure holds the plot
    """
    # Import MatPlotLib Plotting Function & Set Style
    from matplotlib import pyplot as plt
    plt.style.use(plotStyle)
    figure = plt.figure() # Initialize Figure
    axis = figure.add_subplot(1,1,1) # One set of axes shared by every line
    grouped = df.groupby(group) # Set Group
    # enumerate() supplies the iteration count used for line color indexing
    for i, (idx, grp) in enumerate(grouped):
        colorIndex = i % len(lineColors) # Define line color index
        lineLabel = grp[group].values[0] # Get a group label from first position
        xValues = grp[xCol] # Get x vector
        yValues = grp[yCol] # Get y vector
        # One plot call per group, so points from different groups are never joined
        axis.plot(xValues, yValues, label = lineLabel, color = lineColors[colorIndex], lw = lineWidth, alpha = lineOpacity)
    # Plot legend once, after all lines have been drawn
    if showLegend:
        axis.legend()
    # Set title & Labels
    axis.set_title(title)
    axis.set_xlabel(xLabel)
    axis.set_ylabel(yLabel)
    # Return plot for saving, showing, etc.
    return plt
And to use it...
import pandas
# Load the Data into Pandas
df = pandas.read_csv('data.csv')
#
# Plotting - by Patient
#
# Create Patient Grouping
patientGroup = df.groupby('pt #')
# Iterate Over Groups
for idx, patientDf in patientGroup:
    # Let's give them specific titles
    plotTitle = "Gene Frequency over Time by Gene (Patient %s)" % str(patientDf['pt #'].values[0])
    # Call the subplot function
    plot = plotByGroup(patientDf, 'gene', 'xaxis', 'yaxis', title = plotTitle, xLabel = "Days", yLabel = "Gene Frequency")
    # Add Vertical Lines at Assay Timepoints
    timepoints = set(patientDf.xaxis.values)
    for timepoint in timepoints:
        plot.axvline(x = timepoint, linewidth = 1, linestyle = "dashed", color='gray', alpha = 0.4)
    # Let's see it
    plot.show()
Of course, we can do the same thing by gene.
#
# Plotting - by Gene
#
# Create Gene Grouping
geneGroup = df.groupby('gene')
# Generate Plots for Groups
for idx, geneDf in geneGroup:
    plotTitle = "%s Gene Frequency over Time by Patient" % str(geneDf['gene'].values[0])
    plot = plotByGroup(geneDf, 'pt #', 'xaxis', 'yaxis', title = plotTitle, xLabel = "Days", yLabel = "Frequency")
    plot.show()
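For reference, the core idea can also be sketched without the helper, using matplotlib's Axes API directly: group by patient, then group again by gene, and issue one plot call per gene so that points from different genes are never joined. This is only a rough sketch. It assumes the column names from the sample CSV above, uses a handful of the patient 4 rows inline instead of reading `data.csv`, and `plot_patient` is a hypothetical helper name. The `Agg` backend is selected so the snippet runs without a display; remove that line and call `plt.show()` to view the figures interactively.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from the sample data above (patient 4, two genes).
df = pd.DataFrame({
    "gene":  ["ASXL1-3"] * 3 + ["IDH1-3"] * 3,
    "yaxis": [39, 8, 0, 27, 12, 1],
    "xaxis": [1, 62, 119, 1, 62, 119],
    "pt #":  [4] * 6,
})

def plot_patient(patient_df, ax):
    """Hypothetical helper: draw one line per gene on the given Axes."""
    for gene, gene_df in patient_df.groupby("gene"):
        gene_df = gene_df.sort_values("xaxis")  # keep timepoints in order
        ax.plot(gene_df["xaxis"], gene_df["yaxis"], marker="o", label=gene)
    ax.set_xlabel("Days")
    ax.set_ylabel("% mutation")
    ax.set_ylim(0, 100)
    ax.legend()
    return ax

# One figure per patient, one line per gene within it.
figures = {}
for patient, patient_df in df.groupby("pt #"):
    fig, ax = plt.subplots()
    plot_patient(patient_df, ax)
    ax.set_title("Patient %s" % patient)
    figures[int(patient)] = fig
```

Swapping the two grouping columns (outer loop on `gene`, inner loop on `pt #`) would give the by-gene version in the same way.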
If this isn't what you're looking for, please post a clarification and I'll take another crack at it.