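I have a list of case and control samples, along with information on which features are present or absent in each sample. I can build a pandas DataFrame holding this information: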
import pandas as pd
df={'Patient':[True,True,False],'Control':[False,True,False]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
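As a side note, the transpose step can be skipped. Below is a minimal sketch that builds the same frame directly with `pd.DataFrame.from_dict`, using `orient='index'` so each dict key becomes a row. The variable name `data` is just for illustration.

```python
import pandas as pd

# Presence/absence of three genes for each sample (same data as the question)
data = {'Patient': [True, True, False],
        'Control': [False, True, False]}

# orient='index' turns each key into a row label, so no transpose is needed;
# the columns argument is only accepted together with orient='index'
df = pd.DataFrame.from_dict(data, orient='index', columns=['GeneA', 'GeneB', 'GeneC'])
print(df)
```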
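I need to visualize this data as a dot plot / scatter plot in which both the x-axis and the y-axis are categorical, with presence or absence encoded by different marker shapes. Something like this: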
Patient| x x -
Control| - x -
__________________
GeneA GeneB GeneC
I'm new to Matplotlib/seaborn and can draw simple line and scatter plots, but searching online I couldn't find any instructions or examples similar to what I need here.
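A minimal sketch of the plot described above, assuming plain Matplotlib and NumPy: each cell gets an integer grid position, present cells are drawn with an `x` marker and absent cells with `_` (a short horizontal line), mirroring the text sketch. The helper name `presence_dot_plot` and the marker and color choices are illustrative assumptions, not taken from any answer below.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def presence_dot_plot(df, present_marker="x", absent_marker="_"):
    """Hypothetical helper: draw one marker per sample/feature cell.

    Assumes df holds booleans (or 0/1) with samples as rows and features as columns.
    """
    values = df.to_numpy().astype(bool)
    # Row index (sample) and column index (feature) of every cell
    ys, xs = np.indices(values.shape)
    fig, ax = plt.subplots()
    ax.scatter(xs[values], ys[values], marker=present_marker, color="black", label="Present")
    ax.scatter(xs[~values], ys[~values], marker=absent_marker, color="grey", label="Absent")
    # Replace the integer grid positions with the category names
    ax.set_xticks(range(df.shape[1]))
    ax.set_xticklabels(df.columns)
    ax.set_yticks(range(df.shape[0]))
    ax.set_yticklabels(df.index)
    ax.invert_yaxis()  # first sample on top, as in the text sketch
    ax.legend(loc="upper left", bbox_to_anchor=(1, 1))
    return fig, ax


df = pd.DataFrame({'Patient': [True, True, False],
                   'Control': [False, True, False]}).T
df.columns = ['GeneA', 'GeneB', 'GeneC']
fig, ax = presence_dot_plot(df)
```

Call `plt.show()` afterwards to display the figure. Using integer positions plus tick labels, rather than passing strings straight to `scatter`, keeps the category order fixed to the DataFrame's row and column order.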
Answer 0 (score: 5)
A quick way to do this:
import pandas as pd
import matplotlib.pyplot as plt
df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
heatmap = plt.imshow(df)  # one colored cell per sample/gene pair
plt.xticks(range(len(df.columns.values)), df.columns.values)
plt.yticks(range(len(df.index)), df.index)
# vertically oriented colorbar with a tick only at the two data values
cbar = plt.colorbar(mappable=heatmap, ticks=[0, 1], orientation='vertical')
cbar.ax.set_yticklabels(['Absent', 'Present'])
plt.show()
Thanks to @DEEPAK SURANA for adding the labels to the colorbar.
Answer 1 (score: 2)
Something like this might work:
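With the default continuous colormap, the colorbar in the answer above shows a gradient even though only two values occur. A minimal variant, assuming a two-entry `ListedColormap` (the grey/blue colors are arbitrary), gives exactly two color bands and centers the labels in each band:

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

df = pd.DataFrame({'Patient': [True, True, False],
                   'Control': [False, True, False]}).T
df.columns = ['GeneA', 'GeneB', 'GeneC']

# Two-entry colormap: 0 (absent) -> light grey, 1 (present) -> dark blue
cmap = ListedColormap(['lightgrey', 'darkblue'])
fig, ax = plt.subplots()
# astype(int) maps False/True to 0/1; vmin/vmax pin the color scale to that range
image = ax.imshow(df.astype(int), cmap=cmap, vmin=0, vmax=1)
ax.set_xticks(range(df.shape[1]))
ax.set_xticklabels(df.columns)
ax.set_yticks(range(df.shape[0]))
ax.set_yticklabels(df.index)
# With vmin=0 and vmax=1 the two bands span [0, 0.5] and [0.5, 1]; tick their centers
cbar = fig.colorbar(image, ax=ax, ticks=[0.25, 0.75])
cbar.ax.set_yticklabels(['Absent', 'Present'])
```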
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FixedLocator
df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
plot = df.T.plot()  # one line per sample, genes along the x-axis
loc = FixedLocator([0,1,2])  # one major tick per gene position
plot.xaxis.set_major_locator(loc)
plot.xaxis.set_ticklabels(df.columns)
plt.show()
Take a look at https://matplotlib.org/examples/pylab_examples/major_minor_demo1.html and https://matplotlib.org/api/ticker_api.html
I think you have to convert the booleans to zeros and ones for this to work, e.g. with df.astype(int).
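The ticker module linked above also offers `FixedFormatter`, which pairs with `FixedLocator` so positions and names are set in one place. A small sketch, assuming for illustration that only the Patient row is plotted as unconnected points:

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FixedLocator, FixedFormatter

genes = ['GeneA', 'GeneB', 'GeneC']
patient = [1, 1, 0]  # Patient row from the question, as 0/1

fig, ax = plt.subplots()
ax.plot(range(len(genes)), patient, marker='o', linestyle='')
# A major tick at each integer position, named by the matching gene label
ax.xaxis.set_major_locator(FixedLocator(range(len(genes))))
ax.xaxis.set_major_formatter(FixedFormatter(genes))
```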
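For example, starting from the boolean frame in the question, `astype(int)` maps `False` to 0 and `True` to 1 in every column:

```python
import pandas as pd

df = pd.DataFrame({'Patient': [True, True, False],
                   'Control': [False, True, False]}).T
df.columns = ['GeneA', 'GeneB', 'GeneC']

# Convert every boolean cell to an integer (False -> 0, True -> 1)
numeric = df.astype(int)
print(numeric)
```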
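Answer 2 (score: 2)
I searched the pyplot documentation but couldn't find a scatter plot or dot plot exactly like the one you describe. Here is my idea for building a plot that illustrates what you want: True records are drawn in blue and False records in red.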
# create the dataframe, plus an extra numeric 'level' column because the index is not numeric
import pandas as pd
import matplotlib.pyplot as plt

df={'Patient':[True,True,False],
    'Control':[False,True,False]}
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
df['level'] = list(range(len(df)))  # y position of each sample
print(df)

# plot the data: one column of points per gene, one row per sample
fig, ax = plt.subplots(figsize=(10,6))
for idx, gene in enumerate(df.columns[:-1]):  # skip the 'level' column
    cList = ['blue' if x else 'red' for x in df[gene]]  # True -> blue, False -> red
    for inr_idx, lv in enumerate(df['level']):
        ax.scatter(x=idx, y=lv, c=cList[inr_idx], s=20)
fig.tight_layout()
plt.yticks(range(len(df.index)), list(df.index))
plt.xticks(range(len(df.columns)-1), list(df.columns[:-1]))
plt.show()
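The nested loop above makes one `scatter` call per cell. A vectorized sketch of the same blue/red encoding, assuming NumPy is available, draws all cells in a single call; `np.indices` supplies the grid positions, so the helper `level` column is no longer needed.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Patient': [True, True, False],
                   'Control': [False, True, False]}).T
df.columns = ['GeneA', 'GeneB', 'GeneC']

values = df.to_numpy().astype(bool)
ys, xs = np.indices(values.shape)  # sample (row) and gene (column) index of every cell
colors = np.where(values, 'blue', 'red')  # True -> blue, False -> red

fig, ax = plt.subplots(figsize=(10, 6))
# One scatter call for all cells, flattened in row-major order
points = ax.scatter(xs.ravel(), ys.ravel(), c=colors.ravel().tolist(), s=20)
ax.set_xticks(range(df.shape[1]))
ax.set_xticklabels(df.columns)
ax.set_yticks(range(df.shape[0]))
ax.set_yticklabels(df.index)
fig.tight_layout()
```

Call `plt.show()` to display it. A single collection is also easier to restyle later, for example with `points.set_sizes(...)`.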