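Plotting data with categorical x and y axes in Python

Asked: 2018-06-28 20:36:46

Tags: python pandas matplotlib seaborn

I have a list of case and control samples, along with information on which features are present or absent in each sample. pandas can build a DataFrame holding this information: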

import pandas as pd
df={'Patient':[True,True,False],'Control':[False,True,False]} # Presence/absence data for three genes for each sample 
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']

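I need to visualize this data as a dot plot/scatter plot in which both the x and y axes are categorical, with presence or absence encoded by different shapes. Something like this: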

Patient|  x      x     -
Control|  -      x     -  
       __________________
        GeneA  GeneB  GeneC

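I am new to Matplotlib/seaborn and can draw simple line plots and scatter plots. But when searching online, I could not find any instructions or example plots resembling what I need here.

3 answers:

Answer 0 (score: 5)

A quick way to do this: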

import pandas as pd
import matplotlib.pyplot as plt

df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample 
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']

heatmap = plt.imshow(df)
plt.xticks(range(len(df.columns.values)), df.columns.values)
plt.yticks(range(len(df.index)), df.index)
cbar = plt.colorbar(mappable=heatmap, ticks=[0, 1], orientation='vertical')  
# vertically oriented colorbar
cbar.ax.set_yticklabels(['Absent', 'Present']) 


Thanks to @DEEPAK SURANA for adding the labels to the colorbar.
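Because the data has only two states, the default continuous colormap shows more colours on the colorbar than the data needs. A minimal sketch of the same heatmap with a two-colour map follows; the colour names and the tick positions 0.25 and 0.75 are illustrative assumptions, not part of the original answer.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Same presence/absence table as above (1 = present, 0 = absent).
df = pd.DataFrame({'Patient': [1, 1, 0], 'Control': [0, 1, 0]}).transpose()
df.columns = ['GeneA', 'GeneB', 'GeneC']

fig, ax = plt.subplots()

# Two-entry colormap: first colour for 0 (absent), second for 1 (present).
# The colours themselves are an arbitrary choice.
cmap = ListedColormap(['lightgrey', 'steelblue'])
heatmap = ax.imshow(df.to_numpy(), cmap=cmap, vmin=0, vmax=1)

# Label both axes with the categorical names.
ax.set_xticks(range(len(df.columns)))
ax.set_xticklabels(list(df.columns))
ax.set_yticks(range(len(df.index)))
ax.set_yticklabels(list(df.index))

# Put one tick in the middle of each colour band of the colorbar.
cbar = fig.colorbar(heatmap, ax=ax, ticks=[0.25, 0.75])
cbar.ax.set_yticklabels(['Absent', 'Present'])
```

Call plt.show() or fig.savefig(...) afterwards to display or save the figure.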

Answer 1 (score: 2)

Something like this might work:

import pandas as pd
import numpy as np
from matplotlib.ticker import FixedLocator

df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample 
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']

plot = df.T.plot()
loc = FixedLocator([0,1,2])
plot.xaxis.set_major_locator(loc)
plot.xaxis.set_ticklabels(df.columns)

Have a look at https://matplotlib.org/examples/pylab_examples/major_minor_demo1.html and https://matplotlib.org/api/ticker_api.html

I think you have to convert the boolean values to zeros and ones for this to work, with something like df.astype(int).
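As a small illustration of that conversion, here is a sketch that starts from the boolean table in the question; the name df_int is introduced here only for the example.

```python
import pandas as pd

# Presence/absence table from the question, stored as booleans.
df = pd.DataFrame({'Patient': [True, True, False],
                   'Control': [False, True, False]}).transpose()
df.columns = ['GeneA', 'GeneB', 'GeneC']

# astype(int) maps True to 1 and False to 0, so the values plot as numbers.
df_int = df.astype(int)
print(df_int)
```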

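Answer 2 (score: 2)

I searched the pyplot documentation but could not find a scatter plot or dot plot exactly like the one you describe. Here is my idea for creating a plot that illustrates what you want. True records are blue and False records are red.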

# creating dataframe and extra column because index is not numeric
import pandas as pd
df={'Patient':[True,True,False],
    'Control':[False,True,False]} 
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
df['level'] = [i for i in range(0, len(df))]
print(df)

# plotting the data
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10,6))
for idx, gene in enumerate(df.columns[:-1]):
    # blue marks presence (True), red marks absence (False)
    cList = ['blue' if x else 'red' for x in df[gene]]
    for inr_idx, lv in enumerate(df['level']):
        ax.scatter(x=idx, y=lv, c=cList[inr_idx], s=20)
fig.tight_layout()
plt.yticks([i for i in range(len(df.index))], list(df.index))
plt.xticks([i for i in range(len(df.columns)-1)], list(df.columns[:-1]))
plt.show()

Figure 1
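The question asked for presence and absence to be encoded by shape rather than colour. A minimal sketch of the same scatter idea using marker shapes follows. The helper name plot_presence and its parameters are hypothetical, and the markers 'x' and '_' were picked only to mimic the ASCII sketch in the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Presence/absence table from the question.
df = pd.DataFrame({'Patient': [True, True, False],
                   'Control': [False, True, False]}).transpose()
df.columns = ['GeneA', 'GeneB', 'GeneC']


def plot_presence(df, present_marker='x', absent_marker='_'):
    """Hypothetical helper: one marker per (sample, gene) cell, shape by value."""
    fig, ax = plt.subplots(figsize=(6, 3))
    for row, sample in enumerate(df.index):
        for col, gene in enumerate(df.columns):
            # Choose the marker shape from the boolean value of this cell.
            marker = present_marker if df.loc[sample, gene] else absent_marker
            ax.scatter(col, row, marker=marker, color='black', s=100)

    # Replace the numeric positions with the categorical labels.
    ax.set_xticks(range(len(df.columns)))
    ax.set_xticklabels(list(df.columns))
    ax.set_yticks(range(len(df.index)))
    ax.set_yticklabels(list(df.index))

    # Pad the axes so markers do not sit on the frame, and put the first
    # row (Patient) at the top, as in the sketch in the question.
    ax.set_xlim(-0.5, len(df.columns) - 0.5)
    ax.set_ylim(-0.5, len(df.index) - 0.5)
    ax.invert_yaxis()
    return ax


ax = plot_presence(df)
```

Call plt.show() afterwards to display the figure. Drawing one scatter call per cell is slow for large tables; grouping the points by value and issuing one call per marker would be the natural next step.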