绘制2D数据:使用不同颜色图的热图

时间:2014-08-06 06:56:05

标签: python r pandas matplotlib heatmap

我想要可视化我拥有的2D数据。例如,以下是具有四个属性的数据:

       att1  att2   att3
fun1     10     0      2
fun2      0     1      3
fun3      1    10      5
fun4      2     3     10

我想为每个数据点分配不同的颜色。颜色的强度取决于该列中属性的值,每列必须具有不同的颜色。

以下是所需的图像:

enter image description here

有没有人知道如何在Python或R中制作它?

2 个答案:

答案 0 :(得分:6)

使用Python:

我找到了一个更好的方法:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm

# data loading
df = pd.read_csv("file.csv", index_col=0) 


# plotting
fig,ax = plt.subplots()
ax.matshow(df.mask(((df == df) | df.isnull()) & (df.columns != "att1")), 
           cmap=cm.Reds) # You can change the colormap here
ax.matshow(df.mask(((df == df) | df.isnull()) & (df.columns != "att2")), 
           cmap=cm.Greens)
ax.matshow(df.mask(((df == df) | df.isnull()) & (df.columns != "att3")), 
           cmap=cm.Blues)
plt.xticks(range(3), df.columns)
plt.yticks(range(4), df.index)
plt.show()

hm

一些细节:

df.mask(((df == df) | df.isnull()) & (df.columns != "att1"))
      att1  att2  att3
fun1    10   NaN   NaN
fun2     0   NaN   NaN
fun3     1   NaN   NaN
fun4     2   NaN   NaN

旧版本,带有numpy蒙面数组:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from numpy.ma import masked_array
import numpy as np

df = pd.read_clipboard() # just copied your example

# define masked arrays to mask all but the given column
c1 = masked_array(df, mask=(np.ones_like(df)*(df.values[0]!=df.values[0][0]))) 
c2 = masked_array(df, mask=(np.ones_like(df)*(df.values[0]!=df.values[0][1])))
c3 = masked_array(df, mask=(np.ones_like(df)*(df.values[0]!=df.values[0][2])))

fig,ax = plt.subplots()
ax.matshow(c1,cmap=cm.Reds) # You can change the colormap here
ax.matshow(c2,cmap=cm.Greens)
ax.matshow(c3,cmap=cm.Blues)
plt.xticks(range(3), df.columns)
plt.yticks(range(4), df.index)

一些细节:

df是一个数据帧:

      att1  att2  att3
fun1    10     0     2
fun2     0     1     3
fun3     1    10     5
fun4     2     3    10

c1,c2,c3是屏蔽数组(第1,2和3列):

>>> c1
masked_array(data =
 [[10 -- --]
 [0 -- --]
 [1 -- --]
 [2 -- --]],
             mask =
 [[False  True  True]
 [False  True  True]
 [False  True  True]
 [False  True  True]],
       fill_value = 999999)

或者,您可以从numpy 2D数组开始:

>> data
array([[10,  0,  2],
       [ 0,  1,  3],
       [ 1, 10,  5],
       [ 2,  3, 10]])

并将所有dfdf.values替换为data(2D数组),标签部分除外。

答案 1 :(得分:0)

尝试:

ddf = structure(list(fn = structure(1:4, .Label = c("fun1", "fun2", 
"fun3", "fun4"), class = "factor"), att1 = c(10L, 0L, 1L, 2L), 
    att2 = c(0L, 1L, 10L, 3L), att3 = c(2L, 3L, 5L, 10L)), .Names = c("fn", 
"att1", "att2", "att3"), class = "data.frame", row.names = c(NA, 
-4L))

ddf
    fn att1 att2 att3
1 fun1   10    0    2
2 fun2    0    1    3
3 fun3    1   10    5
4 fun4    2    3   10

ddfm = melt(ddf)
ddfm
     fn variable value
1  fun1     att1    10
2  fun2     att1     0
3  fun3     att1     1
4  fun4     att1     2
5  fun1     att2     0
6  fun2     att2     1
7  fun3     att2    10
8  fun4     att2     3
9  fun1     att3     2
10 fun2     att3     3
11 fun3     att3     5
12 fun4     att3    10
> 
ggplot(ddfm)+ geom_tile(aes(x=variable, y=fn, fill=value))

enter image description here