Question

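I want to visualize some 2D data that I have. For example, here is a data set with three attributes (columns) for four functions (rows):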

       att1  att2   att3
fun1     10     0      2
fun2      0     1      3
fun3      1    10      5
fun4      2     3     10

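I want to assign a different color to each data point. The intensity of the color should depend on the value of the attribute within its column, and each column must have a different color.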

Here is the desired image:

[image: the desired plot]

Does anyone know how to make this in Python or R?

Answer 1

Using Python:

I found a better way:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm

# data loading
df = pd.read_csv("file.csv", index_col=0) 


# plotting
fig,ax = plt.subplots()
ax.matshow(df.mask(((df == df) | df.isnull()) & (df.columns != "att1")), 
           cmap=cm.Reds) # You can change the colormap here
ax.matshow(df.mask(((df == df) | df.isnull()) & (df.columns != "att2")), 
           cmap=cm.Greens)
ax.matshow(df.mask(((df == df) | df.isnull()) & (df.columns != "att3")), 
           cmap=cm.Blues)
plt.xticks(range(3), df.columns)
plt.yticks(range(4), df.index)
plt.show()

Some details:

df.mask(((df == df) | df.isnull()) & (df.columns != "att1"))
      att1  att2  att3
fun1    10   NaN   NaN
fun2     0   NaN   NaN
fun3     1   NaN   NaN
fun4     2   NaN   NaN
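The same idea extends to any number of columns by looping over them. The following is a minimal sketch, not part of the original answer: it builds the example data inline instead of reading file.csv, and the helper name `column_heatmap` and the list of colormap names are assumptions made for illustration. Each layer keeps one column and sets all other cells to NaN, which matplotlib leaves blank, so every column is scaled against its own minimum and maximum.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Example data from the question, built inline (assumption: no file.csv at hand)
df = pd.DataFrame(
    {"att1": [10, 0, 1, 2], "att2": [0, 1, 10, 3], "att3": [2, 3, 5, 10]},
    index=["fun1", "fun2", "fun3", "fun4"],
)

def column_heatmap(df, cmaps):
    """Draw each column of df with its own colormap (hypothetical helper)."""
    fig, ax = plt.subplots()
    values = df.to_numpy(dtype=float)
    col_idx = np.arange(values.shape[1])
    for j, cmap in enumerate(cmaps):
        # Keep column j and turn every other cell into NaN;
        # NaN cells are not drawn, so the layers do not hide each other.
        layer = np.where(col_idx == j, values, np.nan)
        ax.matshow(layer, cmap=cmap)
    # Label the axes after the last matshow call, which resets the ticks
    ax.set_xticks(range(len(df.columns)))
    ax.set_xticklabels(df.columns)
    ax.set_yticks(range(len(df.index)))
    ax.set_yticklabels(df.index)
    return fig, ax

fig, ax = column_heatmap(df, ["Reds", "Greens", "Blues"])
```

Call `plt.show()` afterwards to display the figure. With more columns, pass one colormap name per column.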

Old version, using numpy masked arrays:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from numpy.ma import masked_array
import numpy as np

df = pd.read_clipboard() # just copied your example

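# define masked arrays to mask all but the given column
# (compare column positions rather than the values of the first row,
# so that duplicate values in the first row cannot hide the wrong columns)
cols = np.arange(df.shape[1])
c1 = masked_array(df.values, mask=np.ones_like(df.values, dtype=bool) * (cols != 0))
c2 = masked_array(df.values, mask=np.ones_like(df.values, dtype=bool) * (cols != 1))
c3 = masked_array(df.values, mask=np.ones_like(df.values, dtype=bool) * (cols != 2))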

fig,ax = plt.subplots()
ax.matshow(c1,cmap=cm.Reds) # You can change the colormap here
ax.matshow(c2,cmap=cm.Greens)
ax.matshow(c3,cmap=cm.Blues)
plt.xticks(range(3), df.columns)
plt.yticks(range(4), df.index)
plt.show()

Some details:

df is a DataFrame:

      att1  att2  att3
fun1    10     0     2
fun2     0     1     3
fun3     1    10     5
fun4     2     3    10

c1, c2 and c3 are masked arrays (showing columns 1, 2 and 3 respectively):

>>> c1
masked_array(data =
 [[10 -- --]
 [0 -- --]
 [1 -- --]
 [2 -- --]],
             mask =
 [[False  True  True]
 [False  True  True]
 [False  True  True]
 [False  True  True]],
       fill_value = 999999)

Alternatively, you can start from a numpy 2D array:

>>> data
array([[10,  0,  2],
       [ 0,  1,  3],
       [ 1, 10,  5],
       [ 2,  3, 10]])

and replace every df and df.values with data (the 2D array), except in the labeling part.
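For the array-based variant, the per-column masks can come from one small helper. This is only a sketch under the assumption that each layer should show exactly one column; the name `mask_all_but` is hypothetical and not part of the original answer.

```python
import numpy as np
from numpy.ma import masked_array

data = np.array([[10, 0, 2],
                 [0, 1, 3],
                 [1, 10, 5],
                 [2, 3, 10]])

def mask_all_but(data, col):
    """Return a masked array in which only column `col` is visible (hypothetical helper)."""
    hide = np.arange(data.shape[1]) != col      # True for every column to hide
    mask = np.tile(hide, (data.shape[0], 1))    # repeat the same pattern for each row
    return masked_array(data, mask=mask)

c1, c2, c3 = (mask_all_but(data, j) for j in range(data.shape[1]))
print(c2)
```

Each result can be passed to ax.matshow with its own colormap, exactly like c1, c2 and c3 in the answer.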

Answer 2

Try:

ddf = structure(list(fn = structure(1:4, .Label = c("fun1", "fun2", 
"fun3", "fun4"), class = "factor"), att1 = c(10L, 0L, 1L, 2L), 
    att2 = c(0L, 1L, 10L, 3L), att3 = c(2L, 3L, 5L, 10L)), .Names = c("fn", 
"att1", "att2", "att3"), class = "data.frame", row.names = c(NA, 
-4L))

ddf
    fn att1 att2 att3
1 fun1   10    0    2
2 fun2    0    1    3
3 fun3    1   10    5
4 fun4    2    3   10

library(reshape2)  # provides melt()
ddfm = melt(ddf)
ddfm
     fn variable value
1  fun1     att1    10
2  fun2     att1     0
3  fun3     att1     1
4  fun4     att1     2
5  fun1     att2     0
6  fun2     att2     1
7  fun3     att2    10
8  fun4     att2     3
9  fun1     att3     2
10 fun2     att3     3
11 fun3     att3     5
12 fun4     att3    10

library(ggplot2)
ggplot(ddfm) + geom_tile(aes(x=variable, y=fn, fill=value))

[image: the resulting tile plot]
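For readers working in Python rather than R, the wide-to-long reshaping step has a close counterpart in pandas. This is a rough sketch, not something the answer itself provides; it assumes the same example data and reuses the column names `variable` and `value` only to mirror the R output.

```python
import pandas as pd

# Same example data as ddf in the R snippet
ddf = pd.DataFrame({
    "fn": ["fun1", "fun2", "fun3", "fun4"],
    "att1": [10, 0, 1, 2],
    "att2": [0, 1, 10, 3],
    "att3": [2, 3, 5, 10],
})

# One row per (fn, attribute) pair, like reshape2's melt
ddfm = ddf.melt(id_vars="fn", var_name="variable", value_name="value")
print(ddfm)
```

The long table has one row per cell of the original data, which is the shape that tile-based plotting functions usually expect.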

Plotting 2D data: heatmap with different colormaps
