Heatmap of the counts of each value in each column

Time: 2020-09-05 19:01:42

Tags: python pandas dataframe pivot-table

I have a dataframe like this:

| A | B | C  | D |  
|---|---|----|---|  
| 1 | 3 | 10 | 4 |  
| 2 | 3 | 1  | 5 |  
| 1 | 7 | 9  | 3 |  

where A, B, C and D are categories and the values are in the range [1, 10] (some values may not appear in a given column).

I would like a dataframe that shows the counts of these values for each category, like this:

|    | A | B  | C | D |
|----|---|----|---|---|  
| 1  | 2 | 0  | 1 | 0 |
| 2  | 1 | 0  | 0 | 0 |
| 3  | 0 | 2  | 0 | 1 |
| 4  | 0 | 0  | 0 | 1 |
| 5  | 0 | 0  | 0 | 1 |
| 6  | 0 | 0  | 0 | 0 |
| 7  | 0 | 1  | 0 | 0 |
| 8  | 0 | 0  | 0 | 0 |
| 9  | 0 | 0  | 1 | 0 |
| 10 | 0 | 0  | 1 | 0 | 

I tried using `groupby` and `pivot_table`, but I can't figure out which arguments to pass.
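For reference, the `groupby` route does work once the frame is reshaped to long format, so that each row holds one (column name, value) pair. A minimal sketch, assuming the `df` from the question; the names `long`, `category` and `value` are illustrative choices, not anything pandas requires:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1], 'B': [3, 3, 7], 'C': [10, 1, 9], 'D': [4, 5, 3]})

# Reshape to long format: one row per (column name, value) pair.
# 'category' and 'value' are illustrative column names (assumptions).
long = df.melt(var_name='category', value_name='value')

# Count rows per (value, category) pair, then move the categories into columns;
# pairs that never occur are filled with 0.
counts = long.groupby(['value', 'category']).size().unstack(fill_value=0)

# Add the values in [1, 10] that never occur in any column (6 and 8 here)
counts = counts.reindex(range(1, 11), fill_value=0)
print(counts)
```

`pd.crosstab(long['value'], long['category'])` gives the same counts as the `groupby` line, and `long.pivot_table(index='value', columns='category', aggfunc='size', fill_value=0)` should as well; either way, the `reindex` step is still needed to get the all-zero rows for 6 and 8.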

3 answers:

Answer 0 (score: 3)

Option 1

```python
import seaborn as sns
import pandas as pd

# dataframe setup
data = {'A': [1, 2, 1], 'B': [3, 3, 7], 'C': [10, 1, 9], 'D': [4, 5, 3]}
df = pd.DataFrame(data)

# create a dataframe of the counts for each column
# (pd.Series.value_counts is used because the top-level pd.value_counts
# is deprecated since pandas 2.1)
counts = df.apply(pd.Series.value_counts)

# display(counts)
#       A    B    C    D
# 1   2.0  NaN  1.0  NaN
# 2   1.0  NaN  NaN  NaN
# 3   NaN  2.0  NaN  1.0
# 4   NaN  NaN  NaN  1.0
# 5   NaN  NaN  NaN  1.0
# 7   NaN  1.0  NaN  NaN
# 9   NaN  NaN  1.0  NaN
# 10  NaN  NaN  1.0  NaN

# plot
sns.heatmap(counts)
```

[Image: heatmap of `counts`]

Option 2

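  • Heatmaps have many styling options; changing the colormap with `cmap` can improve the visualization.
  • I think Option 1, without `.fillna(0)`, looks less busy.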
```python
# counts (missing values filled with 0)
counts = df.apply(pd.Series.value_counts).fillna(0)

# plot
sns.heatmap(counts, cmap="GnBu", annot=True)
```

[Image: annotated heatmap of `counts` with the `GnBu` colormap]

Default colors

```python
sns.heatmap(counts, annot=True)
```

[Image: annotated heatmap of `counts` with the default colormap]
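`value_counts` only reports values that actually occur, so neither option above has the all-zero rows for 6 and 8 that the question's table includes, and `fillna(0)` leaves the counts as floats (annotated as `2.0`). A minimal sketch, assuming the full value range is [1, 10] as stated in the question, that reindexes and casts to integers before plotting:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1], 'B': [3, 3, 7], 'C': [10, 1, 9], 'D': [4, 5, 3]})

# value_counts per column, aligned into one frame; missing counts become NaN
counts = df.apply(pd.Series.value_counts)

# Include every value in [1, 10] (assumed range), fill gaps with 0, use integer counts
counts = counts.reindex(range(1, 11)).fillna(0).astype(int)
print(counts)
```

With integer counts, `sns.heatmap(counts, annot=True, fmt="d")` annotates the cells as `2` rather than `2.0`.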

Answer 1 (score: 2)

This is my first time posting an answer; I hope it helps.

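```python
import pandas as pd
import numpy as np

data = {'A': [1, 2, 1], 'B': [3, 3, 7], 'C': [10, 1, 9], 'D': [4, 5, 3]}
df = pd.DataFrame(data)

# empty frame with one row per possible value (1 to 10) and the same columns as df
df1 = pd.DataFrame(data=None, index=np.arange(1, 11), columns=df.columns)

for value in df.columns:
    # the counts are aligned on df1's index; values that never occur become NaN
    df1[value] = df[value].value_counts()

# fillna returns a new frame, so assign the result back
df1 = df1.fillna(0).astype(int)
```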
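Because the values are small non-negative integers in a known range, a NumPy-based sketch with `np.bincount` can fill the same pre-sized table directly. This assumes every value is an integer in [1, 10]: `np.bincount` raises on negative values, and the `[values]` selection below would silently drop anything above 10.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1], 'B': [3, 3, 7], 'C': [10, 1, 9], 'D': [4, 5, 3]})

values = np.arange(1, 11)  # assumed value range [1, 10]

# bincount counts occurrences of 0..max; minlength=11 guarantees slots 0..10,
# and [values] keeps only the slots for 1..10
counts = pd.DataFrame(
    {col: np.bincount(df[col].to_numpy(), minlength=11)[values] for col in df.columns},
    index=values,
)
print(counts)
```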

Answer 2 (score: 1)


Starting from the dataframe:

```python
# necessary imports
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, 1],
                   'B': [3, 3, 7],
                   'C': [10, 1, 9],
                   'D': [4, 5, 3]},
                  index=[0, 1, 2])
```

you can then create `d`, which has the structure of the desired result but with all values set to `0` (the index runs from 1 to 10 to match the value range):

```python
d = pd.DataFrame(0, index=np.arange(1, 11), columns=['A', 'B', 'C', 'D'])
```

or, more generally:

```python
d = pd.DataFrame(0, index=np.arange(1, 11), columns=df.columns)
```

Fill the dataframe:

```python
for col in df.columns:
    d[col] = df[col].value_counts()
```

This replaces the `0`s with `NaN` wherever a value does not occur in a column, so set them to `0` again with `d = d.fillna(0).astype(int)`.

This gives you the counts table shown in the question.