I have a dataframe like this:
| A | B | C | D |
|---|---|----|---|
| 1 | 3 | 10 | 4 |
| 2 | 3 | 1 | 5 |
| 1 | 7 | 9 | 3 |
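where A, B, C and D are categories and the values lie in the range [1, 10] (some values may not appear in a given column).
I'd like a dataframe that shows, for each category, the count of each of these values. Like this: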
| | A | B | C | D |
|----|---|----|---|---|
| 1 | 2 | 0 | 1 | 0 |
| 2 | 1 | 0 | 0 | 0 |
| 3 | 0 | 2 | 0 | 1 |
| 4 | 0 | 0 | 0 | 1 |
| 5 | 0 | 0 | 0 | 1 |
| 6 | 0 | 0 | 0 | 0 |
| 7 | 0 | 1 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 |
| 9 | 0 | 0 | 1 | 0 |
| 10 | 0 | 0 | 1 | 0 |
I tried using `groupby` and `pivot_table`, but I can't figure out which arguments to pass.
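Since the question mentions `groupby`, here is a minimal sketch (not taken from any of the answers below) that first reshapes the frame to long format with `melt`, so every cell becomes one `(variable, value)` row, and then counts the pairs. It shows the `groupby` route and, as a swapped-in alternative to `pivot_table`, the pandas frequency-table helper `pd.crosstab`. `reindex` adds the values that never occur (6 and 8) as zero rows; the range 1 to 10 is taken from the question.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1], 'B': [3, 3, 7], 'C': [10, 1, 9], 'D': [4, 5, 3]})

# long format: one row per cell, with columns "variable" (category) and "value"
long = df.melt()

# groupby: count each (value, category) pair, then spread categories into columns
via_groupby = long.groupby(['value', 'variable']).size().unstack(fill_value=0)

# crosstab: frequency table of value (rows) against category (columns)
via_crosstab = pd.crosstab(long['value'], long['variable'])

# values that never occur anywhere (6 and 8 here) get an all-zero row
counts = via_groupby.reindex(range(1, 11), fill_value=0)
print(counts)
```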
Answer 0 (score: 3)
Apply `pandas.Series.value_counts` to each column, then plot the resulting DataFrame with `seaborn.heatmap`.

```python
import seaborn as sns
import pandas as pd

# dataframe setup
data = {'A': [1, 2, 1], 'B': [3, 3, 7], 'C': [10, 1, 9], 'D': [4, 5, 3]}
df = pd.DataFrame(data)

# create a dataframe of the counts for each column
counts = df.apply(pd.Series.value_counts)

# display(counts)
```

```
      A    B    C    D
1   2.0  NaN  1.0  NaN
2   1.0  NaN  NaN  NaN
3   NaN  2.0  NaN  1.0
4   NaN  NaN  NaN  1.0
5   NaN  NaN  NaN  1.0
7   NaN  1.0  NaN  NaN
9   NaN  NaN  1.0  NaN
10  NaN  NaN  1.0  NaN
```

```python
# plot
sns.heatmap(counts)
```
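Changing the colormap with `cmap` can improve the visualization. With `.fillna(0)`, missing counts are shown as 0 instead of as blank cells. Option 1 looks less busy.

```python
# counts, with missing values filled in as 0
counts = df.apply(pd.Series.value_counts).fillna(0)

# plot, option 1: custom colormap with annotations
sns.heatmap(counts, cmap="GnBu", annot=True)

# plot, option 2: default colormap with annotations
# (draw each option on its own figure, otherwise they overlap)
sns.heatmap(counts, annot=True)
```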
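`value_counts` only reports values that actually occur, so `counts` above has no rows for 6 and 8, which the desired table in the question includes. A minimal sketch, assuming the value range 1 to 10 from the question, adds those rows with `reindex` before filling and converting to integers:

```python
import pandas as pd

data = {'A': [1, 2, 1], 'B': [3, 3, 7], 'C': [10, 1, 9], 'D': [4, 5, 3]}
df = pd.DataFrame(data)

# count per column, add the values that never occur (6 and 8) as rows,
# then turn every missing count into an integer 0
counts = (df.apply(pd.Series.value_counts)
            .reindex(range(1, 11))
            .fillna(0)
            .astype(int))
print(counts)
```

The resulting `counts` can be passed to `sns.heatmap` exactly as in the options above.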
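Answer 1 (score: 2)
This is my first time posting an answer; I hope it helps.

```python
import pandas as pd
import numpy as np

data = {'A': [1, 2, 1], 'B': [3, 3, 7], 'C': [10, 1, 9], 'D': [4, 5, 3]}
df = pd.DataFrame(data)

# empty frame with one row per possible value (1..10) and one column per category
df1 = pd.DataFrame(data=None, index=np.arange(1, 11), columns=df.columns)
for value in df.columns:
    # the counts align on df1's index; values that never occur become NaN
    df1[value] = df[value].value_counts()

# fillna returns a new frame, so assign the result back
df1 = df1.fillna(0).astype(int)
```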
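The loop relies on index alignment: assigning a Series to a DataFrame column matches on index labels, so labels missing from the Series become `NaN` and labels not in the frame's index are dropped. That is why the index has to cover every possible value. A small illustration, using a deliberately too-short index (1 to 3) and the values of column C from the question:

```python
import numpy as np
import pandas as pd

# frame indexed by the values 1..3 only
frame = pd.DataFrame(index=np.arange(1, 4), columns=['C'])

# counts for column C of the question's data: 10, 1 and 9 each occur once
frame['C'] = pd.Series([10, 1, 9]).value_counts()

# label 1 gets its count, labels 2 and 3 have no count (NaN),
# and the counts for 9 and 10 are silently dropped
print(frame)
```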
Answer 2 (score: 1)
Starting from the dataframe:

```python
# necessary imports
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, 1],
                   'B': [3, 3, 7],
                   'C': [10, 1, 9],
                   'D': [4, 5, 3]},
                  index=[0, 1, 2])
```

you can then do:

```python
d = pd.DataFrame(0, index=np.arange(1, 11), columns=['A', 'B', 'C', 'D'])
```

or, more generally:

```python
d = pd.DataFrame(0, index=np.arange(1, 11), columns=df.columns)
```

This creates a dataframe with the structure of the desired result (one row per value from 1 to 10), but with all values set to 0.

Fill the dataframe:

```python
for col in df.columns:
    d[col] = df[col].value_counts()
```

This replaces the 0s in `d` with `NaN` wherever a value does not occur in that column. Set them back to 0:

```python
d = d.fillna(0).astype(int)
```

This gives you the count table shown in the question.
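As a loop-free alternative (a different technique from this answer, using NumPy's `np.bincount`), each column can be counted directly into a fixed-length array. This sketch assumes, as in the question, that all values are integers from 1 to 10: `minlength=11` makes every column produce count slots for 0 to 10, and slot 0 is dropped.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1], 'B': [3, 3, 7], 'C': [10, 1, 9], 'D': [4, 5, 3]})

# np.bincount(x, minlength=11)[v] is the number of times v occurs in x (v = 0..10);
# slicing [1:] keeps only the counts for the values 1..10
d = pd.DataFrame({col: np.bincount(df[col].to_numpy(), minlength=11)[1:]
                  for col in df.columns},
                 index=np.arange(1, 11))
print(d)
```

`np.bincount` only accepts non-negative integers, so this variant fits the question's integer range but not arbitrary labels; the `value_counts` approaches above are more general.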