Question

I'm taking my first steps in Python and I hope you can help me with the following problem.

I have a list:

scores = [1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5]

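I want to create a DataFrame with the scores in column 1 and the frequency of each score in column 2.

Any help or pointers are appreciated. Thanks!

My first attempt was not very good: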

scores = [1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5]
freq = []
df = {'col1': scores, 'col2': freq}

Answer 1

First, create a Counter object to count the frequency of each score, then pass it to pandas:

In [1]: scores = [1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5]

In [2]: from collections import Counter

In [3]: score_counts = Counter(scores)

In [4]: score_counts
Out[4]: Counter({5: 12, 4: 8, 3: 4, 1: 3, 2: 3})

In [5]: import pandas as pd

In [6]: pd.DataFrame.from_dict(score_counts, orient='index')
Out[6]: 

    0
1   3
2   3
3   4
4   8
5  12

[5 rows x 1 columns]

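The part that might trip up some users is pd.DataFrame.from_dict(). With orient='index', the dictionary keys (the scores) become the row index and the counts land in a single column named 0. The documentation is here: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.from_dict.html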
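
To get the two named columns the question asks for, one option is to build the frame from the Counter's items instead. This is only a minimal sketch; the column names `score` and `frequency` are my own choice and do not come from the question.

```python
from collections import Counter

import pandas as pd

scores = [1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5]

# Count how often each score occurs.
score_counts = Counter(scores)

# sorted() orders the (score, count) pairs by score. The column names
# "score" and "frequency" are illustrative, not part of the original answer.
df = pd.DataFrame(sorted(score_counts.items()), columns=["score", "frequency"])
print(df)
```

Here the scores are an ordinary column rather than the index, which is usually easier to work with if you plan further calculations on them.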

Answer 2

I would use value_counts (see, for example, the Series documentation). Note that I have changed the data slightly here:

>>> import pandas as pd
>>> scores = [1]*3 + [2]*3 + [3]*4 + [4]*1 + [5]*4
>>> pd.value_counts(scores)
5    4
3    4
2    3
1    3
4    1
dtype: int64

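You can adjust the output as needed, for example by sorting by count, sorting by score, or converting the result to a DataFrame: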

>>> pd.value_counts(scores, ascending=True)
4    1
1    3
2    3
3    4
5    4
dtype: int64
>>> pd.value_counts(scores).sort_index()
1    3
2    3
3    4
4    1
5    4
dtype: int64
>>> pd.value_counts(scores).sort_index().to_frame()
   0
1  3
2  3
3  4
4  1
5  4
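
If you want two named columns rather than a frame with the scores in the index, a sketch along these lines should work. It calls `value_counts` as a Series method, since as far as I know the top-level `pd.value_counts` is deprecated in recent pandas releases. The names `score` and `frequency` are assumptions of mine.

```python
import pandas as pd

scores = [1]*3 + [2]*3 + [3]*4 + [4]*1 + [5]*4

df = (
    pd.Series(scores)
    .value_counts()                  # count occurrences of each score
    .sort_index()                    # order rows by score, not by count
    .rename_axis("score")            # name the index so it becomes a column
    .reset_index(name="frequency")   # move the index out, name the counts
)
print(df)
```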

Answer 3

Calculate the frequencies:

freq = {}
for score in scores:
    freq[score] = freq.get(score, 0) + 1

This gives you a dictionary that maps each score to its frequency. To create the two columns, you can then build a dictionary such as:

data = {'scores': list(freq.keys()), 'freq': list(freq.values())}
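
Putting the pieces together, a minimal sketch of this approach could look as follows, assuming pandas is installed. Note that both columns must have the same length, so the sketch uses the distinct scores (the dictionary keys) rather than the full `scores` list.

```python
import pandas as pd

scores = [1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5]

# Tally each score in a plain dictionary.
freq = {}
for score in scores:
    freq[score] = freq.get(score, 0) + 1

# One row per distinct score: keys form one column, counts the other.
data = {'scores': list(freq.keys()), 'freq': list(freq.values())}
df = pd.DataFrame(data)
print(df)
```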

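You could also do this with a list comprehension, where the list index equals the score and the value is its frequency. However, if your scores span a wide range, that requires a large, sparse array, so you are better off using the dictionary above.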
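
For comparison, the list-indexed variant might look like the sketch below, where position `i` holds the frequency of score `i`. It assumes the scores are non-negative integers, and `counts` is a name introduced here purely for illustration.

```python
scores = [1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5]

# One slot per possible score from 0 to max(scores). Slots for scores
# that never occur stay 0, which is the sparsity problem with wide ranges.
counts = [scores.count(value) for value in range(max(scores) + 1)]
print(counts)  # [0, 3, 3, 4, 8, 12]
```

Calling `scores.count` once per slot rescans the whole list each time, which is another reason to prefer the dictionary for anything but small inputs.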

Python: create a 2-column DataFrame from a list and perform a calculation on the list
