Question

I am trying to use Python to count the unique items in a CSV column.

I have many CSV files. Each file contains 5 columns (no header):

'AB', 'asd', 'asd2', 'asd3', 'asd4'
'AB', 'asd', 'asd2', 'asd3', 'asd4'
'AB', 'poi', 'poi2', 'poi3', 'poi4'
'BG', 'put', 'put2', 'put3', 'put4'
'BG', 'asd', 'asd2', 'asd3', 'asd4'
'BG', 'poi', 'poi2', 'poi3', 'poi4'

I want to take the first two columns from each file:

'AB', 'asd'
'AB', 'asd'
'AB', 'poi'
'BG', 'put'
'BG', 'asd'
'BG', 'poi'

and then, for each value in column 1, count the unique items in column 2. The result should be:

'AB': 2   # AB has unique values 'asd' and 'poi'
'BG': 3   # BG has unique values 'put', 'asd' and 'poi'

Answer 1

If you can use a third-party library, a good option is pandas.read_csv().

This gives you a pandas.DataFrame, from which you can select the columns you need and then use .value_counts().

It looks something like this:
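
A minimal sketch of this approach, not necessarily the answerer's exact code. It assumes the files really contain the single quotes and the space after each comma shown in the question (hence `quotechar` and `skipinitialspace`). The function name `unique_counts` and the in-memory `SAMPLE` are made up for illustration; in practice you would pass a file path. Because `.value_counts()` on its own counts rows rather than distinct values, duplicate pairs are dropped first.

```python
import io

import pandas as pd

# Sample rows copied from the question; stands in for one CSV file.
SAMPLE = """'AB', 'asd', 'asd2', 'asd3', 'asd4'
'AB', 'asd', 'asd2', 'asd3', 'asd4'
'AB', 'poi', 'poi2', 'poi3', 'poi4'
'BG', 'put', 'put2', 'put3', 'put4'
'BG', 'asd', 'asd2', 'asd3', 'asd4'
'BG', 'poi', 'poi2', 'poi3', 'poi4'
"""


def unique_counts(source):
    """Count distinct column-2 values for each column-1 value (hypothetical helper)."""
    df = pd.read_csv(
        source,
        header=None,            # the files have no header row
        usecols=[0, 1],         # keep only the first two columns
        quotechar="'",          # assumption: fields are single-quoted
        skipinitialspace=True,  # assumption: a space follows each comma
    )
    # Drop repeated (col 0, col 1) pairs so each distinct pair is counted once,
    # then count how many distinct pairs remain per column-0 value.
    pairs = df.drop_duplicates()
    return pairs[0].value_counts().to_dict()


counts = unique_counts(io.StringIO(SAMPLE))
print(counts)
```

For many files, the same function can be called once per path. An equivalent way to get the per-group distinct count is `df.groupby(0)[1].nunique()`.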
