Question

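Hello, I have a table in pandas (see the excerpt in the screenshot; it has many more rows) and I want to extract the unique 'author_id' values, then run a function that retrieves the details associated with each ID.

I extract the list of unique IDs with: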

unique_ids = df['author_id'].unique()

Then I tried to run:

df['author_id'].unique().apply(some_function)

where 'some_function' takes an 'author_id' and returns some information. But I got the error:

AttributeError: 'numpy.ndarray' object has no attribute 'apply'

So I resorted to:

[some_function(author_id) for author_id in unique_ids]

which works, but is not an efficient/vectorized way of doing this.

What is the way to do this in a vectorized manner?

Thanks in advance!

Answer 1

I think you want to do a groupby:

g = df.groupby('author_id')

g.apply(some_function)
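One thing to keep in mind with this approach: the function is called with all the rows belonging to one author, not with the bare 'author_id'. Here is a minimal sketch, assuming a small made-up table with a 'title' column and a stand-in some_function that just counts rows (both are illustrative, not from the question). Selecting the non-key columns before apply is my own addition; it should sidestep differences between pandas versions in whether the grouping column is passed to the function.

```python
import pandas as pd

# Made-up data standing in for the table in the question.
df = pd.DataFrame({
    "author_id": [1, 1, 2, 3, 3, 3],
    "title": ["a", "b", "c", "d", "e", "f"],
})

def some_function(group):
    # Hypothetical stand-in: `group` is a DataFrame holding every row
    # for one author, not the author_id itself.
    return len(group)

g = df.groupby("author_id")

# One call per unique author_id; the result is indexed by author_id.
result = g[["title"]].apply(some_function)
print(result)
```

If some_function really needs only the ID (for example, to look something up elsewhere), groupby does more work than necessary and applying over the unique IDs is the closer fit.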

Answer 2

The output of unique() is a numpy array, which does not provide an apply method. You can create a Series from that array and then apply your function:

pd.Series(df['author_id'].unique()).apply(some_function)
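A short sketch of how this plays out, assuming made-up IDs and a stand-in some_function (both hypothetical). Note that Series.apply still calls the Python function once per element, so it is a tidier spelling of the list comprehension rather than a true vectorized operation; a real speed-up would need some_function itself to work on whole arrays. The last two lines are an optional extra, not part of the answer above, showing one way to attach the results back to the original rows.

```python
import pandas as pd

# Made-up data standing in for the table in the question.
df = pd.DataFrame({"author_id": [10, 10, 20, 30, 30]})

def some_function(author_id):
    # Hypothetical stand-in for the real per-author lookup.
    return int(author_id) * 100

# unique() returns a numpy array; wrapping it in a Series gives access to apply.
unique_ids = pd.Series(df["author_id"].unique())
details = unique_ids.apply(some_function)

# The list comprehension from the question produces the same values.
as_list = [some_function(author_id) for author_id in df["author_id"].unique()]

# Optional: index the results by ID and map them back onto every row.
lookup = pd.Series(details.values, index=unique_ids.values)
df["details"] = df["author_id"].map(lookup)
```

Computing once per unique ID and then mapping back avoids repeating the lookup for authors that appear in many rows.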