Question

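I'm trying out pandas for the first time. I have a dataframe with two columns: user_id and string. Each user_id may have multiple strings, so it can appear in the dataframe multiple times. I want to derive another dataframe from it, one that lists only those user_ids that have at least two strings associated with them.

I tried df[df['user_id'].value_counts() > 1], which I thought was the standard way to do this, but it raises IndexingError: Unalignable boolean Series key provided. Can someone clear up my understanding and provide the correct alternative?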

Answer 1

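I think you need transform, because the boolean mask must have the same index as df. value_counts returns a Series indexed by the distinct user_id values instead, so using it as a mask raises the error.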

df[df.groupby('user_id')['user_id'].transform('size') > 1]
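
To make the index difference concrete, here is a minimal sketch on made-up data (the sample values and the names counts, mask, result and result_map are assumptions for illustration, not from the question). It compares the shape of value_counts with that of transform('size'), and also shows one possible alternative that maps the counts back onto the rows.

```python
import pandas as pd

# Hypothetical sample data: user 1 has two strings, user 2 has one, user 3 has three
df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "string": ["a", "b", "c", "d", "e", "f"],
})

# value_counts() gives one entry per distinct user_id, indexed by the user_id values,
# so it does not line up with the row index of df
counts = df["user_id"].value_counts()

# transform('size') gives one value per original row and keeps df's index,
# so the comparison produces a mask that aligns with df
mask = df.groupby("user_id")["user_id"].transform("size") > 1
result = df[mask]

# Alternative: look up each row's user_id in the counts, which is again row-aligned
result_map = df[df["user_id"].map(counts) > 1]

print(result)
```

Both approaches keep the rows for users 1 and 3 and drop the single row for user 2.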

Answer 2

l2 = ((df.val1.loc[df.val == 'Best'].value_counts().sort_index() / df.val1.loc[df.val.isin(l11)].value_counts().sort_index())).loc[lambda x: x > 0.5].index.tolist()

Answer 3

You can simply do the following:

col = 'column_name'   # name of the column whose values are counted
n = 10                # minimum number of occurrences required to keep a row

df = df[df.groupby(col)[col].transform('count').ge(n)]

This should filter the dataframe as required.
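
As a quick check of the snippet above, here is a small sketch on made-up data (the sample frame, the value column and the threshold n = 2 are assumptions chosen so the effect is visible). Note that 'count' skips missing values in the column, whereas 'size' counts every row; the two agree when the column has no NaN. groupby(...).filter is shown as another way to express the same condition.

```python
import pandas as pd

# Hypothetical sample data: 'x' appears 3 times, 'y' twice, 'z' once
df = pd.DataFrame({
    "column_name": ["x", "x", "x", "y", "y", "z"],
    "value": list(range(6)),
})

col = "column_name"   # column whose values are counted
n = 2                 # assumed threshold for this example

# Keep rows whose value in `col` occurs at least n times
filtered = df[df.groupby(col)[col].transform("count").ge(n)]

# Same condition written with groupby().filter, which keeps whole groups
filtered_alt = df.groupby(col).filter(lambda g: len(g) >= n)

print(filtered)
```

The transform version tends to be the faster of the two on large frames, since filter calls a Python function once per group.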

Answer 4

I ran into the same challenge and used:

if (MAC or LINUX) and os.environ.get('SUDO_UID', None) is not None:

Credit: blog.softhints

Filter a dataframe based on a column's value_counts (pandas)
