
I have a pandas DataFrame (about 3,000 rows; sample below):

    df.index  q_id  total_paid  choice_txt
    3         176   6918        7
    12        176   0           3
    21        176   5053        10
    30        176   5219        9
    39        176   5622        8
    48        176   0           7
    57        176   4239        7
    66        176   2811        7
    75        176   5049        7
    84        176   6797        3
    93        176   1740        7
    102       176   4747        9
    111       176   882         8
    120       176   5961        6
    129       176   7959        6
    138       176   7100        2
    147       176   1565        2
    156       176   5776        7
    165       176   4385        1
    174       176   0           10
    183       176   1131        8
    192       176   10586       7
    201       176   920         7
    210       176   1900        4
    219       176   4436        1
    228       176   8715        5
    237       176   8248        7
    246       176   0           6
    255       176   7683        10
    263       176   572         8
    and on...

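I am having trouble grouping it by value counts. Basically, I want to take this DataFrame and count rows that share the same 'choice_txt', but also aggregate the groups by the value of 'total_paid'. In this case I want 3 aggregates: a low set (say < 3000), a mid set (3001-6000), and a high set (6001+).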

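In the example, 'choice_txt' is an identifier, not a count as the table above might suggest. I want to count how many rows exist for each identifier; in this case the identifiers are 1-10, and each of them appears somewhere in the 3,000 rows.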

The final result should be grouped like this:

    q_id  total_paid  choice_txt  count
    176   low         10          1
          low         2           3
          low         3           2
          low         4           1
          low         6           5
          low         7           4
          low         8           3
          low         9           2
          mid         10          1
          mid         8           1
          mid         1           1
          high        10          1
          high        2           3
          high        3           2
          high        4           1
          high        6           5
          high        7           4
          high        8           3
          high        9           2

What I have so far lets me group by the actual total_paid value:

    sub_grouped_df = filtered_df.groupby([series_col, breakout_column, "choice_txt"]).count()['another_column']

    q_id  total_paid  choice_txt
    176   0           10            1
                      2             3
                      3             2
                      4             1
                      6             5
                      7             4
                      8             3
                      9             2
          56          10            1
          236         8             1
          455         1             1
          572         8             1
          609         8             1
          636         7             1
          826         10            1
    ...
    176   9096        4             1
          9141        5             1
          9232        5             1
          9357        8             1
          9371        3             1
          9601        1             1
          9604        2             1
          9706        8             1
          9719        1             1
          10032       1             1
          10490       9             1
          10586       7             1
          10632       3             1
          10799       4             1
          12437       1             1

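That gives me the grouping I need, except that 'total_paid' should be grouped by value range (again, the low/mid/high sets mentioned above).

But I can't figure out how to evaluate the total_paid column by value and use that in the group.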
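One possible way to do this, sketched here rather than offered as a definitive answer, is to bin 'total_paid' with `pd.cut` and pass the resulting Series to `groupby` alongside the other keys. The bin edges below are an assumption based on the ranges described in the question (a value of exactly 3000 is placed in "low"), and the frame contains only the first few sample rows.

```python
import numpy as np
import pandas as pd

# First rows of the sample data from the question.
df = pd.DataFrame(
    {
        "q_id": [176] * 8,
        "total_paid": [6918, 0, 5053, 5219, 5622, 0, 4239, 2811],
        "choice_txt": [7, 3, 10, 9, 8, 7, 7, 7],
    },
    index=[3, 12, 21, 30, 39, 48, 57, 66],
)

# Assumed band edges: low <= 3000, mid 3001-6000, high 6001+.
# With right=True (the default) each bin includes its right edge.
bands = pd.cut(
    df["total_paid"],
    bins=[-np.inf, 3000, 6000, np.inf],
    labels=["low", "mid", "high"],
)

# Group by q_id, band and choice_txt; size() counts rows per group.
# observed=True keeps only band/choice combinations that occur.
counts = (
    df.groupby(["q_id", bands, "choice_txt"], observed=True)
    .size()
    .rename("count")
)
print(counts)
```

Because `bands` is a separate Series, the original 'total_paid' column is left untouched; the band only appears as the middle level of the result's index.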

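EDIT: OK, I have a working solution, but it feels like a complete hack. Since I need to accumulate 3 groups (low, mid, high), I split my main DataFrame into 3 separate DataFrames by the low, mid, and high criteria:

    low_df = filtered_df.loc[filtered_df[breakout_column] <= low_high]
    high_df = filtered_df.loc[filtered_df[breakout_column] > mid_high]
    mid_df = filtered_df.drop(low_df.index)
    mid_df = mid_df.drop(high_df.index)

Then I run a groupby on each one to get the groupings I need:

    sub_low_df = low_df.groupby([series_col, "choice_txt"])
    sub_mid_df = mid_df.groupby([series_col, "choice_txt"])
    sub_high_df = high_df.groupby([series_col, "choice_txt"])

Finally, I combine the results one by one into a dict for reporting. Is this an acceptable solution, or is there a better way to achieve the result?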
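For comparison, the three-way split could be folded into a single helper that bins once and builds the dict directly. This is only a sketch: `band_report` is a hypothetical name, the thresholds 3000 and 6000 stand in for `low_high` and `mid_high` (their real values are not given), and the small `filtered_df` is made up from a few of the sample rows.

```python
import numpy as np
import pandas as pd


def band_report(df, series_col, breakout_column, low_high, mid_high):
    """Count rows per (series_col, choice_txt) within each band.

    Bands mirror the split in the question:
    low: value <= low_high, high: value > mid_high, mid: the rest.
    """
    bands = pd.cut(
        df[breakout_column],
        bins=[-np.inf, low_high, mid_high, np.inf],
        labels=["low", "mid", "high"],
    )
    report = {}
    # Grouping by the band Series yields one sub-frame per band.
    for band, part in df.groupby(bands, observed=True):
        report[band] = part.groupby([series_col, "choice_txt"]).size().to_dict()
    return report


# Hypothetical stand-in for filtered_df, built from sample rows.
filtered_df = pd.DataFrame(
    {
        "q_id": [176] * 6,
        "total_paid": [0, 0, 2811, 4239, 6918, 7959],
        "choice_txt": [3, 7, 7, 7, 7, 6],
    }
)
report = band_report(filtered_df, "q_id", "total_paid", 3000, 6000)
print(report)
```

One difference worth knowing: rows where the breakout column is missing (NaN) are dropped by this sketch, whereas the drop-based split above would leave them in `mid_df`.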

Python Pandas - Group by with aggregate (count of conditional values)

0 answers: