Question

My DataFrame looks like this:

       Domain         URL                               Importance
1      google.com     google.com/example/1/file.exe     1
2      microsoft.com  microsoft.com/example/1/file.exe  3
3      apple.com      apple.com/example/1/file.exe      4
4      google.com     google.com/example/2/file.exe     1
5      google.com     google.com/example/3/file.exe     2
6      apple.com      apple.com/example/2/file.exe      3
...    ...            ...                               ...
1000   google.com     google.com/example/500/file.exe   2

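Every URL is unique, but many URLs share the same domain. An importance level has already been assigned to each row of the DataFrame, where 1 is the most important and 4 is the lowest priority.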

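I'm looking for a good "generic" way to filter the DataFrame so that each domain has at most 50 URLs, keeping the most important ones first (1 ranks above 4). After that, the result should be capped at 750 URLs in total, again sorted by importance level before the bottom is cut off.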
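One possible way to do this in pandas, shown only as a minimal sketch: sort by Importance, keep the first rows of each Domain group, then truncate the whole result. The function name `limit_urls` and its parameters `per_domain` and `total` are made up for illustration, and breaking ties by original row order is an assumption, since the question does not say how rows with equal importance should be ranked. The sample frame reuses the six rows shown above with smaller limits so the effect is visible.

```python
import pandas as pd


def limit_urls(df, per_domain=50, total=750):
    """Keep at most `per_domain` rows per Domain, then at most `total` rows overall.

    Hypothetical helper: lower Importance values are treated as more important.
    """
    # Stable sort, so rows with equal Importance keep their original order.
    ranked = df.sort_values("Importance", kind="mergesort")
    # head() on a groupby keeps the first rows of each group without
    # reordering the frame, so the result is still sorted by Importance.
    capped = ranked.groupby("Domain", sort=False).head(per_domain)
    # Cut off the least important rows once the overall limit is reached.
    return capped.head(total)


# The six example rows from the question, indexed 1..6.
df = pd.DataFrame(
    {
        "Domain": ["google.com", "microsoft.com", "apple.com",
                   "google.com", "google.com", "apple.com"],
        "URL": ["google.com/example/1/file.exe",
                "microsoft.com/example/1/file.exe",
                "apple.com/example/1/file.exe",
                "google.com/example/2/file.exe",
                "google.com/example/3/file.exe",
                "apple.com/example/2/file.exe"],
        "Importance": [1, 3, 4, 1, 2, 3],
    },
    index=range(1, 7),
)

# Small limits for demonstration; the question asks for 50 and 750.
result = limit_urls(df, per_domain=2, total=4)
print(result)
```

With the limits from the question, the call would be `limit_urls(df)`. If a domain's surplus rows should be chosen by some other rule than original row order, add that column to the `sort_values` call.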

Pandas: limit repeating results

0 answers: