Question

I have about 150,000 rows of data detailing email bounces by domain, email template, bounce type, and count per day. It is formatted as follows:

+--------+-------------+-----------------+-------+---------+-------+
|   t    | bounce_type |    source_ip    |  tid  |  emld   | count |
+--------+-------------+-----------------+-------+---------+-------+
| 1/1/15 | hard        | 199.122.255.142 | 10033 | aol.com |     4 |
+--------+-------------+-----------------+-------+---------+-------+

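What is the simplest way to select only the rows with "aol.com" and "hard" bounces, across all source IPs and all tids? Is this something I would create a function for and pass the dataframe to, or is there a simpler operation to filter the data by these criteria?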

Answer 1

A simple way is to build a boolean mask. Assuming your DataFrame is named df, it would look like this:

masked = (df['emld'] == 'aol.com') & (df['bounce_type'] == 'hard')
# then the result will be
df[masked]

Shortened to a single line:

df[(df['emld'] == 'aol.com') & (df['bounce_type'] == 'hard')]
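
To see the mask in action, here is a minimal self-contained sketch. Only the column names and the first row come from the question; the remaining rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: only the first row appears in the question.
df = pd.DataFrame({
    "t": ["1/1/15", "1/1/15", "1/2/15", "1/2/15"],
    "bounce_type": ["hard", "soft", "hard", "hard"],
    "source_ip": ["199.122.255.142", "199.122.255.142", "10.0.0.1", "10.0.0.2"],
    "tid": [10033, 10033, 10034, 10035],
    "emld": ["aol.com", "aol.com", "gmail.com", "aol.com"],
    "count": [4, 2, 7, 1],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds more tightly than ==.
masked = (df["emld"] == "aol.com") & (df["bounce_type"] == "hard")

# Indexing with the boolean Series keeps only the rows where it is True.
result = df[masked]
print(result)
```

Note that Python's `and` does not work here, because it tries to reduce a whole Series to a single truth value; `&` is the element-wise operator.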

To return only the source_ip and tid columns:

df[masked][['source_ip', 'tid']]

Or, in one line:

df[(df['emld'] == 'aol.com') & (df['bounce_type'] == 'hard')][['source_ip', 'tid']]
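
As an alternative to chaining two sets of brackets, the row mask and the column list can be passed together to `.loc`, or the filter can be written as a `query` string. This is only a sketch of equivalent spellings, again using made-up sample rows; it is not the only way to do it.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "bounce_type": ["hard", "soft", "hard"],
    "source_ip": ["199.122.255.142", "199.122.255.142", "10.0.0.1"],
    "tid": [10033, 10033, 10034],
    "emld": ["aol.com", "aol.com", "gmail.com"],
    "count": [4, 2, 7],
})

masked = (df["emld"] == "aol.com") & (df["bounce_type"] == "hard")

# Select rows and columns in a single .loc call.
subset = df.loc[masked, ["source_ip", "tid"]]

# The same filter expressed as a query string, followed by column selection.
subset_q = df.query("emld == 'aol.com' and bounce_type == 'hard'")[["source_ip", "tid"]]

print(subset)
```

The `.loc` form is also the one to use if you later want to assign to the selected rows, since chained indexing may operate on a copy.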

Hope this helps.
