Question

I have the following pandas DataFrame named matches:

id  |  name  |  age
1   |  a     |  19
1   |  b     |  25
2   |  c     |  19
2   |  d     |  18

I use groupby + count() to count, per id, how many values of a column (age) meet a condition (x < 21). The result is written to a new column (new_col):

matches['new_col'] = matches.groupby(['id'])['age'].transform(lambda x: x[x < 21].count())

The DataFrame then looks like this:

id  |  name  |  age | new_col
1   |  a     |  19  | 1
1   |  b     |  25  | 1
2   |  c     |  19  | 2
2   |  d     |  18  | 2
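
To make the setup reproducible, here is a minimal sketch that builds the sample data and computes new_col. It assumes the ages 19, 25, 19, 18 from the table above; the boolean-sum variant (check) is an equivalent alternative that is not part of the original question.

```python
import pandas as pd

# Sample data from the question (assumed ages: 19, 25, 19, 18).
matches = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "name": ["a", "b", "c", "d"],
    "age": [19, 25, 19, 18],
})

# Per id, count the ages below 21 and broadcast the count to every row.
matches["new_col"] = (
    matches.groupby("id")["age"].transform(lambda x: x[x < 21].count())
)

# Equivalent alternative: sum the boolean mask within each group.
check = (matches["age"] < 21).groupby(matches["id"]).transform("sum")

print(matches)
```

Because transform returns a result aligned with the original index, the count can be assigned directly as a column.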

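Now I want to present the result in a more readable way: for each id, the names from the rows that meet the condition (age < 21) should be written to a new column, e.g. result.

I would like something like this (there may be other ways to achieve it, perhaps even in the first step, where new_col is added):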

id  |  name  |  age | new_col | result
1   |  a     |  19  | 1       | a
1   |  b     |  25  | 1       | a
2   |  c     |  19  | 2       | c,d
2   |  d     |  18  | 2       | c,d

The last step (adding the result column) is where I am stuck.

Answer 1

First filter the rows with boolean indexing, aggregate them, and finally join the result back to the original rows:
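
The code for this step did not survive in this copy of the answer. The following is only a sketch of how the filter, aggregate and join steps might look; the variable names young, agg and out are illustrative and not taken from the original.

```python
import pandas as pd

matches = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "name": ["a", "b", "c", "d"],
    "age": [19, 25, 19, 18],
})

# 1. Boolean indexing: keep only the rows that satisfy the condition.
young = matches[matches["age"] < 21]

# 2. Aggregate per id: the number of matching rows and their joined names.
agg = young.groupby("id")["name"].agg(new_col="size", result=", ".join)

# 3. Join the per-id aggregates back onto every original row.
out = matches.join(agg, on="id")

print(out)
```

An id with no row under 21 would simply get NaN in both new columns, since join performs a left join by default.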

Another solution uses transform, but sort_values is needed first so that the NaN values left on the rows with age >= 21 can be replaced by ffill:

matches = matches.sort_values(['id','age'])
g = matches[matches.age < 21].groupby(['id'])['name']
matches['new_col'] = g.transform(len)
matches['result'] = g.transform(', '.join)
matches[['new_col','result']] = matches[['new_col','result']].ffill()

print (matches)
   id name  age  new_col result
0   1    a   19        1      a
1   1    b   25        1      a
3   2    d   18        2   d, c
2   2    c   19        2   d, c

To better explain why sorting is necessary, the DataFrame is changed slightly:

print (matches)
   id name  age
0   1    a   25   -> first value is filtered out by the condition
1   1    b   12
2   2    c   19
3   2    d   18

matches = matches.sort_values(['id','age'])
g = matches[matches.age < 21].groupby(['id'])['name']
matches['new_col'] = g.transform(len)
matches['result'] = g.transform(', '.join)
matches[['new_col','result']] = matches[['new_col','result']].ffill()

print (matches)
   id name  age  new_col result
1   1    b   12        1      b
0   1    a   25        1      b
3   2    d   18        2   d, c
2   2    c   19        2   d, c

print (matches.sort_index())
   id name  age  new_col result
0   1    a   25        1      b
1   1    b   12        1      b
2   2    c   19        2   d, c
3   2    d   18        2   d, c
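
The effect of the sort can be checked directly by running the same steps with and without it. This is a small sketch on the modified data; the helper add_result is a hypothetical name introduced only for this comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "name": ["a", "b", "c", "d"],
    "age": [25, 12, 19, 18],
})

def add_result(frame):
    """Join the names with age < 21 per id, then forward-fill the gaps."""
    frame = frame.copy()
    g = frame[frame["age"] < 21].groupby("id")["name"]
    # Rows filtered out by the condition receive NaN on assignment.
    frame["result"] = g.transform(", ".join)
    frame["result"] = frame["result"].ffill()
    return frame

# Without sorting, the first row (age 25) has nothing above it to fill from.
unsorted = add_result(df)

# After sorting by id and age, the matching rows come first within each id.
sorted_first = add_result(df.sort_values(["id", "age"])).sort_index()

print(unsorted)
print(sorted_first)
```

In the unsorted run the first row keeps NaN in result, while the sorted run fills it with b. Note that ffill runs over the whole column rather than per group, so an id with no row under 21 would pick up the value of the preceding id; the filter, aggregate and join approach described at the top of this answer leaves NaN in that case instead.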

Answer 2

For now I am doing it like this: groupby + apply, where the applied function adds the new column:

matches = matches.groupby(['id']).apply(concat)

where concat is:

def concat(group):
    group['result'] = "{%s}" % ', '.join(group['name'][group['age'] < 21])
    return group

Are there any other or better solutions?

Pandas: convert rows into a single column based on a condition

2 answers: