Conditionally mutate a column

Question

I'm an R programmer trying to get into Python. In R, when I want to conditionally mutate a column, I use:

col = dplyr::mutate(col, ifelse(condition, if_true(x), if_false(x)))

In Python, how do I conditionally mutate a column's values?

Answer 1

You can use the condition (and its negation) for logical indexing:

has_abc = cntnt.str.contains("abc")
cntnt[ has_abc].apply(do_thing)
cntnt[~has_abc].apply(do_other_thing)
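The two `apply` calls above return new values without storing them. A minimal sketch of writing both branches back into one Series follows, assuming `cntnt` is a pandas Series of strings; the sample data and the bodies of `do_thing` and `do_other_thing` are hypothetical stand-ins, not the asker's originals:

```python
import pandas as pd

# Hypothetical stand-ins for the two per-element functions.
def do_thing(s):
    return s + " has it"

def do_other_thing(s):
    return s + " nope"

# Hypothetical sample data.
cntnt = pd.Series(["abc", "xyz", "zabc", "def"])

# Boolean mask: True where the string contains "abc".
has_abc = cntnt.str.contains("abc")

# Work on a copy so the original Series is left untouched.
result = cntnt.copy()
result.loc[has_abc] = cntnt.loc[has_abc].apply(do_thing)
result.loc[~has_abc] = cntnt.loc[~has_abc].apply(do_other_thing)

print(result.tolist())
```

Assigning through `.loc` with the mask and its negation covers every row exactly once, provided the mask contains no missing values.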

Answer 2

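I think what you're looking for is assign, which is essentially the pandas equivalent of mutate in dplyr. Your conditional can be written with a list comprehension or with a vectorized method (see below).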

Take an example data frame, which we'll call df:

> df
             a
1   0.50212013
2   1.01959213
3  -1.32490344
4  -0.82133375
5   0.23010548
6  -0.64410737
7  -0.46565442
8  -0.08943858
9   0.11489957
10 -0.21628132

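`R` / `dplyr`:

In R, you can use mutate together with ifelse to create a column based on a condition (in this example, the value is 'pos' when column a is greater than 0 and 'neg' otherwise):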

df = dplyr::mutate(df, col = ifelse(df$a > 0, 'pos', 'neg'))

Resulting df:

> df
             a col
1   0.50212013 pos
2   1.01959213 pos
3  -1.32490344 neg
4  -0.82133375 neg
5   0.23010548 pos
6  -0.64410737 neg
7  -0.46565442 neg
8  -0.08943858 neg
9   0.11489957 pos
10 -0.21628132 neg

`Python` / `Pandas`

In pandas, use assign together with a list comprehension:

df = df.assign(col = ['pos' if a > 0 else 'neg' for a in df['a']])

Resulting df:

>>> df
          a  col
0  0.502120  pos
1  1.019592  pos
2 -1.324903  neg
3 -0.821334  neg
4  0.230105  pos
5 -0.644107  neg
6 -0.465654  neg
7 -0.089439  neg
8  0.114900  pos
9 -0.216281  neg

The ifelse you used in R is replaced by a list comprehension.
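For anyone who wants to run this end to end, here is a small self-contained sketch. It builds a data frame from the first five values of the example above (the variable name `labelled` is just an illustrative choice) and shows that assign returns a new data frame rather than modifying df in place:

```python
import pandas as pd

# First five values of column a from the example above.
df = pd.DataFrame({"a": [0.50212013, 1.01959213, -1.32490344,
                         -0.82133375, 0.23010548]})

# assign returns a new DataFrame with the extra column; df itself is unchanged.
labelled = df.assign(col=["pos" if a > 0 else "neg" for a in df["a"]])

print(list(df.columns))          # ['a']
print(labelled["col"].tolist())  # ['pos', 'pos', 'neg', 'neg', 'pos']
```

This is why the example above rebinds the result with `df = df.assign(...)`.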

Variations on this:

You don't have to use assign: if you prefer, you can create the new column directly on df without making a copy:

df['col'] = ['pos' if a > 0 else 'neg' for a in df['a']]

Also, instead of a list comprehension, you can use one of numpy's vectorized methods, for example np.select:

import numpy as np
df['col'] = np.select([df['a'] > 0], ['pos'], 'neg')
# or
df = df.assign(col = np.select([df['a'] > 0], ['pos'], 'neg'))
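As a further variation not used in the answer above, `np.where` is arguably the closest match to R's two-branch ifelse, while `np.select` becomes useful once there are more than two outcomes. A minimal sketch, with hypothetical sample values and a hypothetical third label 'zero':

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, including an exact zero.
df = pd.DataFrame({"a": [0.5, -1.3, 0.0, 0.2]})

# np.where(condition, value_if_true, value_if_false) mirrors ifelse.
df["col"] = np.where(df["a"] > 0, "pos", "neg")

# np.select checks the conditions in order; the first match wins,
# and default is used where no condition holds.
df["sign"] = np.select(
    [df["a"] > 0, df["a"] < 0],
    ["pos", "neg"],
    default="zero",
)

print(df)
```

Both calls operate on the whole column at once, so they avoid the Python-level loop of a list comprehension.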
