Question

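I'm new to pandas, so please bear with me.

I have a text-processing function that I want to run on one column of my DataFrame, but only depending on the value in another column. I have looked at

Depending on whether a row is flagged, I want to run a translate function on it.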

   account  article    ... translation  flag
0    123      text      ...               1
1    123      text      ...               0
2    123      text      ...               1

I tried:

df['translation'] = df[['flag', 'text']].apply(lambda x: translate(['article']) if ['flag'] == 1 else None)

and got this error:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index flag')

Any help or guidance would be much appreciated.

Answer 1

IIUC, you can try map and where:

df['translation'] = df['article'].map(translate).where(df['flag'].eq(1), None)
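
Note that map(translate) in the line above calls translate on every row, including the unflagged ones, and where only discards those results afterwards. If translate is expensive (a remote API call, say), a minimal sketch that calls it only on the flagged rows might look like the following. The translate stub, the calls list, and the sample data are illustrative assumptions, not the asker's real function or data.

```python
import pandas as pd

calls = []  # records which texts were actually passed to translate

def translate(txt):
    # Hypothetical stand-in for the real translation function.
    calls.append(txt)
    return '_' + txt + '_'

df = pd.DataFrame({
    'account': [123, 123, 123],
    'article': ['text1', 'text2', 'text3'],
    'flag': [1, 0, 1],
})

mask = df['flag'].eq(1)      # boolean Series: True where the row is flagged
df['translation'] = None     # start with an empty (object) column
# Translate only the flagged rows; assignment aligns on the index.
df.loc[mask, 'translation'] = df.loc[mask, 'article'].map(translate)
```

With this version, translate is never invoked for rows where flag is 0, at the cost of one extra line.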

Answer 2

I used a test DataFrame similar to yours, without the translation column:

   account article  flag
0      123   text1     1
1      123   text2     0
2      123   text3     1

Then I defined a "surrogate" translate function:

def translate(txt):
    return '_' + txt + '_'

To call it conditionally, run:

df['translation'] = df.apply(lambda row:
    translate(row.article) if row.flag == 1 else None, axis=1)

The result is:

   account article  flag translation
0      123   text1     1     _text1_
1      123   text2     0        None
2      123   text3     1     _text3_
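
For anyone who wants to reproduce this end to end, here is a self-contained sketch. The DataFrame construction is an assumption reconstructed from the printed table above; the rest is the same code as in this answer.

```python
import pandas as pd

# Test data reconstructed from the table shown above (an assumption).
df = pd.DataFrame({
    'account': [123, 123, 123],
    'article': ['text1', 'text2', 'text3'],
    'flag': [1, 0, 1],
})

def translate(txt):
    # Surrogate for the real translation function.
    return '_' + txt + '_'

# axis=1 makes apply pass each row (as a Series) to the lambda.
df['translation'] = df.apply(
    lambda row: translate(row.article) if row.flag == 1 else None, axis=1)

print(df)
```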

What is wrong with your code:

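If you want to restrict the source data to a subset of columns, use existing column names (article instead of text) and include every column used in the applied function.
The lambda function should be applied to each row, so you should have passed the axis=1 parameter (the default axis is 0, which passes each column instead).
When the function is called, the current row is passed as the argument (x), but to refer to a column in it you should use the x.column_name notation. E.g. my solution could also be: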
```
df[['article', 'flag']].apply(lambda row:
    translate(row.article) if row.flag == 1 else None, axis=1)
```
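An argument like ['article'] is just a list containing a single word (article). I doubt your translate function can handle a list argument.
A similar remark applies to if ['flag'] ... This is not a reference to a column in the source row.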
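
The points above can be checked directly. The following is a small sketch on made-up sample data (not the asker's DataFrame) showing what apply passes to the lambda under each axis, and why the literal list ['flag'] never refers to a column.

```python
import pandas as pd

df = pd.DataFrame({
    'article': ['text1', 'text2', 'text3'],
    'flag': [1, 0, 1],
})

# Default axis=0: the lambda receives each *column* as a Series,
# so x.name is a column label and there is one result per column.
by_column = df.apply(lambda x: x.name)

# axis=1: the lambda receives each *row* as a Series,
# so x.name is a row label and there is one result per row.
by_row = df.apply(lambda x: x.name, axis=1)

# A literal list compared with an int is plain Python: always False.
# No column of the DataFrame is involved at all.
list_check = (['flag'] == 1)

# Inside a row, attribute and bracket notation read the same value.
first_row = df.iloc[0]
same_value = first_row.flag == first_row['flag']
```

Bracket notation (x['flag']) is the safer of the two when a column name contains spaces or collides with an existing Series attribute.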

Apply a function to a subset of DataFrame rows in one column based on the value in another column
