Question

我的目标是比较两列并添加结果列。 R使用ifelse，但我需要知道熊猫的方式。

- [R

> head(mau.payment)
  log_month user_id install_month payment
1   2013-06       1       2013-04       0
2   2013-06       2       2013-04       0
3   2013-06       3       2013-04   14994

> mau.payment$user.type <-ifelse(mau.payment$install_month == mau.payment$log_month, "install", "existing")
> head(mau.payment)
  log_month user_id install_month payment user.type
1   2013-06       1       2013-04       0  existing
2   2013-06       2       2013-04       0  existing
3   2013-06       3       2013-04   14994  existing
4   2013-06       4       2013-04       0  existing
5   2013-06       6       2013-04       0  existing
6   2013-06       7       2013-04       0  existing

熊猫

>>> maupayment
user_id  log_month  install_month
1        2013-06    2013-04              0
         2013-07    2013-04              0
2        2013-06    2013-04              0
3        2013-06    2013-04          14994

我尝试了一些案例，但没有奏效。似乎字符串比较不起作用。

>>>np.where(maupayment['log_month'] == maupayment['install_month'], 'install', 'existing')

TypeError: 'str' object cannot be interpreted as an integer

你能帮我吗？

Pandas and numpy version。

>>> pd.version.version
'0.16.2'
>>> np.version.full_version
'1.9.2'

更新版本后，它有效！

>>> np.where(maupayment['log_month'] == maupayment['install_month'], 'install', 'existing')
array(['existing', 'install', 'existing', ..., 'install', 'install',
       'install'], 
      dtype='<U8')

Answer 1

您必须将pandas升级到上一版本，因为在版本0.17.1中它非常有用。

示例（列install_month中的第一个值已更改以进行匹配）：

print maupayment
  log_month  user_id install_month  payment
1   2013-06        1       2013-06        0
2   2013-06        2       2013-04        0
3   2013-06        3       2013-04    14994

print np.where(maupayment['log_month'] == maupayment['install_month'], 'install', 'existing')
['install' 'existing' 'existing']

Answer 2

一种选择是将匿名功能与 Pandas的应用功能结合使用：

在函数中设置一些分支逻辑：

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.4.1/jquery.min.js"></script>
<input type="hidden" class="hidden" name="a" value="" data-init />
<input type="hidden" class="hidden" name="b" value="" data-init/>

这将从lambda中获取 x （请参见下文），要查找的内容的列表，是标签和< strong>没有标签。

例如，假设我们正在查看IMDB数据集（imdb_df）：

...并且我想添加一个名为“ new_rating”的新列，以显示电影是否成熟。

我可以将熊猫 apply 函数与上面的分支逻辑一起使用：

def if_this_else_that(x, list_of_checks, yes_label, no_label):
    if x in list_of_checks:
        res = yes_label
    else: 
        res = no_label
    return(res)

有时候，我们需要将此功能与另一张支票相结合。例如，IMDB数据集中的某些条目为 NaN 。我可以同时检查NaN和成熟度评级，如下所示：

imdb_df['new_rating'] = imdb_df['Rated'].apply(lambda x: if_this_else_that(x, ['PG', 'PG-13'], 'not mature', 'mature'))

在这种情况下，我的NaN首先被转换为字符串，但是显然您也可以使用真正的NaN进行此操作。

Python / Pandas中R / ifelse的等价物？比较字符串列？

2 个答案: