Question

I have this DataFrame:

      name               year ...
0     Carlos - xyz       2019
1     Marcos - yws       2031
3     Fran - xxz         2431
4     Matt - yre         1985
...

I want to create a new column named type. If the person's name ends with "xyz" or "xxz", the type should be "big".

So it should look like this:

      name               year   type
0     Carlos - xyz       2019    big
1     Marcos - yws       2031  
3     Fran - xxz         2431    big
4     Matt - yre         1985
...

Any suggestions?

Answer 1

Option 1
Use str.contains to generate a mask:

m = df.name.str.contains(r'x[yx]z$')

Or,

sub_str = ['xyz', 'xxz']
m = df.name.str.contains(r'(?:{})$'.format('|'.join(sub_str)))

The non-capturing group makes the $ anchor apply to every alternative, not only the last one.

Now you can create the column with np.where:

df['type'] = np.where(m, 'big', '')

Or use loc in place of np.where:

df['type'] = ''
df.loc[m, 'type'] = 'big'

df
           name  year type
0  Carlos - xyz  2019  big
1  Marcos - yws  2031     
3    Fran - xxz  2431  big
4    Matt - yre  1985     

Option 2
As an alternative, consider str.endswith + np.logical_or.reduce:

sub_str = ['xyz', 'xxz']
m = np.logical_or.reduce([df.name.str.endswith(s) for s in sub_str])

df['type'] = ''
df.loc[m, 'type'] = 'big'
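
As a rough check of the two options, the sketch below rebuilds the sample frame from the question and compares the regex mask with the endswith mask. The variable names m_regex, m_ends, loose and strict, and the extra name 'xyz - abc', are illustrative assumptions and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question, keeping the index values shown there.
df = pd.DataFrame(
    {"name": ["Carlos - xyz", "Marcos - yws", "Fran - xxz", "Matt - yre"],
     "year": [2019, 2031, 2431, 1985]},
    index=[0, 1, 3, 4],
)

sub_str = ["xyz", "xxz"]

# Option 1: regex with the alternation grouped, so "$" anchors every suffix.
m_regex = df["name"].str.contains(r"(?:{})$".format("|".join(sub_str)))

# Option 2: one endswith mask per suffix, OR-ed together element-wise.
m_ends = np.logical_or.reduce([df["name"].str.endswith(s) for s in sub_str])

# Both masks are expected to select the same rows.
assert np.array_equal(m_regex.to_numpy(), np.asarray(m_ends))

df["type"] = np.where(m_regex, "big", "")
print(df)

# Hypothetical name with "xyz" at the start: without the group, "$" binds
# only to the last alternative, so the ungrouped pattern matches it anyway.
probe = pd.Series(["xyz - abc"])
loose = probe.str.contains(r"xyz|xxz$")
strict = probe.str.contains(r"(?:xyz|xxz)$")
print(loose.tolist(), strict.tolist())
```

The endswith variant avoids regular expressions entirely, which may be preferable when the suffixes could contain regex metacharacters.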

Answer 2

Here is one way using pandas.Series.str.

import numpy as np
import pandas as pd

df = pd.DataFrame([['Carlos - xyz', 2019], ['Marcos - yws', 2031],
                   ['Fran - xxz', 2431], ['Matt - yre', 1985]],
                  columns=['name', 'year'])

df['type'] = np.where(df['name'].str[-3:].isin({'xyz', 'xxz'}), 'big', '')

Alternatively, you can use the .loc accessor instead of numpy.where:

df['type'] = ''
df.loc[df['name'].str[-3:].isin({'xyz', 'xxz'}), 'type'] = 'big'

Result

           name  year type
0  Carlos - xyz  2019  big
1  Marcos - yws  2031     
2    Fran - xxz  2431  big
3    Matt - yre  1985

Explanation

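Use pd.Series.str to extract the last 3 characters of each name.
Compare them against a set of the target values, which gives O(1) lookup per value.
Use numpy.where to perform the conditional assignment for the new series.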
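
The three steps can be sketched one at a time so the intermediate values are visible. The names suffix and is_big are illustrative assumptions. Note that slicing with [-3:] only works here because every target suffix is exactly 3 characters long.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['Carlos - xyz', 2019], ['Marcos - yws', 2031],
                   ['Fran - xxz', 2431], ['Matt - yre', 1985]],
                  columns=['name', 'year'])

# Step 1: slice the last 3 characters of every name.
suffix = df['name'].str[-3:]

# Step 2: membership test against the set of target suffixes.
is_big = suffix.isin({'xyz', 'xxz'})

# Step 3: conditional assignment, 'big' where the mask is True, '' elsewhere.
df['type'] = np.where(is_big, 'big', '')

print(suffix.tolist())
print(is_big.tolist())
print(df)
```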

Select rows by the last 3 characters in a column containing strings
