Question

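I am trying to create a series in my dataframe (sdbfile) whose values are based on several nested conditional statements that use elements of the sdbfile dataframe. The series reins_code is populated with string values.

The following statement works, but I need it to test whether 'reins_code' starts with 'R' rather than whether it equals a specific 'R#' value: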

sdbfile['product'] = np.where(sdbfile.reins_code == 'R2', 'HiredPlant','Trad')

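It doesn't accept the string method startswith(), presumably because reins_code is a Series rather than a plain string?

Can anyone help? I have been through the documentation but cannot find any reference to this problem.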

Answer 1

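Use the pandas str attribute: http://pandas.pydata.org/pandas-docs/stable/text.html

Series and Index are equipped with a set of string processing methods that make it easy to operate on each element of the array. Perhaps most importantly, these methods exclude missing/NA values automatically. These are accessed via the str attribute and generally have names matching the equivalent (scalar) built-in string methods: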

sdbfile['product'] = np.where(sdbfile.reins_code.str[0] == 'R', 'HiredPlant','Trad')
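To see what the first-character test does on awkward values, here is a minimal sketch. The sample data (including the empty string and the missing entry) is hypothetical and not from the question; it only illustrates that `.str[0]` does not raise on such values.

```python
import pandas as pd

# Hypothetical sample values, including an empty string and a missing entry.
codes = pd.Series(['R1', 'X9', '', None])

# .str[0] takes the first character of each element; empty and missing
# entries yield a missing value instead of raising an error.
first_char = codes.str[0]

# Comparing with 'R' gives a boolean mask; missing values compare as False.
is_r = first_char == 'R'
print(is_r.tolist())
```

Because missing values compare as False, rows without a code would fall into the 'Trad' branch of the np.where call above.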

Answer 2

Use the vectorised str.startswith, which returns a boolean mask:

In [6]:
df = pd.DataFrame({'a':['R1asda','R2asdsa','foo']})
df

Out[6]:
         a
0   R1asda
1  R2asdsa
2      foo

In [8]:
df['a'].str.startswith('R2')

Out[8]:
0    False
1     True
2    False
Name: a, dtype: bool

In [9]:
df[df['a'].str.startswith('R2')]

Out[9]:
         a
1  R2asdsa
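Applied to the original problem, the mask can be passed straight to np.where. The following is a sketch under assumptions: the sample rows are made up, and passing `na=False` is my suggestion rather than part of the answer. Without it, `str.startswith` can return a missing value for missing codes, which np.where may treat as truthy.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's dataframe.
sdbfile = pd.DataFrame({'reins_code': ['R1', 'R2', 'X9', None]})

# Boolean mask: True where the code starts with 'R'.
# na=False makes missing codes count as "does not start with 'R'".
starts_with_r = sdbfile['reins_code'].str.startswith('R', na=False)

# Same np.where pattern as in the question, driven by the mask.
sdbfile['product'] = np.where(starts_with_r, 'HiredPlant', 'Trad')
print(sdbfile)
```

If more prefixes are needed later, `str.startswith` also accepts a tuple of strings in recent pandas versions, though that is worth checking against the installed version.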
