Question

I've looked, but I can't seem to find an answer to the following question.

I have a pandas dataframe similar to this (call it 'df'):

        Type Set
1   theGreen   Z
2   andGreen   Z
3  yellowRed   X
4    roadRed   Y

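I want to add another column to the dataframe (or generate a Series) of the same length as the dataframe (that is, the same number of records/rows) that assigns a numeric coding variable: 1 if Type contains the string "Green", and 0 otherwise.

Essentially, I'm trying to find a way to do this: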

df['color'] = np.where(df['Type'] == 'Green', 1, 0)

Instead of the usual numpy operators (<, >, ==, !=, etc.), I need a way of saying "in" or "contains". Is this possible? Any and all help is appreciated!

Answer 1

Use str.contains:

df['color'] = np.where(df['Type'].str.contains('Green'), 1, 0)
print (df)
        Type Set  color
1   theGreen   Z      1
2   andGreen   Z      1
3  yellowRed   X      0
4    roadRed   Y      0
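
For anyone who wants to run this end to end, here is a minimal, self-contained sketch. The sample frame is rebuilt from the printed output above, so the index values 1 to 4 are an assumption. It also shows two optional variations that are not part of the answer above: passing regex=False so that 'Green' is matched as a literal substring, and casting the boolean mask with astype(int) instead of going through np.where.

```python
import pandas as pd

# Sample frame rebuilt from the printed output above (the 1..4 index is an assumption).
df = pd.DataFrame(
    {"Type": ["theGreen", "andGreen", "yellowRed", "roadRed"],
     "Set": ["Z", "Z", "X", "Y"]},
    index=[1, 2, 3, 4],
)

# regex=False treats 'Green' as a literal substring rather than a regular expression.
mask = df["Type"].str.contains("Green", regex=False)

# The boolean mask can be cast to 0/1 directly, without np.where.
df["color"] = mask.astype(int)
print(df)
```

Either form should give the same 0/1 column here; np.where is the more general choice when the two output values are something other than 0 and 1.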

Another solution with apply:

df['color'] = np.where(df['Type'].apply(lambda x: 'Green' in x), 1, 0)
print (df)
        Type Set  color
1   theGreen   Z      1
2   andGreen   Z      1
3  yellowRed   X      0
4    roadRed   Y      0

The second solution is faster, but it does not work if the Type column contains NaN values; in that case it raises an error:

TypeError: argument of type 'float' is not iterable
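
If missing values are possible, one way to handle them is sketched below. The small Series is made up for illustration. The sketch assumes NaN should be coded as 0, which may or may not be what you want: str.contains takes an na argument for this, and the apply version can guard the membership test with an isinstance check.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value in the middle.
s = pd.Series(["theGreen", np.nan, "roadRed"], name="Type")

# na=False makes missing values count as "no match" instead of propagating NaN.
color_contains = np.where(s.str.contains("Green", na=False), 1, 0)

# For apply, guard the test so non-string values (such as NaN) map to False.
color_apply = np.where(
    s.apply(lambda x: isinstance(x, str) and "Green" in x), 1, 0
)

print(color_contains)
print(color_apply)
```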

Timings:

#[400000 rows x 4 columns]
df = pd.concat([df]*100000).reset_index(drop=True)  

In [276]: %timeit df['color'] = np.where(df['Type'].apply(lambda x: 'Green' in x), 1, 0)
10 loops, best of 3: 94.1 ms per loop

In [277]: %timeit df['color1'] = np.where(df['Type'].str.contains('Green'), 1, 0)
1 loop, best of 3: 256 ms per loop
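
The %timeit lines above only work inside IPython. As a rough sketch, the same comparison can be run as a plain script with the standard timeit module. The row count is reduced to 40,000 here so it finishes quickly, and the function names are made up for this example. Absolute numbers, and possibly which approach is faster, will depend on the pandas version and the machine.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({"Type": ["theGreen", "andGreen", "yellowRed", "roadRed"],
                     "Set": ["Z", "Z", "X", "Y"]})
# 40,000 rows (smaller than the 400,000 used above) to keep the run short.
big = pd.concat([base] * 10000).reset_index(drop=True)

def with_apply():
    # Python-level membership test applied to each value.
    return np.where(big["Type"].apply(lambda x: "Green" in x), 1, 0)

def with_contains():
    # pandas string method; interprets the pattern as a regex by default.
    return np.where(big["Type"].str.contains("Green"), 1, 0)

for fn in (with_apply, with_contains):
    # Take the best of three single runs, similar in spirit to %timeit.
    seconds = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms")
```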
