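I have some strings in a pandas DataFrame. For each row, I want to check whether that string occurs in an adjacent column.
In the example below, I want to check whether the string in the 'choice' series is contained in the 'fruit' series, returning true (1) or false (0) in a new column 'choice_match'.
Example DataFrame: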
import pandas as pd
d = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'fruit': [
'apple, banana', 'apple', 'apple', 'pineapple', 'apple, pineapple', 'orange', 'apple, orange', 'orange', 'banana', 'apple, peach'],
'choice': ['orange', 'orange', 'apple', 'pineapple', 'apple', 'orange', 'orange', 'orange', 'banana', 'banana']}
df = pd.DataFrame(data=d)
Desired DataFrame:
import pandas as pd
d = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'fruit': [
'apple, banana', 'apple', 'apple', 'pineapple', 'apple, pineapple', 'orange', 'apple, orange', 'orange', 'banana', 'apple, peach'],
'choice': ['orange', 'orange', 'apple', 'pineapple', 'apple', 'orange', 'orange', 'orange', 'banana', 'banana'],
'choice_match': [0, 0, 1, 1, 1, 1, 1, 1, 1, 0]}
df = pd.DataFrame(data=d)
Answer 0 (score: 5)
In [75]: df['choice_match'] = (df['fruit']
                                .str.split(r',\s*', expand=True)
                                .eq(df['choice'], axis=0)
                                .any(axis=1).astype(np.int8))  # requires: import numpy as np
In [76]: df
Out[76]:
ID choice fruit choice_match
0 1 orange apple, banana 0
1 2 orange apple 0
2 3 apple apple 1
3 4 pineapple pineapple 1
4 5 apple apple, pineapple 1
5 6 orange orange 1
6 7 orange apple, orange 1
7 8 orange orange 1
8 9 banana banana 1
9 10 banana apple, peach 0
Step by step:
In [78]: df['fruit'].str.split(r',\s*', expand=True)
Out[78]:
0 1
0 apple banana
1 apple None
2 apple None
3 pineapple None
4 apple pineapple
5 orange None
6 apple orange
7 orange None
8 banana None
9 apple peach
In [79]: df['fruit'].str.split(r',\s*', expand=True).eq(df['choice'], axis=0)
Out[79]:
0 1
0 False False
1 False False
2 True False
3 True False
4 True False
5 True False
6 False True
7 True False
8 True False
9 False False
In [80]: df['fruit'].str.split(r',\s*', expand=True).eq(df['choice'], axis=0).any(axis=1)
Out[80]:
0 False
1 False
2 True
3 True
4 True
5 True
6 True
7 True
8 True
9 False
dtype: bool
In [81]: df['fruit'].str.split(r',\s*', expand=True).eq(df['choice'], axis=0).any(axis=1).astype(np.int8)
Out[81]:
0 0
1 0
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 0
dtype: int8
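For anyone who wants to run the chain above outside an IPython session, here is a minimal self-contained sketch. It assumes a reasonably recent pandas, where `any` takes `axis` as a keyword; the intermediate names `parts` and `matches` are just illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': range(1, 11),
    'fruit': ['apple, banana', 'apple', 'apple', 'pineapple', 'apple, pineapple',
              'orange', 'apple, orange', 'orange', 'banana', 'apple, peach'],
    'choice': ['orange', 'orange', 'apple', 'pineapple', 'apple',
               'orange', 'orange', 'orange', 'banana', 'banana'],
})

# Split each fruit string into one column per item; rows with fewer
# items are padded with missing values.
parts = df['fruit'].str.split(r',\s*', expand=True)

# Compare every item column against the row's choice (aligned on the
# index), then reduce across the columns: True if any item matched.
matches = parts.eq(df['choice'], axis=0).any(axis=1)

df['choice_match'] = matches.astype(np.int8)
print(df)
```

Naming the intermediate results makes it easier to inspect each stage, which mirrors the step-by-step outputs shown above.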
Answer 1 (score: 5)
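Here is one way:
df['choice_match'] = df.apply(lambda row: row['choice'] in row['fruit'].split(', '),
                              axis=1).astype(int)
Explanation
df.apply with axis=1 iterates over each row and applies the logic; it accepts an anonymous lambda function.
row['fruit'].split(', ') creates a list from the fruit column. This is necessary so that, for example, apple is not counted as being contained in pineapple. The separator includes the space after the comma; splitting on ',' alone would leave ' orange' with a leading space and miss the match in row 6.
astype(int) is necessary to convert the Boolean values to integers for display.
Answer 2 (score: 4)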
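Option 1
Using NumPy's find
If find does not find the value, it returns -1. Note that this is a substring test, so a choice of 'apple' would also match a fruit of 'pineapple'.
import numpy as np
choice = df.choice.values.astype(str)
fruit = df.fruit.values.astype(str)
df.assign(choice_match=(np.char.find(fruit, choice) > -1).astype(np.uint))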
ID choice fruit choice_match
0 1 orange apple, banana 0
1 2 orange apple 0
2 3 apple apple 1
3 4 pineapple pineapple 1
4 5 apple apple, pineapple 1
5 6 orange orange 1
6 7 orange apple, orange 1
7 8 orange orange 1
8 9 banana banana 1
9 10 banana apple, peach 0
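The sample data happens to give the same result for a substring search and for a whole-item comparison, but the two are not equivalent. A small sketch of the difference, using a hypothetical two-row frame (`demo` is made up for illustration and is not part of the question's data):

```python
import numpy as np
import pandas as pd

# Hypothetical data chosen so that the two approaches disagree on row 0.
demo = pd.DataFrame({
    'fruit': ['pineapple', 'apple, orange'],
    'choice': ['apple', 'orange'],
})

fruit = demo['fruit'].to_numpy().astype(str)
choice = demo['choice'].to_numpy().astype(str)

# Substring test: np.char.find returns -1 when the substring is absent.
substring_match = (np.char.find(fruit, choice) > -1).astype(int)

# Whole-item test: split on ", " and check list membership row by row.
token_match = np.array(
    [c in f.split(', ') for f, c in zip(demo['fruit'], demo['choice'])]
).astype(int)

print(substring_match)  # row 0 matches because 'apple' is inside 'pineapple'
print(token_match)      # row 0 does not match: 'pineapple' is a different item
```

Which behaviour is right depends on the data; if item names can contain one another, the whole-item comparison is probably the safer choice.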
Option 2
Set logic
Using set, < tests for a strict subset and <= tests for a subset. Build a pd.Series of sets for each column and use <= to determine whether one column's set is a subset of the other column's set.
choice = df.choice.apply(lambda x: {x})
fruit = df.fruit.str.split(', ').apply(set)
df.assign(choice_match=(choice <= fruit).astype(np.uint))
ID choice fruit choice_match
0 1 orange apple, banana 0
1 2 orange apple 0
2 3 apple apple 1
3 4 pineapple pineapple 1
4 5 apple apple, pineapple 1
5 6 orange orange 1
6 7 orange apple, orange 1
7 8 orange orange 1
8 9 banana banana 1
9 10 banana apple, peach 0
Option 3
Inspired by @Wen's answer
Using get_dummies and max
c = pd.get_dummies(df.choice)
f = df.fruit.str.get_dummies(', ')
df.assign(choice_match=pd.DataFrame.mul(*c.align(f, join='inner')).max(axis=1))
ID choice fruit choice_match
0 1 orange apple, banana 0
1 2 orange apple 0
2 3 apple apple 1
3 4 pineapple pineapple 1
4 5 apple apple, pineapple 1
5 6 orange orange 1
6 7 orange apple, orange 1
7 8 orange orange 1
8 9 banana banana 1
9 10 banana apple, peach 0
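The get_dummies approach is compact, so here is a sketch that spells out the intermediate frames. It assumes the same sample data as the question; the names `c_aligned` and `f_aligned` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': range(1, 11),
    'fruit': ['apple, banana', 'apple', 'apple', 'pineapple', 'apple, pineapple',
              'orange', 'apple, orange', 'orange', 'banana', 'apple, peach'],
    'choice': ['orange', 'orange', 'apple', 'pineapple', 'apple',
               'orange', 'orange', 'orange', 'banana', 'banana'],
})

# One indicator column per chosen fruit (cast to int so the product
# below is numeric regardless of the pandas version's default dtype).
c = pd.get_dummies(df['choice']).astype(int)

# One indicator column per fruit named in the comma-separated string.
f = df['fruit'].str.get_dummies(', ')

# Keep only the labels present in both frames, then multiply
# element-wise: a 1 survives only where the chosen fruit is also listed.
c_aligned, f_aligned = c.align(f, join='inner')
choice_match = (c_aligned * f_aligned).max(axis=1)

print(df.assign(choice_match=choice_match))
```

The inner join drops 'peach', which appears in fruit but is never chosen, so it cannot contribute a match.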
Answer 3 (score: 3)
Hmm, I found a fun way, using get_dummies:
(df.fruit.str.replace(' ','').str.get_dummies(',')+df.choice.str.get_dummies()).gt(1).any(axis=1)
Out[726]:
0 False
1 False
2 True
3 True
4 True
5 True
6 True
7 True
8 True
9 False
dtype: bool
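To store the result as the 0/1 column the question asks for, the Boolean series can be cast and assigned back. This is a minimal sketch under the assumption that the sample data from the question is used; `fruit_dummies` and `choice_dummies` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': range(1, 11),
    'fruit': ['apple, banana', 'apple', 'apple', 'pineapple', 'apple, pineapple',
              'orange', 'apple, orange', 'orange', 'banana', 'apple, peach'],
    'choice': ['orange', 'orange', 'apple', 'pineapple', 'apple',
               'orange', 'orange', 'orange', 'banana', 'banana'],
})

# Indicator columns for every listed fruit (spaces removed first so that
# a plain ',' separator works) and for every chosen fruit.
fruit_dummies = df['fruit'].str.replace(' ', '', regex=False).str.get_dummies(',')
choice_dummies = df['choice'].str.get_dummies()

# A sum of 2 means the fruit is both listed and chosen. Labels that exist
# in only one frame become NaN after the addition and compare as False.
df['choice_match'] = (fruit_dummies + choice_dummies).gt(1).any(axis=1).astype(int)

print(df)
```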