I have a DataFrame like this:
import pandas as pd

data = {'fce1_1': ['K701', 'Molly', 'Tina', 'K876', 'Amy'],
        'fce1_2': ['K712', 'Molly', 'K709', 'Jape', 'Amy'],
        'fce2_1': ['K703', 'K719', 'Tina', 'I841', 'K987'],
        'fce2_2': [25, 94, 57, 62, 70]}
df = pd.DataFrame(data)
df
fce1_1 fce1_2 fce2_1 fce2_2
K701 K712 K703 25
Molly Molly K719 94
Tina K709 Tina 57
...etc
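I want to search each row of df for any value that starts with 'K', and return the 'K***' value from the column closest to the right-hand side of the DataFrame. For example: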
fce1_1 fce1_2 fce2_1 fce2_2 new_col
K701 K712 K703 25 K703
Molly Molly K719 94 K719
Tina K709 Tina 57 K709
...etc
Thanks.
Answer 0 (score: 3)
You can `apply` a lambda row-wise:
In [35]:
df['new_col'] = df.astype(str).apply(lambda x: x[x[x.str.startswith('K')].last_valid_index()], axis=1)
df
Out[35]:
fce1_1 fce1_2 fce2_1 fce2_2 new_col
0 K701 K712 K703 25 K703
1 Molly Molly K719 94 K719
2 Tina K709 Tina 57 K709
3 K876 Jape I841 62 K876
4 Amy Amy K987 70 K987
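The lambda checks, for each row, which values `startswith` 'K'; `last_valid_index` then gives the label of the last (right-most) matching column, which is used to index the row. Step by step: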
In [38]:
df.astype(str).apply(lambda x: x.str.startswith('K'), axis=1)
Out[38]:
fce1_1 fce1_2 fce2_1 fce2_2
0 True True True False
1 False False True False
2 False True False False
3 True False False False
4 False False True False
In [39]:
df.astype(str).apply(lambda x: x[x.str.startswith('K')].last_valid_index(), axis=1)
Out[39]:
0 fce2_1
1 fce2_1
2 fce1_2
3 fce1_1
4 fce2_1
dtype: object
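The same logic can be easier to read as a named function. The following is only a sketch of the answer's approach: the function name `last_k_value` is hypothetical, and it assumes every row contains at least one value starting with 'K' (otherwise `last_valid_index()` returns `None` and the lookup fails).

```python
import pandas as pd

df = pd.DataFrame({
    'fce1_1': ['K701', 'Molly', 'Tina', 'K876', 'Amy'],
    'fce1_2': ['K712', 'Molly', 'K709', 'Jape', 'Amy'],
    'fce2_1': ['K703', 'K719', 'Tina', 'I841', 'K987'],
    'fce2_2': [25, 94, 57, 62, 70],
})


def last_k_value(row: pd.Series) -> str:
    """Return the right-most value in `row` that starts with 'K'.

    Hypothetical helper; assumes the row has at least one match.
    """
    mask = row.str.startswith('K')        # step 1: boolean mask per column
    label = row[mask].last_valid_index()  # step 2: label of the last match
    return row[label]                     # step 3: look up the value


# Cast to str first so the integer column does not break the .str accessor.
new_col = df.astype(str).apply(last_k_value, axis=1)
print(new_col.tolist())
```

Assigning `new_col` to `df['new_col']` gives the same column as the one-liner.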
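The two steps above break the one-liner down.
Edit:
To handle rows that contain no match, we can add a test inside the lambda. Note that this version stores the label of the matching column rather than the value:
In [67]: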
import numpy as np

# np.nan is used here; the np.NaN alias was removed in NumPy 2.0
data = {'fce1_1': [np.nan, 'Molly', 'Tina', 'K876', 'Amy'],
        'fce1_2': [np.nan, 'Molly', 'K709', 'Jape', 'Amy'],
        'fce2_1': [np.nan, 'K719', 'Tina', 'I841', 'K987'],
        'fce2_2': np.nan}
df = pd.DataFrame(data)
df['new_col'] = df.astype(str).apply(lambda x: x[x.str.startswith('K')].last_valid_index() if x.str.startswith('K').any() else 'No Match', axis=1)
df
Out[67]:
fce1_1 fce1_2 fce2_1 fce2_2 new_col
0 NaN NaN NaN NaN No Match
1 Molly Molly K719 NaN fce2_1
2 Tina K709 Tina NaN fce1_2
3 K876 Jape I841 NaN fce1_1
4 Amy Amy K987 NaN fce2_1
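If the matching value is wanted instead of the column label, with the same 'No Match' fallback, one possible sketch is shown below. The helper name `last_k_value_or_default` and its `default` parameter are hypothetical and not part of the original answer; `na=False` is passed so that missing values count as non-matches.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'fce1_1': [np.nan, 'Molly', 'Tina', 'K876', 'Amy'],
    'fce1_2': [np.nan, 'Molly', 'K709', 'Jape', 'Amy'],
    'fce2_1': [np.nan, 'K719', 'Tina', 'I841', 'K987'],
    'fce2_2': np.nan,
})


def last_k_value_or_default(row: pd.Series, default: str = 'No Match') -> str:
    """Return the right-most value starting with 'K', or `default` if none.

    Hypothetical helper for illustration only.
    """
    mask = row.str.startswith('K', na=False)  # missing values never match
    if not mask.any():
        return default                        # no 'K...' value in this row
    return row[mask].iloc[-1]                 # last match = right-most column


values = df.astype(str).apply(last_k_value_or_default, axis=1)
print(values.tolist())
```

Testing `mask.any()` once and then taking `.iloc[-1]` avoids evaluating `startswith` twice per row, as the lambda with the inline `if ... else` does.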