I have a DataFrame like this:
soundIn response_rater0 response_rater1 response_rater2
1 audios/VP10_S07_w.wav 2.0 2.0 1.0
2 audios/VP11_S08_w.wav 1.0 2.0 2.0
3 audios/VP01_S11_w.wav 1.0 1.0 2.0
4 audios/VP10_S11_i.wav 2.0 2.0 2.0
...
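I want to create another DataFrame that contains only the rows where the "soundIn" column contains "VP01". I tried to do this with loc, but it does not work, because VP01 is only a substring of the values I am searching in.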
rslt_df = all_responses.loc['*VP01*' in all_responses['soundIn']]
Does anyone have an idea?
Answer 0 (score: 1)
Try this. Use the pandas.Series.str.contains method to do the comparison, and pass the resulting boolean Series as the indexer to all_responses.loc[].
all_responses.loc[all_responses['soundIn'].str.contains('VP01')]
import pandas as pd
from io import StringIO
s = """
soundIn response_rater0 response_rater1 response_rater2
1 audios/VP10_S07_w.wav 2.0 2.0 1.0
2 audios/VP11_S08_w.wav 1.0 2.0 2.0
3 audios/VP01_S11_w.wav 1.0 1.0 2.0
4 audios/VP10_S11_i.wav 2.0 2.0 2.0
"""
# read data into a dataframe from the string representation
# a raw string keeps '\s' from being treated as an invalid escape sequence
df = pd.read_csv(StringIO(s), sep=r'\s+')
# Match the search condition and produce the result
result = df.loc[df['soundIn'].str.contains('VP01')]
print(result)
Output:
soundIn response_rater0 response_rater1 response_rater2
3 audios/VP01_S11_w.wav 1.0 1.0 2.0
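As a supplementary sketch of why the original attempt failed: the Python `in` operator applied to a Series checks the index labels rather than the values, and it returns a single boolean instead of one value per row. The sample data below is hypothetical (it mirrors the question and adds one missing value for illustration). The `regex=False` and `na=False` arguments are optional additions that are not part of the answer above: the first treats the pattern as a literal substring, the second counts missing values as non-matches.

```python
import pandas as pd

# Hypothetical sample data mirroring the question, plus one missing value
all_responses = pd.DataFrame(
    {
        "soundIn": [
            "audios/VP10_S07_w.wav",
            "audios/VP11_S08_w.wav",
            "audios/VP01_S11_w.wav",
            None,
        ],
        "response_rater0": [2.0, 1.0, 1.0, 2.0],
    },
    index=[1, 2, 3, 4],
)

# `in` on a Series looks at the index labels (1, 2, 3, 4), not the strings,
# and produces a single bool, so it cannot be used to select rows
label_check = "VP01" in all_responses["soundIn"]
print(label_check)

# str.contains produces one boolean per row, which .loc accepts as a mask
mask = all_responses["soundIn"].str.contains("VP01", regex=False, na=False)
rslt_df = all_responses.loc[mask]
print(rslt_df)
```

Without `na=False`, a missing value in the column would yield a missing entry in the mask, which .loc does not accept as a boolean indexer.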