Create a sub-DataFrame based on a substring

Time: 2020-05-09 10:50:48

Tags: python pandas dataframe substring

I have a DataFrame like this:

    soundIn                 response_rater0  response_rater1    response_rater2
1   audios/VP10_S07_w.wav   2.0              2.0                1.0 
2   audios/VP11_S08_w.wav   1.0              2.0                2.0 
3   audios/VP01_S11_w.wav   1.0              1.0                2.0 
4   audios/VP10_S11_i.wav   2.0              2.0                2.0 
...

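and I want to create another DataFrame containing only the rows whose "soundIn" column contains "VP01". I tried to do this with loc, but it doesn't accept that VP01 is only a substring of the values I'm searching for.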

rslt_df = all_responses.loc['*VP01*' in all_responses['soundIn']] 

Does anyone have an idea?

1 answer:

Answer 0 (score: 1)

Solution

Try this: use the pandas.Series.str.contains method to do the comparison, and pass the result (a boolean Series, one value per row) as the indexer to all_responses.loc[].

all_responses.loc[all_responses['soundIn'].str.contains('VP01')]
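To see why the original attempt fails and why the mask works, here is a small sketch on a hypothetical three-row Series shaped like the soundIn column. The `in` operator on a pandas Series checks the index labels, not the values, and yields a single bool instead of one value per row. The asterisks are also unnecessary: str.contains already matches anywhere in the string, and with its default regex mode a leading `*` is not a valid pattern.

```python
import pandas as pd

# Hypothetical Series shaped like the 'soundIn' column (index labels 1..3)
sound_in = pd.Series(
    ["audios/VP10_S07_w.wav", "audios/VP11_S08_w.wav", "audios/VP01_S11_w.wav"],
    index=[1, 2, 3],
)

# `in` on a Series tests the *index labels*, not the values,
# and returns one bool for the whole Series.
print("VP01" in sound_in)  # False: 'VP01' is not an index label
print(3 in sound_in)       # True: 3 is an index label

# str.contains tests every value and returns a boolean Series (a mask)
mask = sound_in.str.contains("VP01")
print(mask.tolist())       # [False, False, True]

# Passing the mask to .loc keeps only the rows where it is True
print(sound_in.loc[mask])
```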

Example with dummy data

import pandas as pd
from io import StringIO

s = """
    soundIn                 response_rater0  response_rater1    response_rater2
1   audios/VP10_S07_w.wav   2.0              2.0                1.0 
2   audios/VP11_S08_w.wav   1.0              2.0                2.0 
3   audios/VP01_S11_w.wav   1.0              1.0                2.0 
4   audios/VP10_S11_i.wav   2.0              2.0                2.0
"""

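# Read the data into a DataFrame from the string representation.
# The header has one field fewer than the data rows, so pandas uses the first column as the index.
df = pd.read_csv(StringIO(s), sep=r'\s+')
# Match the search condition and produce the result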
result = df.loc[df['soundIn'].str.contains('VP01')]
print(result)

Output

                 soundIn  response_rater0  response_rater1  response_rater2
3  audios/VP01_S11_w.wav              1.0              1.0              2.0
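A couple of variations may be worth considering on real data. The sketch below uses hypothetical rows (an extra "VP010" file and a missing file name are invented for illustration). It assumes you want (1) a literal, non-regex match where missing values count as "no match" (`regex=False`, `na=False`), or (2) an exact participant-ID match, so that a search for VP01 does not also pick up VP010. `.copy()` makes the result an independent DataFrame rather than a view-like slice, which avoids SettingWithCopyWarning if you modify it later.

```python
import pandas as pd

# Hypothetical data in the same shape as the question's DataFrame,
# with a 'VP010' row and a missing file name added for illustration.
df = pd.DataFrame(
    {
        "soundIn": [
            "audios/VP10_S07_w.wav",
            "audios/VP01_S11_w.wav",
            "audios/VP010_S02_w.wav",
            None,
        ],
        "response_rater0": [2.0, 1.0, 2.0, 1.0],
    },
    index=[1, 2, 3, 4],
)

# 1) Literal substring match: regex=False treats the pattern as plain text,
#    and na=False turns missing values into False instead of NaN
#    (a mask containing NaN cannot be used with .loc).
loose = df.loc[df["soundIn"].str.contains("VP01", regex=False, na=False)]

# 2) Exact participant match: extract the 'VP<digits>' token and compare it,
#    so 'VP010' is not picked up by a search for 'VP01'.
participant = df["soundIn"].str.extract(r"(VP\d+)", expand=False)
exact = df.loc[participant == "VP01"].copy()  # independent DataFrame

print(loose.index.tolist())  # [2, 3]
print(exact.index.tolist())  # [2]
```

Plain str.contains is usually enough when the IDs cannot be prefixes of one another. The extract-and-compare version is the safer choice once two-digit and three-digit IDs are mixed.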