Question

I need to count the occurrences of the string "5.1*" in each row of the DataFrame shown below.

df

0                    [14.0*, 13.7*, 13.3*, 9.3*, 5.1*]
1       [14.0*, 13.7*, 13.1*, 11.1*, 9.1*, 5.1*, 3.3*]
2             [14.0*, 13.7*, 13.3*, 11.1*, 9.3*, 5.1*]
3              [14.0*, 13.7*, 13.3*, 9.3*, 9.1*, 3.2*]

expected result                                               c
    0                    [14.0*, 13.7*, 13.3*, 9.3*, 5.1*]    1
    1       [14.0*, 13.7*, 13.1*, 11.1*, 9.1*, 5.1*, 3.3*]    1
    2             [14.0*, 13.7*, 13.3*, 11.1*, 9.3*, 5.1*]    1
    3              [14.0*, 13.7*, 13.3*, 9.3*, 9.1*, 3.2*]    0

I tried using

len(df['raw'].str.findall(r'[^[]*\[([^]]*)\]'))

but this gives me the length of the whole DataFrame rather than a count per row.

Answer 1

If the values are strings, use str.findall followed by str.len, and add word boundaries if necessary, as in r'\b5.1*\b':

print (type(df.loc[0, 'raw']))
<class 'str'>

df['c1'] = df['raw'].str.findall(r'5.1*').str.len()
df['c2'] = df['raw'].str.findall(r'\b5.1*\b').str.len()
print (df)
                                              raw  c1  c2
0              [15.1*, 715.1*, 13.3*, 9.3*, 5.1*]   3   1 <-changed first 2 values
1  [14.0*, 13.7*, 13.1*, 11.1*, 9.1*, 5.1*, 3.3*]   1   1
2        [14.0*, 13.7*, 13.3*, 11.1*, 9.3*, 5.1*]   1   1
3         [14.0*, 13.7*, 13.3*, 9.3*, 9.1*, 3.2*]   0   0
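
Note that `.` and `*` are unescaped in the patterns above, so they act as regex metacharacters rather than literal characters; `r'5.1*'` would also match `5x1`, for example. Below is a minimal sketch of a stricter variant, assuming the cells are plain strings. The helper name `count_exact` is hypothetical, and the sample rows are copied from the output above.

```python
import re

TOKEN = '5.1*'
# re.escape makes '.' and '*' literal; the lookbehind rejects a match that
# is preceded by a digit or a dot (e.g. the tail of '15.1*').
PATTERN = re.compile(r'(?<![\d.])' + re.escape(TOKEN))

def count_exact(text):
    """Return how many times the literal token occurs in one cell."""
    return len(PATTERN.findall(text))

rows = [
    '[15.1*, 715.1*, 13.3*, 9.3*, 5.1*]',
    '[14.0*, 13.7*, 13.1*, 11.1*, 9.1*, 5.1*, 3.3*]',
    '[14.0*, 13.7*, 13.3*, 11.1*, 9.3*, 5.1*]',
    '[14.0*, 13.7*, 13.3*, 9.3*, 9.1*, 3.2*]',
]
counts = [count_exact(r) for r in rows]
print(counts)  # [1, 1, 1, 0]
```

On a DataFrame, the same helper could be applied per cell with `df['raw'].apply(count_exact)`.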

If the values are lists, use a list comprehension:

print (type(df.loc[0, 'raw']))
<class 'list'>

df['c'] = df['raw'].apply(lambda x: len([y for y in x if y == '5.1*']))

Or:

df['c'] = [len([y for y in x if y == '5.1*']) for x in df['raw']]

print (df)
                                              raw  c
0              [15.1*, 715.1*, 13.3*, 9.3*, 5.1*]  1
1  [14.0*, 13.7*, 13.1*, 11.1*, 9.1*, 5.1*, 3.3*]  1
2        [14.0*, 13.7*, 13.3*, 11.1*, 9.3*, 5.1*]  1
3         [14.0*, 13.7*, 13.3*, 9.3*, 9.1*, 3.2*]  0
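
When the cells are real Python lists, `list.count` gives the same per-row count without building an intermediate list, and a membership test yields a 0/1 flag if only presence matters. This is a minimal sketch on plain lists; the helper names `count_token` and `has_token` are hypothetical, and the rows mirror the output above.

```python
TOKEN = '5.1*'

def count_token(items, token=TOKEN):
    """Number of list elements exactly equal to the token."""
    return items.count(token)

def has_token(items, token=TOKEN):
    """1 if the token appears at least once in the list, else 0."""
    return int(token in items)

rows = [
    ['15.1*', '715.1*', '13.3*', '9.3*', '5.1*'],
    ['14.0*', '13.7*', '13.1*', '11.1*', '9.1*', '5.1*', '3.3*'],
    ['14.0*', '13.7*', '13.3*', '11.1*', '9.3*', '5.1*'],
    ['14.0*', '13.7*', '13.3*', '9.3*', '9.1*', '3.2*'],
]
row_counts = [count_token(r) for r in rows]
row_flags = [has_token(r) for r in rows]
print(row_counts)  # [1, 1, 1, 0]
print(row_flags)   # [1, 1, 1, 0]
```

On a DataFrame this would be `df['c'] = df['raw'].apply(count_token)`.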

Occurrences of a string in a DataFrame column of lists
