N^2 substring search using Pandas DataFrames

Time: 2019-05-11 06:54:19

Tags: python pandas

I have two pandas DataFrames, one containing strings and one containing substrings:

import pandas as pd
strings = pd.DataFrame(
    [
        {"id": 0, "string": "abcdef"},
        {"id": 1, "string": "bcdef"},
        {"id": 2, "string": "cdef"}
    ]
)

substrings = pd.DataFrame(
    [
        {"id": 0, "string": "a"},
        {"id": 1, "string": "bc"},
        {"id": 2, "string": "def"}
    ]
)

I want to find the indices of all occurrences of each substring in each string. Right now I am doing something like this:

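# For each substring row, search every string via the .str accessor
# (a Series has no findall method of its own).
substrings.apply(
    lambda substring: strings["string"].str.findall(substring.string),
    axis=1
)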

Is there a better or more efficient way to do this?

1 answer:

Answer 0: (score: 0)

I believe you need:

s = strings["string"].str.findall('|'.join(substrings.string))
print (s)
0    [a, bc, def]
1       [bc, def]
2           [def]
Name: string, dtype: object
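
Note that `str.findall` returns the matched text, not the positions the question asks for. A minimal sketch of one way to recover the start indices is shown below, assuming the substrings should be matched literally (hence `re.escape`) and that non-overlapping matches are sufficient. The helper name `find_positions` is made up for illustration and is not part of pandas.

```python
import re
import pandas as pd

strings = pd.DataFrame(
    [
        {"id": 0, "string": "abcdef"},
        {"id": 1, "string": "bcdef"},
        {"id": 2, "string": "cdef"},
    ]
)
substrings = pd.DataFrame(
    [
        {"id": 0, "string": "a"},
        {"id": 1, "string": "bc"},
        {"id": 2, "string": "def"},
    ]
)

# Escape each substring so regex metacharacters are matched literally,
# then join them into a single alternation pattern.
pattern = re.compile("|".join(re.escape(s) for s in substrings["string"]))


def find_positions(text):
    # Hypothetical helper: return (substring, start_index) pairs for every
    # non-overlapping match of the combined pattern in `text`.
    return [(m.group(0), m.start()) for m in pattern.finditer(text)]


positions = strings["string"].apply(find_positions)
print(positions)
```

For the sample data, "abcdef" gives [('a', 0), ('bc', 1), ('def', 3)], "bcdef" gives [('bc', 0), ('def', 2)], and "cdef" gives [('def', 1)]. Because a single alternation pattern only reports non-overlapping matches, substrings that overlap each other (for example "ab" and "bc" in "abc") would not all be reported. In that case, searching for each substring separately is probably the safer choice.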