Question

I have a DataFrame with a column of file names. I want to find all rows in the DataFrame whose value starts with one of a known set of prefixes.

I know I could just run a simple for loop, but I'd like to do this within the DataFrame so I can compare speed and run benchmarks. It's also a good exercise.

My initial idea was to combine str.slice with str.index, but I couldn't get it to work. This is what I had in mind:

import pandas as pd

file_prefixes = {...}
file_df = pd.DataFrame(list_of_file_names, columns=["file"])

# This doesn't work: str.index returns a Series (one position per row),
# but str.slice only accepts a single scalar stop for the whole column
file_df.loc[file_df.file.str.slice(start=0, stop=file_df.file.str.index('/') - 1).isin(file_prefixes), :]

I expect the code above to return every row whose value starts with one of the file prefixes in the set above.
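For reference, the attempt above fails because str.slice takes one scalar stop for the whole column, while str.index('/') returns a Series with one position per row. Below is a minimal sketch of two ways to get the per-row prefix, assuming the prefix is everything before the first "/" and using hypothetical sample data in place of list_of_file_names and file_prefixes: slice each name in plain Python with the positions from str.index, or use str.split with n=1. Since a slice stop is exclusive, no "- 1" is needed.

```python
import pandas as pd

# Hypothetical sample data; the real prefixes and file names are not given.
file_prefixes = {"images", "docs"}
file_df = pd.DataFrame(
    {"file": ["images/a.png", "docs/readme.txt", "tmp/x.log", "images2/b.png"]}
)

# Literal slice + index combination, done per row in Python.
# str.index raises ValueError if any name contains no "/".
positions = file_df["file"].str.index("/")
sliced = pd.Series(
    [name[:pos] for name, pos in zip(file_df["file"], positions)],
    index=file_df.index,
)

# Vectorized alternative: split once on the first "/" and keep the left part.
prefix = file_df["file"].str.split("/", n=1).str[0]

# Keep only the rows whose prefix is one of the known prefixes.
matches = file_df.loc[prefix.isin(file_prefixes), :]
print(matches)
```

Both approaches produce the same prefixes. Because isin compares whole path components, a name such as images2/b.png is not selected by the prefix images.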

In summary, I'd like help with two things:

Combining slice and index
Thoughts on a better way to achieve this

Thanks

Answer 1

I would use startswith:

# str.startswith accepts a single string or a tuple of strings, so convert the set to a tuple
file_df.loc[file_df.file.str.startswith(tuple(file_prefixes)), :]
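Since the question also asks about benchmarking against a for loop, here is a rough sketch comparing the startswith approach with a plain-Python loop. The sample data and the helper names with_startswith and with_loop are hypothetical, chosen for illustration only.

```python
import timeit

import pandas as pd

# Hypothetical sample data for illustration only.
file_prefixes = {"images", "docs"}
file_df = pd.DataFrame(
    {"file": ["images/a.png", "docs/readme.txt", "tmp/x.log", "images2/b.png"]}
)


def with_startswith(df: pd.DataFrame) -> pd.DataFrame:
    # Vectorized: str.startswith takes a tuple of prefixes (not a set).
    return df.loc[df["file"].str.startswith(tuple(file_prefixes)), :]


def with_loop(df: pd.DataFrame) -> pd.DataFrame:
    # Plain-Python baseline: build a boolean mask row by row.
    prefixes = tuple(file_prefixes)
    mask = [name.startswith(prefixes) for name in df["file"]]
    return df.loc[mask, :]


# Both approaches should select the same rows.
assert with_startswith(file_df).equals(with_loop(file_df))

# Rough timing on a larger copy made by repeating the sample rows.
big_df = pd.concat([file_df] * 10_000, ignore_index=True)
for fn in (with_startswith, with_loop):
    seconds = timeit.timeit(lambda: fn(big_df), number=10)
    print(f"{fn.__name__}: {seconds:.3f}s for 10 runs")
```

Because startswith compares leading characters, images2/b.png is also selected by the prefix images; if the prefixes are directory names, including the trailing "/" in each prefix avoids that. Timings depend on the pandas version, the string dtype, and the data, so the printed numbers are only indicative.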
