I'm working with a list of file paths in pandas, and I need to extract the folder path from each one.
So from:
/volume1/SYN/FOLDER1/FILE.TXT
/volume1/SYN/FOLDER2/SUBFOLDER/FILE.PDF
I need to get the folder paths:
/volume1/SYN/FOLDER1/
/volume1/SYN/FOLDER2/SUBFOLDER/
I found a way to get the file name, but not the folder path:
data['index'] = data['File'].str.split('/').str[-1]
Any ideas?
Answer 0 (score: 1)
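You can use os.path.split and take the first element of the result,
either with a list comprehension or with apply and a custom function.
(os.path.splitext only strips the extension and leaves the file name in place, so it does not return the folder.)
import pandas as pd
from os.path import split
df = pd.DataFrame({'filepaths': [r'/volume1/SYN/FOLDER1/FILE.TXT',
                                 r'/volume1/SYN/FOLDER2/SUBFOLDER/FILE.PDF']})
# list comprehension, usually more efficient
df['folder'] = [split(x)[0] for x in df['filepaths']]
# apply + lambda, an implicit Python-level loop
df['folder'] = df['filepaths'].apply(lambda x: split(x)[0])
print(df)
                                 filepaths                          folder
0            /volume1/SYN/FOLDER1/FILE.TXT            /volume1/SYN/FOLDER1
1  /volume1/SYN/FOLDER2/SUBFOLDER/FILE.PDF  /volume1/SYN/FOLDER2/SUBFOLDER
The result has no trailing slash; append '/' if you need one.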
Answer 1 (score: 0)
Use rsplit:
data['index'] = data['File'].str.rsplit('/', n=1).str[0] + '/'
If there are no missing values and performance matters:
data['index'] = [x.rsplit('/', 1)[0] + '/' for x in data['File']]
print(data)
                                      File                            index
0            /volume1/SYN/FOLDER1/FILE.TXT            /volume1/SYN/FOLDER1/
1  /volume1/SYN/FOLDER2/SUBFOLDER/FILE.PDF  /volume1/SYN/FOLDER2/SUBFOLDER/
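The "no missing values" caveat matters because a plain list comprehension calls rsplit on every element, and that fails on NaN. The following is a minimal sketch of the difference, not part of the original answer: the NaN row and the index_lc column name are made up for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'File': [
    '/volume1/SYN/FOLDER1/FILE.TXT',
    np.nan,  # hypothetical missing entry, added for illustration
    '/volume1/SYN/FOLDER2/SUBFOLDER/FILE.PDF',
]})

# The .str accessor propagates missing values instead of raising.
data['index'] = data['File'].str.rsplit('/', n=1).str[0] + '/'

# A list comprehension needs an explicit guard for non-string entries;
# without it, x.rsplit would raise AttributeError on the NaN (a float).
data['index_lc'] = [
    x.rsplit('/', 1)[0] + '/' if isinstance(x, str) else x
    for x in data['File']
]

print(data[['index', 'index_lc']])
```

Both columns should hold the same folder paths, with the missing row left as missing in each.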
Answer 2 (score: 0)
The pandas-path library wraps pathlib as a .path accessor
on any pandas Series or Index, which makes this case very simple:
import pandas as pd
from pandas_path import path
files = pd.Series([
'/volume1/SYN/FOLDER1/FILE.TXT',
'/volume1/SYN/FOLDER2/SUBFOLDER/FILE.PDF',
])
# .path accessor created by importing pandas_path
files.path.parent
#> 0 /volume1/SYN/FOLDER1
#> 1 /volume1/SYN/FOLDER2/SUBFOLDER
#> dtype: object
Created at 2021-03-06 22:21:15 PST by reprexlite v0.4.2
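If installing an extra package is not an option, roughly the same result can be had from the standard library alone. This is a sketch rather than the pandas-path implementation: it assumes every entry is a string and a POSIX-style path, and the folders variable name is only illustrative. It also appends the trailing slash the question asked for, which .parent does not include.

```python
from pathlib import PurePosixPath

import pandas as pd

files = pd.Series([
    '/volume1/SYN/FOLDER1/FILE.TXT',
    '/volume1/SYN/FOLDER2/SUBFOLDER/FILE.PDF',
])

# PurePosixPath treats the strings as POSIX paths regardless of the host OS;
# .parent drops the final component, and '/' is re-appended by hand.
folders = files.map(lambda p: str(PurePosixPath(p).parent) + '/')

print(folders.tolist())
```

A file sitting directly under the root (parent '/') would end up with a doubled slash here, so that case needs its own handling if it can occur.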