How to get information from a filename and split it into parts to save in a list

Posted: 2020-06-17 12:56:06

Tags: python python-3.x filenames glob h5py

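I need to take this code and turn it into a function that gets information from the filenames of the .h5 files in a directory. I am very new to Python, so I hope my explanation makes sense. Below is the code, followed by an example of the data filenames it needs to parse.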

import glob
import os

import icesat2_data_utils  # helper module used by the original code

atl06_dir = 'ATL06 files'
filenames = glob.glob(os.path.join(atl06_dir, '*.h5'))
year_selected = 2019
filenames_selected = list()
for filename in filenames:
    # need to replace this line with a function that grabs the parts from the filename; this one does not work
    product, year, month, day, hour, minute, second, track, cycle, granule, release, version = icesat2_data_utils.h5FilenameParts(os.path.basename(filename))
    if int(year) == year_selected:
        filenames_selected.append(filename)
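One way to replace the non-working helper call is to parse each basename with a regular expression. This is a minimal sketch that assumes the files follow the ICESat-2 naming pattern ATL06_[yyyymmdd][hhmmss]_[ttttccss]_[vvv]_[rr].h5, with underscores between the groups (for example ATL06_20190102184312_00810210_003_01.h5). The names parse_atl06_filename and select_by_year are hypothetical and are not part of icesat2_data_utils.

```python
import glob
import os
import re

# Assumed layout (hypothetical, adjust to your files):
# ATL06_yyyymmddhhmmss_ttttccss_vvv_rr.h5
ATL06_PATTERN = re.compile(
    r"(?P<product>ATL\d{2})_"
    r"(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})"
    r"(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2})_"
    r"(?P<track>\d{4})(?P<cycle>\d{2})(?P<segment>\d{2})_"
    r"(?P<version>\d{3})_(?P<revision>\d{2})\.h5"
)


def parse_atl06_filename(path):
    """Return the filename fields as a dict of strings, or None if the name does not match."""
    match = ATL06_PATTERN.fullmatch(os.path.basename(path))
    return match.groupdict() if match else None


def select_by_year(filenames, year_selected):
    """Keep only the files whose acquisition year equals year_selected."""
    selected = []
    for filename in filenames:
        parts = parse_atl06_filename(filename)
        # Skip files that do not follow the assumed naming pattern.
        if parts is not None and int(parts["year"]) == year_selected:
            selected.append(filename)
    return selected


if __name__ == "__main__":
    atl06_dir = "ATL06 files"
    filenames = glob.glob(os.path.join(atl06_dir, "*.h5"))
    filenames_selected = select_by_year(filenames, 2019)
    print(filenames_selected)
```

Returning a dict keeps every field named, so the loop no longer depends on unpacking exactly twelve values in the right order, and names that do not match the pattern are skipped instead of producing wrong values.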

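How can I get this section to read the filenames of the .h5 files and output the different parts of the name, based on the separators you can see in the example filename: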

ATL06_[yyyymmdd][hhmmss][ttttccss][vvv_rr].h5

I think I could have it read specific character positions in the name, for example:

# product ATL06 = 0 to 5
# year yyyy = indexes 8 to 12
# month mm = 12 to 14
# day dd = 14 to 16
# hour hh = 18 to 20
# minute mm = 20 to 22
# second ss = 22 to 24
# Reference ground track tttt = 27 to 31
# cycle cc = 31 to 33
# orbital segment ss = 33 to 35
# version vvv = 38 to 44
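Fixed character positions also work, provided every filename has exactly the same layout. This sketch works out the offsets for the assumed name ATL06_20190102184312_00810210_003_01.h5 (underscore-separated groups, no brackets), so they differ from the indexes listed above; slice_atl06_name and FIELD_SLICES are hypothetical names.

```python
# Half-open (start, stop) slice positions for the assumed layout
# ATL06_yyyymmddhhmmss_ttttccss_vvv_rr.h5
FIELD_SLICES = {
    "product": (0, 5),
    "year": (6, 10),
    "month": (10, 12),
    "day": (12, 14),
    "hour": (14, 16),
    "minute": (16, 18),
    "second": (18, 20),
    "track": (21, 25),
    "cycle": (25, 27),
    "segment": (27, 29),
    "version": (30, 33),
    "revision": (34, 36),
}


def slice_atl06_name(name):
    """Cut a bare filename (no directory) into fields using fixed character positions."""
    return {field: name[start:stop] for field, (start, stop) in FIELD_SLICES.items()}


parts = slice_atl06_name("ATL06_20190102184312_00810210_003_01.h5")
print(parts["year"], parts["track"], parts["version"])  # 2019 0081 003
```

Python slices are half-open, so name[6:10] covers indexes 6, 7, 8 and 9. Note that slicing never complains: if a filename has a different layout, it silently returns the wrong characters.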

1 answer:

Answer 0 (score: 0)

You can slice it to suit your needs, for example:

string = "ATL06_[yyyymmdd][hhmmss][ttttccss][vvv_rr].h5"
product = string[0:5]

To extract the content inside the [] brackets, you can use the regex module, re:

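import re

string = "ATL06_[yyyymmdd][hhmmss][ttttccss][vvv_rr].h5"
# raw string, so the backslashes reach the regex engine unchanged
fields = re.findall(r'\[(.*?)\]', string)
print(fields)

This returns a list of the strings matched inside the brackets:

['yyyymmdd', 'hhmmss', 'ttttccss', 'vvv_rr']

Each element of this list is a string that can be sliced individually as needed.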
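Following the answer's suggestion, each group returned by re.findall can be sliced further into its sub-fields. This sketch assumes, as the answer does, that the fields are literally wrapped in square brackets; the sample name and the split_bracketed_name helper are hypothetical.

```python
import re


def split_bracketed_name(name):
    """Extract the [..] groups, then slice each group into its sub-fields."""
    # Unpacking raises ValueError unless there are exactly four bracketed groups.
    date_group, time_group, track_group, version_group = re.findall(r"\[(.*?)\]", name)
    version, revision = version_group.split("_")
    return {
        "product": name[0:5],
        "year": date_group[0:4],
        "month": date_group[4:6],
        "day": date_group[6:8],
        "hour": time_group[0:2],
        "minute": time_group[2:4],
        "second": time_group[4:6],
        "track": track_group[0:4],
        "cycle": track_group[4:6],
        "segment": track_group[6:8],
        "version": version,
        "revision": revision,
    }


print(split_bracketed_name("ATL06_[20190102][184312][00810210][003_01].h5"))
```

If the actual files contain no brackets, re.findall returns an empty list and the four-way unpacking fails with a ValueError, which at least flags the mismatch instead of returning wrong fields.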