Parsing dates from strings

Time: 2018-02-15 06:37:26

Tags: string python-3.x date-parsing

I have a list of strings in Python like this:

['AM_B0_D0.0_2016-04-01T010000.flac.h5',
 'AM_B0_D3.7_2016-04-13T215000.flac.h5',
 'AM_B0_D10.3_2017-03-17T110000.flac.h5',
 'AM_B0_D0.7_2016-10-21T104000.flac.h5',
 'AM_B0_D4.4_2016-08-05T151000.flac.h5',
 'AM_B0_D0.0_2016-04-01T010000.flac.h5',
 'AM_B0_D3.7_2016-04-13T215000.flac.h5',
 'AM_B0_D10.3_2017-03-17T110000.flac.h5',
 'AM_B0_D0.7_2016-10-21T104000.flac.h5',
 'AM_B0_D4.4_2016-08-05T151000.flac.h5']

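I want to parse only the date and time from these strings (for example, 2016-08-05 15:10:00).

So far I have used a for loop like the one below, but it is very time-consuming. Is there a better way?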

import glob
import pandas as pd

for files in glob.glob("AM_B0_*.flac.h5"):
    if files[11]=='_':
        year=files[12:16]
        month=files[17:19]
        day= files[20:22]
        hour=files[23:25]
        minute=files[25:27]
        second=files[27:29]
        tindex=pd.date_range(start= '%d-%02d-%02d %02d:%02d:%02d' %(int(year),int(month), int(day), int(hour), int(minute), int(second)), periods=60, freq='10S') 

    else:
        year=files[11:15]
        month=files[16:18]
        day= files[19:21]
        hour=files[22:24]
        minute=files[24:26]
        second=files[26:28]
        tindex=pd.date_range(start= '%d-%02d-%02d %02d:%02d:%02d' %(int(year), int(month), int(day), int(hour), int(minute), int(second)), periods=60, freq='10S')

2 answers:

Answer 0 (score: 0)

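Instead of hard-coding files[11], find the index of the last (or second-to-last) _ and apply your slicing relative to it; then you do not have to write the same code twice. Alternatively, use a regular expression to parse the string.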
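Both suggestions might look roughly like the sketch below. It is not code from the answer: the helper names parse_by_split and parse_by_regex are made up for illustration, only the standard library is used, and it assumes that the timestamp always follows the last underscore and has the layout YYYY-MM-DDTHHMMSS, as in the sample file names.

```python
import re
from datetime import datetime

# Assumed timestamp layout, e.g. 2016-08-05T151000
TIMESTAMP_FORMAT = "%Y-%m-%dT%H%M%S"
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{6}")


def parse_by_split(name):
    """Hypothetical helper: take the text after the last '_' up to the next '.'."""
    stamp = name.rsplit("_", 1)[1].split(".", 1)[0]
    return datetime.strptime(stamp, TIMESTAMP_FORMAT)


def parse_by_regex(name):
    """Hypothetical helper: search for the timestamp anywhere in the name."""
    match = TIMESTAMP_RE.search(name)
    if match is None:
        raise ValueError("no timestamp found in %r" % name)
    return datetime.strptime(match.group(0), TIMESTAMP_FORMAT)


names = ['AM_B0_D0.0_2016-04-01T010000.flac.h5',
         'AM_B0_D10.3_2017-03-17T110000.flac.h5',
         'AM_B0_D4.4_2016-08-05T151000.flac.h5']

for name in names:
    # Both helpers should agree; printing a datetime gives e.g. 2016-08-05 15:10:00
    print(parse_by_split(name), parse_by_regex(name))
```

Neither variant depends on the length of the D... field, so the if/else on files[11] is no longer needed.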

Answer 1 (score: 0)

Try this (it locates the second-to-last '-', so no if-else is needed):

import pandas as pd

filesall = ['AM_B0_D0.0_2016-04-01T010000.flac.h5',
 'AM_B0_D3.7_2016-04-13T215000.flac.h5',
 'AM_B0_D10.3_2017-03-17T110000.flac.h5',
 'AM_B0_D0.7_2016-10-21T104000.flac.h5',
 'AM_B0_D4.4_2016-08-05T151000.flac.h5',
 'AM_B0_D0.0_2016-04-01T010000.flac.h5',
 'AM_B0_D3.7_2016-04-13T215000.flac.h5',
 'AM_B0_D10.3_2017-03-17T110000.flac.h5',
 'AM_B0_D0.7_2016-10-21T104000.flac.h5',
 'AM_B0_D4.4_2016-08-05T151000.flac.h5']

def find_second_last(text, pattern):
    return text.rfind(pattern, 0, text.rfind(pattern))

for files in filesall:
    start = find_second_last(files,'-') - 4 # from yyyy- part
    timepart = (files[start:start+17]).replace("T"," ")
    #insert 2 ':'s
    timepart = timepart[:13] + ':' + timepart[13:15] + ':' +timepart[15:]
    # print(timepart)
    tindex=pd.date_range(start= timepart, periods=60, freq='10S')
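As a possible variation, the manual colon insertion can be avoided by handing the 17-character slice directly to datetime.strptime. The sketch below reuses the find_second_last helper from the code above; extract_start is a hypothetical name, and the format string assumes the YYYY-MM-DDTHHMMSS layout seen in the sample file names.

```python
from datetime import datetime


def find_second_last(text, pattern):
    # Same helper as above: index of the second-to-last occurrence of pattern
    return text.rfind(pattern, 0, text.rfind(pattern))


def extract_start(name):
    # Hypothetical helper: the year starts 4 characters before the second-to-last '-'
    start = find_second_last(name, "-") - 4
    # The slice looks like 2016-10-21T104000; strptime splits HHMMSS itself
    return datetime.strptime(name[start:start + 17], "%Y-%m-%dT%H%M%S")


sample = ['AM_B0_D0.7_2016-10-21T104000.flac.h5',
          'AM_B0_D10.3_2017-03-17T110000.flac.h5']
starts = [extract_start(name) for name in sample]
for value in starts:
    print(value)
```

Each resulting datetime can be passed as the start argument of pd.date_range in place of the formatted string.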