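Python + regex: how do I extract the value between two underscores in Python?

Time: 2018-11-16 12:47:24

Tags: python regex

I am trying to extract the value between two underscores. To do that, I wrote the following code: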

import re

patient_ids = []
for file in files:
    print(file)
    patient_id = re.findall("_(.*?)_", file)
    patient_ids.append(patient_id)

print(patient_ids)

Output:

PT_112_NIM 26-04-2017_merged.csv
PT_114_NIM_merged.csv
PT_115_NIM_merged.csv
PT_116_NIM_merged.csv
PT_117_NIM_merged.csv
PT_118_NIM_merged.csv
PT_119_NIM_merged.csv
[['112'], ['114'], ['115'], ['116'], ['117'], ['118'], ['119'], ['120'], ['121'], ['122'], ['123'], ['124'], ['125'], ['126'], ['127'], ['128'], ['129'], ['130'], ['131'], ['132'], ['133'], ['134'], ['135'], ['136'], ['137'], ['138'], ['139'], ['140'], ['141'], ['142'], ['143'], ['144'], ['145'], ['146'], ['147'], ['150'], ['151'], ['152'], ['153'], ['154'], ['155'], ['156'], ['157'], ['158'], ['159'], ['160'], ['161'], ['162'], ['163'], ['165']]

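So the extracted values come out in the form ['121']. I want them in the form 121, i.e. just the number between the two underscores.

What changes should I make to my code?

4 answers:

Answer 0 (score: 1)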

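Just replace the last line of the for loop with:

patient_ids.extend(int(pid) for pid in patient_id)

extend flattens the result, and int(pid) converts each matched string to an int. int has to be applied to each element rather than to patient_id itself, because patient_id is the list returned by re.findall.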
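
As a minimal sketch of this idea, assuming a hypothetical `files` list made up of two of the file names printed in the question, the whole loop might look like this:

```python
import re

# Hypothetical sample; in the question, `files` comes from the asker's own code.
files = ["PT_112_NIM 26-04-2017_merged.csv", "PT_114_NIM_merged.csv"]

patient_ids = []
for file in files:
    # findall returns a list of strings, so each element is converted separately
    matches = re.findall("_(.*?)_", file)
    patient_ids.extend(int(m) for m in matches)

print(patient_ids)  # [112, 114]
```

Note that `int` raises a ValueError if a match is not purely numeric, so this only fits file names where every underscore-delimited value is a number.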

Answer 1 (score: 1)

You need to flatten the result, for example like this:

flat_list = [item for sublist in patient_ids for item in sublist]
print(flat_list)
 # => ['112', '114', '115', '116', '117', '118', '119', '120', '121', '122', '123', '124', '125', '126', '127', '128', '129', '130', '131', '132', '133', '134', '135', '136', '137', '138', '139', '140', '141', '142', '143', '144', '145', '146', '147', '150', '151', '152', '153', '154', '155', '156', '157', '158', '159', '160', '161', '162', '163', '165']
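
An equivalent way to flatten, shown here only as an alternative sketch, is `itertools.chain.from_iterable` from the standard library. The `nested` list below is a shortened, hypothetical copy of the list from the question:

```python
from itertools import chain

# Shortened stand-in for the nested patient_ids list from the question
nested = [['112'], ['114'], ['115']]

# chain.from_iterable walks each inner list in turn and yields its items
flat_list = list(chain.from_iterable(nested))

print(flat_list)  # ['112', '114', '115']
```

The values stay strings here; wrap each item in `int` if numbers are needed.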

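Answer 2 (score: 1)

You have a list of findall results (it seems there is only one result per file). You can convert the strings to integers, and you can also flatten the result: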

patient_ids= [['112'], ['114','4711'], ['115'], ['116'], ['117'], ['118'], ['119']]
#                       ^^^^^ ^^^^^^  modified to have 2 ids for demo-purposes


# if you want to keep the boxing
numms   = [ list(map(int,m)) for m in patient_ids]  

# converted and flattened
numms2  = [ x for y in [list(map(int,m)) for m in patient_ids] for x in y]  


print(numms) 

print(numms2) 

Output:

# this keeps the findall results together in inner lists
[[112], [114, 4711], [115], [116], [117], [118], [119]]

# this flattens all results
[112, 114, 4711, 115, 116, 117, 118, 119]

Answer 3 (score: 1)

Really, a simple approach is not to append one list to another list at all, but to extend the list instead:

patient_ids = []
for file in files:
    print(file)
    patient_ids.extend(re.findall("_(.*?)_", file))

print(patient_ids)
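
If each file name is expected to contain exactly one ID, another option is `re.search`, which returns only the first match, so there is no inner list to flatten. The sketch below assumes the IDs are purely numeric and swaps in the stricter pattern `_(\d+)_`; the `files` list is a hypothetical sample taken from the names in the question:

```python
import re

# Hypothetical sample; in the question, `files` comes from the asker's own code.
files = ["PT_112_NIM 26-04-2017_merged.csv", "PT_114_NIM_merged.csv"]

patient_ids = []
for file in files:
    # search stops at the first "_digits_" occurrence; it returns None if there is none
    match = re.search(r"_(\d+)_", file)
    if match:
        patient_ids.append(match.group(1))

print(patient_ids)  # ['112', '114']
```

Files without a numeric ID between underscores are silently skipped in this version; whether that is the right behaviour depends on the data.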