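I have text files in the following format, and I have to extract all Range of Motion and Location values. In some files a value is given on the following line(s), and in some files it is not given at all.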
File1.txt:
Functional Assessment: Patient currently displays the following functional
limitations and would benefit from treatment to maximize functional use and
pain reduction: Range of Motion: limited . ADLs: limited . Gait: limited .
Stairs: limited . Squatting: limited . Work participation status: limited .
Current Status: The patient's current status is improving.
Location: Right side
Expected output: limited | Right side
File2.txt:
Functional Assessment: Patient currently displays the following functional
limitations and would benefit from treatment to maximize functional use and
pain reduction:
Range of Motion:
painful
and
limited
Strength:
limited
Expected output: painful and limited | not given
Here is the code I am trying:
```python
if "Functional Assessment:" in line:
    result = str(line.rsplit('Functional Assessment:'))
    romvalue = result.rsplit('Range of Motion:')[-1].split()[0]
    outputfile.write(romvalue)
    partofbody = result.rsplit('Location:')[-1].split()[0]
    outputfile.write(partofbody)
```
This code does not produce the desired output. Can anyone help?
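The code only ever inspects the single line that contains `Functional Assessment:`, and `str()` turns the list returned by `rsplit` into its printed representation. A minimal trace of the question's own expressions on the first line of File2.txt (embedded as a string here rather than read from the file) makes this visible:

```python
# First line of File2.txt, copied from the question
line = "Functional Assessment: Patient currently displays the following functional"

if "Functional Assessment:" in line:
    # str() of the list returned by rsplit is its repr:
    # "['', ' Patient currently displays the following functional']"
    result = str(line.rsplit('Functional Assessment:'))
    # 'Range of Motion:' does not occur on this line, so rsplit returns
    # [result] and [-1] is the whole repr string again
    romvalue = result.rsplit('Range of Motion:')[-1].split()[0]
    print(romvalue)  # ['',
```

Because neither `Range of Motion:` nor `Location:` appears on that line, the code writes a fragment of the list repr instead of a value, and the values on later lines are never examined.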
Answer 0 (score: 3)
You can collect all lines after the line that starts with `Functional Assessment:`, join them, and use the following regex:
(?sm)\b(Location|Range of Motion):\s*([^\W_].*?)\s*(?=(?:\.\s*)?[^\W\d_]+:|\Z)
See the regex demo.
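Two parts of this pattern are easy to misread: the negated character classes and the lazy `.*?` that ends each value. A minimal sketch checking them on short strings (the sample text reuses File1.txt's wording; the helper name `stop` is only for illustration):

```python
import re

# [^\W_] is a word character other than "_", i.e. a letter or a digit
print(re.findall(r'[^\W_]', 'a_1 .'))        # ['a', '1']

# [^\W\d_]+ is a run of letters only (word chars minus digits and "_")
print(re.findall(r'[^\W\d_]+', 'ADLs_2 x'))  # ['ADLs', 'x']

text = "Range of Motion: limited . ADLs: limited ."
stop = r'\s*(?=(?:\.\s*)?[^\W\d_]+:|\Z)'

# Lazy .*? ends the value at the first position where the lookahead holds
lazy = re.search(r'Range of Motion:\s*([^\W_].*?)' + stop, text)
# Greedy .* runs on to the last such position, here the end of the string
greedy = re.search(r'Range of Motion:\s*([^\W_].*)' + stop, text)

print(lazy.group(1))    # limited
print(greedy.group(1))  # limited . ADLs: limited .
```

With a greedy quantifier the first value would swallow every following label, which is why the lazy form is used.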
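Details
- `(?sm)` - the `re.S` (so `.` also matches newlines) and `re.M` modifiers
- `\b` - a word boundary
- `(Location|Range of Motion)` - Group 1: `Location` or `Range of Motion`
- `:\s*` - a colon and 0+ whitespace chars
- `([^\W_].*?)` - Group 2: a letter or digit followed by any 0+ chars, as few as possible
- `\s*` - 0+ whitespace chars
- `(?=(?:\.\s*)?[^\W\d_]+:|\Z)` - a positive lookahead that requires, immediately to the right of the current location:
  - `(?:\.\s*)?` - an optional sequence of `.` and 0+ whitespace chars,
  - `[^\W\d_]+:` - followed by 1+ letters and a `:`
  - `|` - or
  - `\Z` - the end of the string.

Here is a Python demo: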
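```python
import re

reg = re.compile(r'\b(Location|Range of Motion):\s*([^\W_].*?)\s*(?=(?:\.\s*)?[^\W\d_]+:|\Z)', re.S | re.M)

# `files` holds the full text of each input file as one string
files = []
for path in ("File1.txt", "File2.txt"):
    with open(path, encoding="utf-8") as f:
        files.append(f.read())

for file in files:
    flag = False
    tmp = ""
    for line in file.splitlines():
        if line.startswith("Functional Assessment:"):
            # Start collecting at the "Functional Assessment:" line
            tmp = tmp + line + "\n"
            flag = not flag
        elif flag:
            # Keep every following line so multi-line values stay intact
            tmp = tmp + line + "\n"
    # findall returns (label, value) pairs, which dict() accepts directly
    print(dict(reg.findall(tmp)))
```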
Output (for the two texts you posted):
{'Location': 'Right side', 'Range of Motion': 'limited'}
{'Range of Motion': 'painful \nand\nlimited'}
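The demo returns the raw captured text, newlines included, while the question asks for `painful and limited` and a placeholder when `Location` is absent. A small post-processing sketch; the function name `format_result` and the `not given` placeholder are illustrative choices, not part of the answer above:

```python
def format_result(found):
    """Format a dict like the demo's output as 'range of motion | location'.

    Missing keys are shown as "not given", mirroring the question's
    expected output for File2.txt.
    """
    def clean(value):
        # Collapse spaces and newlines into single spaces; None -> placeholder
        return " ".join(value.split()) if value else "not given"

    return f"{clean(found.get('Range of Motion'))} | {clean(found.get('Location'))}"


print(format_result({'Location': 'Right side', 'Range of Motion': 'limited'}))
# limited | Right side
print(format_result({'Range of Motion': 'painful \nand\nlimited'}))
# painful and limited | not given
```

`str.split()` with no argument splits on any run of whitespace, so the stray space and newlines in the second value collapse cleanly.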