python - Regular expression to extract certain text data from a file

Time: 2019-08-07 09:58:03

Tags: python

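I have a text file that was converted from a PDF. I want to extract the descriptions that follow the string "FIGURE" in the text data. Here are some sample lines from the text data: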

  

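FIGURE 1-1. An empirical approach to the design of a dosage regimen. The effects, both desired and adverse, are monitored after the administration of a dosage regimen of a drug and used to further refine and optimize the regimen through feedback ( dashed line ).

Derendorf5e_CH01.indd 4Derendorf5e_CH01.indd 4 5/25/19 11:07 PM5/25/19 11:07 PM

CHAPTER 1 • Therapeutic Relevance 5

Another way of looking at these two subdisciplines is that pharmacokinetics deals with what the body does to the drug (absorption, distribution, metabolism, excretion), whereas pharmacodynamics describes what the drug does to the body (both desired and undesired effects). From this definition, one could wrongly conclude that these are opposite disci- plines, whereas in reality, they go hand-in-hand. Figure 1-3 shows that pharmacokinetics deals with concentration–time relationships, whereas pharmacodynamics describes the relationship between drug concentration and both good (desired) and bad (adverse) effects. Each of these two puzzle pieces by itself is insufficient to guide therapy and optimize dosing; only when pharmacokinetics and pharmacodynamics are linked (PK/PD) and integrated do they become therapeutically useful. This integration is commonly achieved by developing mathematical models (PK/PD models) that capture the observed relationships and allow prediction and identification of optimum dosing regimens.

FIGURE 1-2. A rational approach to the design of a dosage regimen. The pharmacokinetics and pharmacodynam- ics of the drug are first defined. Then, responses to the drug, coupled with pharmacokinetic information, are used as feedback ( dashed lines ) to modify the dosage regimen to achieve optimal ther- apy. For some drugs, active metabolites formed in the body may also need to be taken into account.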

I have read the PDF file into text and tried applying re.search to the text data with several regular expression combinations, but with no luck.

import re

# Get the file's text content
text = file_data['content']
#print(text)
text1 = re.search('FIGURE[ ]*[0-9]-[0-9]. (.*)', text, re.MULTILINE)

1 answer:

Answer 0: (score: 1)

text1 = re.findall(r'FIGURE\s*[0-9]+-[0-9]+\. (.*)', text, re.MULTILINE)
>>> import re
>>> t="""FIGURE 1-1. An empirical approach to the design of a dosage regimen. The effects, both desired and adverse, are monitored after the administration of a dosage regimen of a drug and used to further refine and optimize the regimen through feedback ( dashed line ).
...
... Derendorf5e_CH01.indd 4Derendorf5e_CH01.indd 4 5/25/19 11:07 PM5/25/19 11:07 PM
...
... CHAPTER 1 • Therapeutic Relevance 5
...
... Another way of looking at these two subdisciplines is that pharmacokinetics deals with what the body does to the drug (absorption, distribution, metabolism, excretion), whereas pharmacodynamics describes what the drug does to the body (both desired and undesired effects). From this definition, one could wrongly conclude that these are opposite disci- plines, whereas in reality, they go hand-in-hand. Figure 1-3 shows that pharmacokinetics deals with concentration–time relationships, whereas pharmacodynamics describes the relationship between drug concentration and both good (desired) and bad (adverse) effects. Each of these two puzzle pieces by itself is insufficient to guide therapy and optimize dosing; only when pharmacokinetics and pharmacodynamics are linked (PK/PD) and integrated do they become therapeutically useful. This integration is commonly achieved by developing mathematical models (PK/PD models) that capture the observed relationships and allow prediction and identification of optimum dosing regimens.
...
... FIGURE 1-2. A rational approach to the design of a dosage regimen. The pharmacokinetics and pharmacodynam- ics of the drug are first defined. Then, responses to the drug, coupled with pharmacokinetic information, are used as feedback ( dashed lines ) to modify the dosage regimen to achieve optimal ther- apy. For some drugs, active metabolites formed in the body may also need to be taken into account."""
>>> re.findall(r'FIGURE\s*[0-9]+-[0-9]+\. (.*)', t, re.MULTILINE)
['An empirical approach to the design of a dosage regimen. The effects, both desired and adverse, are monitored after the administration of a dosage regimen of a drug and used to further refine and optimize the regimen through feedback ( dashed line ).', 'A rational approach to the design of a dosage regimen. The pharmacokinetics and pharmacodynam- ics of the drug are first defined. Then, responses to the drug, coupled with pharmacokinetic information, are used as feedback ( dashed lines ) to modify the dosage regimen to achieve optimal ther- apy. For some drugs, active metabolites formed in the body may also need to be taken into account.']
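If you also want the figure number with each caption, and want to avoid matching in-text mentions such as "Figure 1-3 shows ...", one possible variation is sketched below. It is only a sketch under a few assumptions: the helper name `extract_figure_captions` and the shortened `sample` string are made up for illustration, and it assumes each caption sits on a single line that starts with the uppercase word FIGURE, as in the sample data.

```python
import re

# Anchored at the start of a line (re.MULTILINE makes ^ and $ match per line).
# [ \t] is used instead of \s so the pattern never runs across a newline.
FIGURE_RE = re.compile(r"^FIGURE[ \t]*(\d+)-(\d+)\.[ \t]+(.*)$", re.MULTILINE)


def extract_figure_captions(text):
    """Hypothetical helper: return (label, caption) pairs for FIGURE lines."""
    return [
        (f"{chapter}-{number}", caption.strip())
        for chapter, number, caption in FIGURE_RE.findall(text)
    ]


# Shortened stand-in for the converted PDF text (assumed layout).
sample = (
    "FIGURE 1-1. An empirical approach to the design of a dosage regimen.\n"
    "\n"
    "CHAPTER 1 • Therapeutic Relevance 5\n"
    "\n"
    "Figure 1-3 shows that pharmacokinetics deals with concentration-time relationships.\n"
    "\n"
    "FIGURE 1-2. A rational approach to the design of a dosage regimen.\n"
)

captions = extract_figure_captions(sample)
for label, caption in captions:
    print(label, caption)
```

Because the match is case-sensitive and anchored with `^`, the sentence beginning "Figure 1-3 shows" is skipped. If the captions in your file wrap across several lines, this pattern would only capture the first line of each, so it would need adjusting.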