Using regular expressions in R

Time: 2016-04-12 21:20:25

Tags: r

Hi, I am trying to extract a single sentence from a paragraph in R:

"[report_beginning]

101962493|2011-06-09|final|Omary, Lea, M.D.|43654754|Major Academic Center

_Ms.Wattley is a 88 year-old patient who comes in today with a chief complaint of PREG/SPOTTING.

ALLERGIES:  none

SOCIAL HISTORY:  The patient Ms.Wattley is a past smoker who has a visiting nurse. Patient is bed-bound.

PHYSICAL EXAMINATION:  Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air.  General:  This is a patient in severe distress.  

 EMERGENCY DEPARTMENT COURSE:  I confirm that I have seen and evaluated the patient, reviewed the resident's documentation on the patient's chart. The following procedures were performed: Medication:medication given. Procedure:no procedures performed. Testing:testing conducted . Please review the chart for more details.

 DISPOSITION:  The patient was admitted to the hospital with a primary diagnosis of Threatened abortion, antepartum condition or complication. 

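So this is one cell. I have a data column of cells like this, and I want to extract one line from each: "PHYSICAL EXAMINATION:  Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air."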

How can I do this with a regular expression in R?

I have been using the following code, but it does not work. It gives me an empty result:

x=grep("Blood pressure .+ air. ", ed_dia, value = TRUE)

1 answer:

Answer 0 (score: 1):

I'm assuming that "[report_beginning] is not actually in the data file, so opening a text connection to read the text should succeed:

txt <- "101962493|2011-06-09|final|Omary, Lea, M.D.|43654754|Major Academic Center

_Ms.Wattley is a 88 year-old patient who comes in today with a chief complaint of PREG/SPOTTING.

ALLERGIES:  none

SOCIAL HISTORY:  The patient Ms.Wattley is a past smoker who has a visiting nurse. Patient is bed-bound.

PHYSICAL EXAMINATION:  Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air.  General:  This is a patient in severe distress.  

 EMERGENCY DEPARTMENT COURSE:  I confirm that I have seen and evaluated the patient, reviewed the resident's documentation on the patient's chart. The following procedures were performed: Medication:medication given. Procedure:no procedures performed. Testing:testing conducted . Please review the chart for more details.

 DISPOSITION:  The patient was admitted to the hospital with a primary diagnosis of Threatened abortion, antepartum condition or complication. "

inp <- readLines( textConnection(txt))

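So, once the data is read in, you can simply use grep to identify the line containing "PHYSICAL EXAMINATION" (I wasn't sure whether the space needs special regex handling, hence the bracketed space) and then use "[" to extract that line from the vector of lines: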

inp[ grep("PHYSICAL[ ]EXAMINATION", inp)]
#[1] "PHYSICAL EXAMINATION:  Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air.  General:  This is a patient in severe distress.  "
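
The grep call above returns the whole line, including the "General:" part, while the question asks only for the sentence ending in "room air.". One possible way to trim it, sketched here as an assumption rather than as part of the original answer, is to combine regexpr() with regmatches() and a lazy quantifier. The variable names line, m and sentence are hypothetical, and the sample string is copied from the report above.

```r
# Hypothetical input: the single line returned by the grep() step.
line <- "PHYSICAL EXAMINATION:  Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air.  General:  This is a patient in severe distress.  "

# regexpr() locates the first match; regmatches() returns the matched text.
# ".*?" is a lazy match (needs perl = TRUE), so the match ends at the first
# "room air." instead of running to the end of the line.
m <- regexpr("PHYSICAL EXAMINATION:.*?room air\\.", line, perl = TRUE)
sentence <- regmatches(line, m)
sentence
```

regmatches() returns only the elements that matched, so applied to a whole column it would silently drop cells without a match. If the result has to stay aligned with the input, check which elements of m are -1 first.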