Hello, I am trying to extract a single sentence from a paragraph in R.
"[report_beginning]
101962493|2011-06-09|final|Omary, Lea, M.D.|43654754|Major Academic Center
_Ms.Wattley is a 88 year-old patient who comes in today with a chief complaint of PREG/SPOTTING.
ALLERGIES: Â none
SOCIAL HISTORY: Â The patient Ms.Wattley is a past smoker who has a visiting nurse. Patient is bed-bound.
PHYSICAL EXAMINATION: Â Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air. Â General: Â This is a patient in severe distress. Â
EMERGENCY DEPARTMENT COURSE: Â I confirm that I have seen and evaluated the patient, reviewed the resident's documentation on the patient's chart. The following procedures were performed: Medication:medication given. Procedure:no procedures performed. Testing:testing conducted . Please review the chart for more details.
DISPOSITION: Â The patient was admitted to the hospital with a primary diagnosis of Threatened abortion, antepartum condition or complication.
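So this is one cell. I have a data column like this, and I want to extract one line: "PHYSICAL EXAMINATION: Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air."
How can I do this with a regular expression in R?
I have been using the following code, but it does not work. It gives me an empty dataset: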
x=grep("Blood pressure .+ air. ", ed_dia, value = TRUE)
Answer 0 (score: 1)
I am assuming that "[report_beginning] is not actually in the data file, so opening a text connection to read the text should succeed:
txt <- "101962493|2011-06-09|final|Omary, Lea, M.D.|43654754|Major Academic Center
_Ms.Wattley is a 88 year-old patient who comes in today with a chief complaint of PREG/SPOTTING.
ALLERGIES: Â none
SOCIAL HISTORY: Â The patient Ms.Wattley is a past smoker who has a visiting nurse. Patient is bed-bound.
PHYSICAL EXAMINATION: Â Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air. Â General: Â This is a patient in severe distress. Â
EMERGENCY DEPARTMENT COURSE: Â I confirm that I have seen and evaluated the patient, reviewed the resident's documentation on the patient's chart. The following procedures were performed: Medication:medication given. Procedure:no procedures performed. Testing:testing conducted . Please review the chart for more details.
DISPOSITION: Â The patient was admitted to the hospital with a primary diagnosis of Threatened abortion, antepartum condition or complication. "
inp <- readLines( textConnection(txt))
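Once the data has been read in, you can simply use grep to identify the line containing "PHYSICAL EXAMINATION" (I am not sure whether the space needs special regex handling) and then use "[" to extract that line from the multiple lines: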
inp[ grep("PHYSICAL[ ]EXAMINATION", inp)]
#[1] "PHYSICAL EXAMINATION: Â Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air. Â General: Â This is a patient in severe distress. Â "
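The call above returns the whole line, including the "General:" sentence that follows. If only the vitals sentence is wanted, one possible approach is regexpr() combined with regmatches(), which returns the matched text instead of the full element. The following is a minimal sketch under some assumptions: `cell` is a hypothetical, shortened stand-in for one report (the stray "Â" characters are left out), and the sentence is assumed to always end with "room air.". It also shows that a plain space in the pattern matches a literal space without any special handling.

```r
# Hypothetical, shortened stand-in for one multi-line report cell.
cell <- paste(
  "ALLERGIES: none",
  "PHYSICAL EXAMINATION: Blood pressure 125/98, pulse 55, respiratory rate 7, temperature 98.7, and O2 saturation 98 on room air. General: This is a patient in severe distress.",
  "DISPOSITION: The patient was admitted to the hospital.",
  sep = "\n"
)

# A literal space in a regular expression matches a literal space.
has_exam <- grepl("PHYSICAL EXAMINATION", cell)

# ".*?" is a lazy match, so it stops at the first "room air." it finds.
# With perl = TRUE, "." does not match a newline, so the match stays on one line.
pattern <- "PHYSICAL EXAMINATION:.*?room air\\."

# regexpr() locates the first match; regmatches() returns only the matched text.
sentence <- regmatches(cell, regexpr(pattern, cell, perl = TRUE))
print(sentence)
```

Applied to a character vector such as the question's `ed_dia`, the same call returns one string per element that matches; elements without a match are dropped from the result.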