I use police_officer <- str_extract_all(txtparts, "ID:.*\n")
to extract, from a text file, the names of all police officers involved in a 911 call.
For example:
2237 DISTURBANCE Report taken
Call Taker: Telephone Operators Sharon L Moran
Location/Address: [BRO 6949] 61 WILSON ST
ID: Patrolman Darvin Anderson
Disp-22:43:39 Arvd-22:48:57 Clrd-23:49:45
ID: Patrolman Stephen T Pina
Disp-22:43:48 Clrd-22:46:10
ID: Sergeant Michael V Damiano
Disp-22:46:33 Arvd-22:47:14 Clrd-22:55:22
When a section contains more than one ID: match, I get:
"c(\" Patrolman Darvin Anderson\\n\", \" Patrolman Stephen T Pina\\n\", \" Sergeant Michael V Damiano\\n\")"
Here is what I have tried so far to clean the data:
police_officer <- str_replace_all(police_officer,"c\\(.","")
police_officer <- str_replace_all(police_officer,"\\)","")
police_officer <- str_replace_all(police_officer,"ID:","")
police_officer <- str_replace_all(police_officer,"\\n\","") # I can't get rid of\\n\.
This is what I end up with:
" Patrolman Darvin Anderson\\n\", \" Patrolman Stephen T Pina\\n\", \" Sergeant Michael V Damiano\\n\""
I need help cleaning up the remaining \\n\ sequences.
Answer 0 (score: 1)
You can use the following regex with str_match_all:
\bID:\s*(\w+(?:\h+\w+)*)
See the regex demo.
> txt <- "Call Taker: Telephone Operators Sharon L Moran\n Location/Address: [BRO 6949] 61 WILSON ST\n ID: Patrolman Darvin Anderson\n Disp-22:43:39 Arvd-22:48:57 Clrd-23:49:45\n ID: Patrolman Stephen T Pina\n Disp-22:43:48 Clrd-22:46:10\n ID: Sergeant Michael V Damiano\n Disp-22:46:33 Arvd-22:47:14 Clrd-22:55:22"
> str_match_all(txt, "\\bID:\\s*(\\w+(?:\\h+\\w+)*)")
[[1]]
[,1] [,2]
[1,] "ID: Patrolman Darvin Anderson" "Patrolman Darvin Anderson"
[2,] "ID: Patrolman Stephen T Pina" "Patrolman Stephen T Pina"
[3,] "ID: Sergeant Michael V Damiano" "Sergeant Michael V Damiano"
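The regex matches ID: at a word boundary, then zero or more whitespace characters (\s*), and then captures a sequence of word characters (letters, digits, or underscores) optionally separated by horizontal whitespace (\h+). str_match_all returns the captured group alongside the whole match, which is why it is used here instead of str_extract_all: str_extract_all returns only the whole match, so the ID: prefix would still be included.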
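To make the difference concrete, here is a minimal sketch that runs both functions on a shortened version of the sample text. The variable names whole, names_only, and coerced are only illustrative. The last line shows base R's list-to-character coercion, which produces the same kind of c(\"...\") wrapper seen in the question; that output is consistent with the list returned by str_extract_all being coerced to a single string, though this is an inference from the output rather than something the question states.

```r
library(stringr)

# Shortened sample, assumed to have the same layout as the log in the question.
txt <- "ID: Patrolman Darvin Anderson\nDisp-22:43:39\nID: Patrolman Stephen T Pina\n"
pattern <- "\\bID:\\s*(\\w+(?:\\h+\\w+)*)"

# str_extract_all() returns a list of whole matches, so "ID: " is still present.
whole <- str_extract_all(txt, pattern)[[1]]

# str_match_all() returns one matrix per input string;
# column 1 is the whole match, column 2 is capture group 1 (the name only).
names_only <- str_match_all(txt, pattern)[[1]][, 2]

# Coercing a list element of length > 1 to character deparses it,
# which yields a single string starting with c(
coerced <- as.character(list(c("a", "b")))
```

Indexing with [[1]] (or keeping the result as a list) before any further string cleaning avoids the deparsed form, so no \\n\ or c( cleanup is needed afterwards.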
Update
> time <- str_trim(str_extract(txt, " [[:digit:]]{4}"))
> Call_taker <- str_replace_all(str_extract(txt, "Call Taker:.*\n"),"Call Taker:","" ) %>% str_replace_all("\n","")
> address <- str_extract(txt, "Location/Address:.*\n")
> Police_officer <- str_match_all(txt, "\\bID:\\s*(\\w+(?:\\h+\\w+)*)")
> BPD_log <- cbind(time,Call_taker,address,list(Police_officer[[1]][,2]))
> BPD_log <- as.data.frame(BPD_log)
> colnames(BPD_log) <- c("time", "Call_taker", "address", "Police_officer")
> BPD_log
time Call_taker address
1 6949 Telephone Operators Sharon L Moran Location/Address: [BRO 6949] 61 WILSON ST\n
Police_officer
1 Patrolman Darvin Anderson, Patrolman Stephen T Pina, Sergeant Michael V Damiano
>
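The same idea can be applied to several call sections at once. The following is a sketch under assumptions, not the answer's own code: txtparts is assumed to be a character vector with one element per call, the second section and its address are made up for illustration, and the officers are collapsed into one comma-separated string per call. The time column is left out because the sample text used above does not include the header line with the call time, which is why the time value in the output above is 6949, taken from the address.

```r
library(stringr)

# Hypothetical input: one element per call section (second section is invented).
txtparts <- c(
  paste0("Call Taker: Telephone Operators Sharon L Moran\n",
         "Location/Address: [BRO 6949] 61 WILSON ST\n",
         "ID: Patrolman Darvin Anderson\nDisp-22:43:39\n",
         "ID: Patrolman Stephen T Pina\nDisp-22:43:48\n"),
  paste0("Call Taker: Telephone Operators Sharon L Moran\n",
         "Location/Address: 10 MAIN ST\n",
         "ID: Sergeant Michael V Damiano\nDisp-22:46:33\n")
)

pattern <- "\\bID:\\s*(\\w+(?:\\h+\\w+)*)"

# One character vector of officer names per section (capture group 1).
officers <- lapply(str_match_all(txtparts, pattern), function(m) m[, 2])

# "." does not match a newline by default, so (.*) stops at the end of the line.
BPD_log <- data.frame(
  Call_taker = str_trim(str_match(txtparts, "Call Taker:(.*)")[, 2]),
  address = str_trim(str_match(txtparts, "Location/Address:(.*)")[, 2]),
  Police_officer = vapply(officers, paste, character(1), collapse = ", "),
  stringsAsFactors = FALSE
)
```

Collapsing to a single string keeps the data frame flat; if the individual names are needed later, the officers list can be kept as it is instead.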