The input is a text file. Here is the input data:
From: abc@xyz.com
To: qwe@xyz.com, ewq@xyz.com
tuu@xyz.com, vbn@xyz.com
lkj@xyz.com, jkl@xyz.com
Subject: Introduction to R
B-CC: qwe@xyz.com, ewq@xyz.com
tuu@xyz.com, vbn@xyz.com
lkj@xyz.com, jkl@xyz.com
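I need to gather all the email IDs under To, and all those under B-CC, into one object each. The challenge is that the email IDs are not all on one line; they are spread over several lines.
Required output: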
To: qwe@xyz.com, ewq@xyz.com, tuu@xyz.com, vbn@xyz.com, lkj@xyz.com, jkl@xyz.com
B-CC: qwe@xyz.com, ewq@xyz.com, tuu@xyz.com, vbn@xyz.com, lkj@xyz.com, jkl@xyz.com
Answer 0 (score: 2)
You could do this:
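library(stringr)
# Join all lines into one string so continuation lines follow their header
str1 <- paste(str_trim(lines), collapse=', ')
# perl() was deprecated in stringr 1.0.0; a plain pattern is already treated as a regex
str_extract_all(str1, '(?=To: ).*(?=, Subject)')[[1]]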
#[1] "To: qwe@xyz.com, ewq@xyz.com, tuu@xyz.com, vbn@xyz.com,
#lkj@xyz.com, jkl@xyz.com"
str_extract_all(str1, '(?=B-CC:).*')[[1]]
#[1] "B-CC: qwe@xyz.com, ewq@xyz.com, tuu@xyz.com, vbn@xyz.com,
#lkj@xyz.com, jkl@xyz.com"
Or using stringi:
library(stringi)
stri_extract_all_regex(str1, '(?=To: ).*(?=, Subject)')[[1]]
#[1] "To: qwe@xyz.com, ewq@xyz.com, tuu@xyz.com, vbn@xyz.com,
# lkj@xyz.com, jkl@xyz.com"
stri_extract_all_regex(str1, '(?=B-CC:).*')[[1]]
#[1] "B-CC: qwe@xyz.com, ewq@xyz.com, tuu@xyz.com, vbn@xyz.com,
#lkj@xyz.com, jkl@xyz.com"
Data (the lines object used above):
lines <- readLines(n=8)
From: abc@xyz.com
To: qwe@xyz.com, ewq@xyz.com
tuu@xyz.com, vbn@xyz.com
lkj@xyz.com, jkl@xyz.com
Subject: Introduction to R
B-CC: qwe@xyz.com, ewq@xyz.com
tuu@xyz.com, vbn@xyz.com
lkj@xyz.com, jkl@xyz.com
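For comparison, the same joining step can be done without naming the header that follows To in the pattern. The following is a minimal base R sketch, not the answerer's method; it assumes that a header line is any line starting with letters or hyphens followed by a colon, and the names is_header, group and fields are made up for illustration.

```r
# Sample input from the question, held in memory instead of read interactively.
lines <- c(
  "From: abc@xyz.com",
  "To: qwe@xyz.com, ewq@xyz.com",
  "tuu@xyz.com, vbn@xyz.com",
  "lkj@xyz.com, jkl@xyz.com",
  "Subject: Introduction to R",
  "B-CC: qwe@xyz.com, ewq@xyz.com",
  "tuu@xyz.com, vbn@xyz.com",
  "lkj@xyz.com, jkl@xyz.com"
)

# Assumption: a header line begins with "Name:" (letters and hyphens only).
is_header <- grepl("^[A-Za-z-]+:", lines)

# cumsum() gives every continuation line the group number of its header.
group <- cumsum(is_header)

# Join the lines of each group into one comma-separated string.
fields <- vapply(split(trimws(lines), group), paste, character(1), collapse = ", ")

# Name each element by its header, then drop the "Name: " prefix from the value.
names(fields) <- sub(":.*$", "", lines[is_header])
fields <- sub("^[A-Za-z-]+:\\s*", "", fields)

fields[["To"]]
fields[["B-CC"]]
```

Because the grouping depends only on which lines look like headers, the order of the headers in the file does not matter here.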
Answer 1 (score: 2)
Same as @akrun's answer, with a small modification.
> library(stringr)
> lines <- readLines(n=8)
From: abc@xyz.com
To: qwe@xyz.com, ewq@xyz.com
tuu@xyz.com, vbn@xyz.com
lkj@xyz.com, jkl@xyz.com
Subject: Introduction to R
B-CC: qwe@xyz.com, ewq@xyz.com
tuu@xyz.com, vbn@xyz.com
lkj@xyz.com, jkl@xyz.com
> str1 <- paste(str_trim(lines), collapse=', ')
> str_extract_all(str1, '(?=To:\\s+).*?(?=,\\s+\\w+:|$)')[[1]]
[1] "To: qwe@xyz.com, ewq@xyz.com, tuu@xyz.com, vbn@xyz.com, lkj@xyz.com, jkl@xyz.com"
> str_extract_all(str1, '(?=B-CC:\\s+).*?(?=,\\s+\\w+:|$)')[[1]]
[1] "B-CC: qwe@xyz.com, ewq@xyz.com, tuu@xyz.com, vbn@xyz.com, lkj@xyz.com, jkl@xyz.com"
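If the object you want is a vector with one element per address rather than a single string, the extracted string can be split once more. A minimal sketch, assuming to_str (a name chosen here for illustration) holds the To string shown above:

```r
# The joined To string, copied from the output above.
to_str <- "To: qwe@xyz.com, ewq@xyz.com, tuu@xyz.com, vbn@xyz.com, lkj@xyz.com, jkl@xyz.com"

# Drop the "To:" prefix, then split on commas followed by optional whitespace.
to_ids <- strsplit(sub("^To:\\s*", "", to_str), ",\\s*")[[1]]

length(to_ids)
to_ids
```

The same two steps apply to the B-CC string with the prefix pattern changed to "^B-CC:\\s*".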
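Answer 2 (score: 1)
Read in the lines and prefix each line that has no colon with a space. The result is then in DCF format, so we can read it with read.dcf and replace any newlines with a comma and a space. The resulting structure will contain From, To, Subject and B-CC components.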
Lines <- readLines("myfile.txt")
hasColon <- grepl(":", Lines)
Lines[!hasColon] <- paste("", Lines[!hasColon])
email <- read.dcf(textConnection(Lines))[1, ]
email <- gsub("\n", ", ", email)
giving:
> email[['To']]
[1] "qwe@xyz.com, ewq@xyz.com, tuu@xyz.com, vbn@xyz.com, lkj@xyz.com, jkl@xyz.com"
> email[['B-CC']]
[1] "qwe@xyz.com, ewq@xyz.com, tuu@xyz.com, vbn@xyz.com, lkj@xyz.com, jkl@xyz.com"
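To try the DCF approach without a file on disk, the same steps can be run on an in-memory copy of the question's input. This is a sketch under that assumption; txt and con are illustrative names, and the connection is closed explicitly once read.dcf has finished.

```r
# In-memory copy of the input from the question (stands in for myfile.txt).
txt <- c(
  "From: abc@xyz.com",
  "To: qwe@xyz.com, ewq@xyz.com",
  "tuu@xyz.com, vbn@xyz.com",
  "lkj@xyz.com, jkl@xyz.com",
  "Subject: Introduction to R",
  "B-CC: qwe@xyz.com, ewq@xyz.com",
  "tuu@xyz.com, vbn@xyz.com",
  "lkj@xyz.com, jkl@xyz.com"
)

# Lines without a colon are continuation lines; DCF expects them to be indented.
hasColon <- grepl(":", txt)
txt[!hasColon] <- paste("", txt[!hasColon])

# Parse the single record; row 1 of the matrix is a named character vector.
con <- textConnection(txt)
email <- read.dcf(con)[1, ]
close(con)

# Continuation lines are joined with newlines; turn those into ", ".
email <- gsub("\n", ", ", email)

names(email)
email[["To"]]
```

Note that grepl(":", ...) treats any line containing a colon as a header, so this relies on the address lines themselves never containing one.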
Answer 3 (score: 0)
cat input | sed 's/: /\n/' | awk '/To/{flag=1;next}/Subject/{flag=0}flag' > to.txt
cat input | sed 's/: /\n/' | awk '/B-CC/{flag=1;next}/FINISH/{flag=0}flag' > bcc.txt
If I understand your question correctly, this should work for you.