Extracting specific text from content

Time: 2015-04-02 08:16:52

Tags: regex r

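I am processing email content that has been dumped into rows as data frame values, and I need to split it into separate messages, where each message runs from one "From:" up to the next "From:".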

Sample data:

"______________________________________________ 
From:   Kumar M,  
Sent:   Tuesday, 21 October 2014 7:30 AM
To: Deo, Ravinesh; G S, Venkatesh;
Cc: Monteleone, Elif; Kabyanga, Isaac
Subject:    FW: Please Approve the Qlik  Access.


Hi Ravi,

We will work on the providing David access to Ql and an email will be sent out once the access is set up.   

Regards,
Santhosh

______________________________________________ 
From:   Deo, Ravinesh  
Sent:   Tuesday, 21 October 2014 7:20 AM
To: Kabyanga, Isaac; Kumar M, Santhosh
Cc: Monteleone, Elif
Subject:    FW: Please Approve the Qlikview Access.

Hi Isaac/Santhosh,

Appreciate if you can grant access to David Dennis for GPA – Timor.

David is CEO Timor Leste.

Thanks
Ravi

_____________________________________________
From: Dennis, David (Timor) 
Sent: Tuesday, 21 October 2014 11:34 AM
To: Deo, Ravinesh
Subject: FW: Please Approve the Q GPA Access.

Here you go - appreciate your help Rgds

______________________________________________ 
From:   Dennis, David (Timor)  
Sent:   Thursday, 9 October 2014 11:33 AM
To: Buchanan, Geoffrey (Solomon Islands)
Subject:    Please Approve the Qlikview Access.

Hello,

Can you please review the attached form and click ' Manager Approval' to approve.

Thanks"

I referred to the approach described here and used the following code:

ex <- gsub("^[from:](.*?)[from:]$", "",impordata$Problem.Description[i] )

But this returns all of the emails together in a single row!

Expected output:

[1]

From:   Kumar M,  
    Sent:   Tuesday, 21 October 2014 7:30 AM
    To: Deo, Ravinesh; G S, Venkatesh;
    Cc: Monteleone, Elif; Kabyanga, Isaac
    Subject:    FW: Please Approve the Qlik  Access.


    Hi Ravi,

    We will work on the providing David access to Ql and an email will be sent out once the access is set up.   

    Regards,
    Santhosh

[2]

From:   Deo, Ravinesh  
    Sent:   Tuesday, 21 October 2014 7:20 AM
    To: Kabyanga, Isaac; Kumar M, Santhosh
    Cc: Monteleone, Elif
    Subject:    FW: Please Approve the Qlikview Access.

    Hi Isaac/Santhosh,

    Appreciate if you can grant access to David Dennis for GPA – Timor.

    David is CEO Timor Leste.

    Thanks
    Ravi

[3]

 From: Dennis, David (Timor) 
    Sent: Tuesday, 21 October 2014 11:34 AM
    To: Deo, Ravinesh
    Subject: FW: Please Approve the Q GPA Access.

    Here you go - appreciate your help Rgds

[4]

From:   Dennis, David (Timor)  
    Sent:   Thursday, 9 October 2014 11:33 AM
    To: Buchanan, Geoffrey (Solomon Islands)
    Subject:    Please Approve the Qlikview Access.

    Hello,

    Can you please review the attached form and click ' Manager Approval' to approve.

    Thanks"

I also tried regmatches:

#Converted a row as vector to apply regmatches
vec <- as.vector(impordata$Problem.Description[1])

matc <-regmatches(vec, gregexpr("(^[from:]).*?($[from:])", vec, perl = TRUE))

That did not work either.

Can someone correct it, or offer some help?

2 answers:

Answer 0: (score: 0)

You can use the strsplit function.

> strsplit(gsub("(?s)^_+\\s+", "", x, perl=TRUE) , "_+\\s*(?=From:)", perl=TRUE)[[1]]
[1] "From:   Kumar M,  \nSent:   Tuesday, 21 October 2014 7:30 AM\nTo: Deo, Ravinesh; G S, Venkatesh;\nCc: Monteleone, Elif; Kabyanga, Isaac\nSubject:    FW: Please Approve the Qlik  Access.\n\n\nHi Ravi,\n\nWe will work on the providing David access to Ql and an email will be sent out once the access is set up.   \n\nRegards,\nSanthosh\n\n"
[2] "From:   Deo, Ravinesh  \nSent:   Tuesday, 21 October 2014 7:20 AM\nTo: Kabyanga, Isaac; Kumar M, Santhosh\nCc: Monteleone, Elif\nSubject:    FW: Please Approve the Qlikview Access.\n\nHi Isaac/Santhosh,\n\nAppreciate if you can grant access to David Dennis for GPA – Timor.\n\nDavid is CEO Timor Leste.\n\nThanks\nRavi\n\n"               
[3] "From: Dennis, David (Timor) \nSent: Tuesday, 21 October 2014 11:34 AM\nTo: Deo, Ravinesh\nSubject: FW: Please Approve the Q GPA Access.\n\nHere you go - appreciate your help Rgds\n\n"                                                                                                                                                           
[4] "From:   Dennis, David (Timor)  \nSent:   Thursday, 9 October 2014 11:33 AM\nTo: Buchanan, Geoffrey (Solomon Islands)\nSubject:    Please Approve the Qlikview Access.\n\nHello,\n\nCan you please review the attached form and click ' Manager Approval' to approve.\n\nThanks"
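The two steps in that one-liner can be spelled out separately. The following is a minimal sketch, assuming a shortened two-message sample in place of the real data; `x`, `body`, and `emails` are illustrative names, and the `trimws` call is an optional extra that is not part of the original answer.

```r
# Hypothetical two-message sample; `x` stands in for one data frame cell.
x <- paste0(
  "______________________________________________ \n",
  "From:   Kumar M,  \nSent:   Tuesday, 21 October 2014 7:30 AM\n\nHi Ravi,\n\n",
  "_____________________________________________\n",
  "From: Dennis, David (Timor) \nSent: Tuesday, 21 October 2014 11:34 AM\n\nHere you go\n"
)

# Step 1: drop the leading separator so the first piece is not an empty string.
body <- sub("^_+\\s+", "", x, perl = TRUE)

# Step 2: split on each remaining run of underscores (plus trailing whitespace)
# that is immediately followed by "From:". The lookahead (?=From:) is not
# consumed, so every piece still begins with "From:".
emails <- strsplit(body, "_+\\s*(?=From:)", perl = TRUE)[[1]]

# Optional (an addition, not in the original answer): trim surrounding whitespace.
emails <- trimws(emails)

length(emails)               # number of messages found
startsWith(emails, "From:")  # each piece should begin with its header
```

To apply the same idea to a whole column, something like `lapply(impordata$Problem.Description, ...)` wrapping these two steps would presumably work, assuming every cell uses the same underscore separator.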

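Answer 1: (score: 0)

Take a look at strsplit:

# Join the rows into one string, split on runs of 45 underscores,
# and drop the empty piece before the first separator.
splits <- strsplit(paste0(vec, collapse = "\n"), "_{45}")[[1]][-1]
cat(splits)     # print every message
cat(splits[1])  # print only the first message; its output is shown below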
# _ 
# From:   Kumar M,  
# Sent:   Tuesday, 21 October 2014 7:30 AM
# To: Deo, Ravinesh; G S, Venkatesh;
# Cc: Monteleone, Elif; Kabyanga, Isaac
# Subject:    FW: Please Approve the Qlik  Access.
# 
# 
# Hi Ravi,
# 
# We will work on the providing David access to Ql and an email will be sent out once the access is set up.   
# 
# Regards,
# Santhosh
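The output above begins with a stray `_`, which suggests that at least one separator is longer than 45 characters, so `_{45}` leaves the extra underscore behind. A possible variant, sketched here on a hypothetical shortened sample (the separator lengths of 46 and 45 are an assumption based on the question's data), matches 45 or more underscores and the whitespace after them:

```r
# Hypothetical sample: separators of 46 and 45 underscores (assumed lengths).
vec <- paste0(
  strrep("_", 46), " \nFrom:   Kumar M,  \nSent:   Tuesday, 21 October 2014 7:30 AM\n\nHi Ravi,\n\n",
  strrep("_", 45), "\nFrom: Dennis, David (Timor) \nSent: Tuesday, 21 October 2014 11:34 AM\n\nHere you go\n"
)

# "_{45,}" matches 45 or more underscores, so a longer separator is consumed whole;
# "\\s*" also swallows the space or newline that follows the separator.
pieces <- strsplit(vec, "_{45,}\\s*")[[1]]

# The text starts with a separator, so the first piece is empty; drop empty pieces.
splits <- pieces[nzchar(pieces)]

cat(splits[1])
```

Filtering with `nzchar` instead of `[-1]` is a design choice in this sketch: it still behaves sensibly if a cell happens not to start with a separator.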