In R, how to split text/a paragraph at the end of a statement (full stop) rather than at a . (dot) inside a sentence

Asked: 2018-08-13 04:26:09

Tags: r regex text nlp delimiter-separated-values

In R, for example:

text_data<-" I have been a Gig subscriber for a decent amount of time.  When the service was originally installed I would observe 900+Mbps speeds.  I have deployed to Kosovo and since then (about 8 months ago) my wife has told me that the internet has become Slow.I consistently avg less than 286 Mbps when utilizing both speedtest.xfinity.com and fast.com as well as speedtest.net Current hardware: 1) Motorola MB8600 Modem 2) Linksys EA9500 I have not had a technician out to the site.Troubleshooting:I have replaced the Coax cable from wall to modem with 2 different new coax cables. I have replaced the ethernet from Modem to Router with 2 different new ethernet cables I have rebooted my Modem as well as attempted a factory reset. I have connected my home PC directly to the Modem (bypass router) with 2 different ethernet cables*NOTE*Wether I use the EA9500 or go directly to the PC I get the same slow speeds.*NOTE 2*I do not have a cable subscrption. No splitters on the line. It goes from Pole ---> Wall Jack --> Modem.There is always significant Uncorrected errors.  Attached are the Upstream and Downstream information and error logs.  These are 4 days after a modem reset."

> library(NLP)  # as.String() is provided by the NLP package
> textdata<- as.String(text_data)

> a<-strsplit(text_data,".", fixed = TRUE)

Output:

> a
[[1]]
 [1] " I have been a Gig subscriber for a decent amount of time"                                                                                                                           
 [2] "  When the service was originally installed I would observe 900+Mbps speeds"                                                                                                         
 [3] "  I have deployed to Kosovo and since then (about 8 months ago) my wife has told me that the internet has become Slow"                                                               
 [4] "I consistently avg less than 286 Mbps when utilizing both speedtest"                                                                                                                 
 [5] "xfinity"                                                                                                                                                                             
 [6] "com and fast"                                                                                                                                                                        
 [7] "com as well as speedtest"                                                                                                                                                            
 [8] "net Current hardware: 1) Motorola MB8600 Modem 2) Linksys EA9500 I have not had a technician out to the site"                                                                        
 [9] "Troubleshooting:I have replaced the Coax cable from wall to modem with 2 different new coax cables"                                                                                  
[10] " I have replaced the ethernet from Modem to Router with 2 different new ethernet cables I have rebooted my Modem as well as attempted a factory reset"                               
[11] " I have connected my home PC directly to the Modem (bypass router) with 2 different ethernet cables*NOTE*Wether I use the EA9500 or go directly to the PC I get the same slow speeds"
[12] "*NOTE 2*I do not have a cable subscrption"                                                                                                                                           
[13] " No splitters on the line"                                                                                                                                                           
[14] " It goes from Pole ---> Wall Jack --> Modem"                                                                                                                                         
[15] "There is always significant Uncorrected errors"                                                                                                                                      
[16] "  Attached are the Upstream and Downstream information and error logs"                                                                                                               
[17] "  These are 4 days after a modem reset"   

Desired output in R: the text should be split at the end of each statement (full stop), not at a . (dot) inside a sentence.

1)I have been a Gig subscriber for a decent amount of time.  
When the service was originally installed I would observe 900+Mbps speeds.  
2)I have deployed to Kosovo and since then (about 8 months ago) my wife has told me that the internet has become Slow.
3)I consistently avg less than 286 Mbps when utilizing both speedtest.xfinity.com and fast.com as well as speedtest.net Current hardware: 1) Motorola MB8600 Modem 2) Linksys EA9500 I have not had a technician out to the site.
4)Troubleshooting:I have replaced the Coax cable from wall to modem with 2 different new coax cables. 
5) I have replaced the ethernet from Modem to Router with 2 different new ethernet cables I have rebooted my Modem as well as attempted a factory reset. 
6)I have connected my home PC directly to the Modem (bypass router) with 2 different ethernet cables*NOTE*Wether I use the EA9500 or go directly to the PC I get the same slow speeds.
7)*NOTE 2*I do not have a cable subscrption. 
8)No splitters on the line. 
9)It goes from Pole ---> Wall Jack --> Modem.
10)There is always significant Uncorrected errors.  
11)Attached are the Upstream and Downstream information and error logs.  
12)These are 4 days after a modem reset.

Please help.

1 answer:

Answer 0 (score: 1)

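Edited with a pattern that works for this particular data: you can split on a . that is followed by a space or a capital letter, using the pattern \\.(?=( |[A-Z]))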

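You need to be careful, because your sentences are not always followed by a proper space. That makes it impossible to tell them apart reliably (see the third split sentence in the output). At least the text is no longer split inside .com, as it was in your first attempt. Here we rely on capitalisation to distinguish speedtest.xfinity.com from Slow.I and site.Troubleshooting, but if someone forgets the space and also forgets to capitalise the next sentence, that distinction no longer holds.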
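As a small illustration of the lookahead and of the limitation just described, here is a minimal sketch in base R. The sample string and the names `demo` and `demo_parts` are made up for this example and do not come from the question; it assumes `perl = TRUE` so that `strsplit()` accepts the `(?=...)` lookahead.

```r
# Hypothetical sample: one missing space ("Slow.I"), one domain name,
# and one boundary with neither a space nor a capital ("daily.it").
demo <- "Speed is Slow.I use speedtest.xfinity.com daily.it works. Done."

# Split on a dot only when a space or a capital letter follows it.
# perl = TRUE enables the lookahead (?=...), which inspects the next
# character without consuming it.
demo_parts <- strsplit(demo, "\\.(?=( |[A-Z]))", perl = TRUE)[[1]]

# Trim the leading space left over after ". " boundaries.
demo_parts <- trimws(demo_parts)
print(demo_parts)
```

The domain name stays in one piece, while `daily.it` is not split because the dot is followed by a lowercase letter.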

library(stringr)
text_data <- " I have been a Gig subscriber for a decent amount of time. When the service was originally installed I would observe 900+Mbps speeds. I have deployed to Kosovo and since then (about 8 months ago) my wife has told me that the internet has become Slow.I consistently avg less than 286 Mbps when utilizing both speedtest.xfinity.com and fast.com as well as speedtest.net Current hardware: 1) Motorola MB8600 Modem 2) Linksys EA9500 I have not had a technician out to the site.Troubleshooting:I have replaced the Coax cable from wall to modem with 2 different new coax cables. I have replaced the ethernet from Modem to Router with 2 different new ethernet cables I have rebooted my Modem as well as attempted a factory reset. I have connected my home PC directly to the Modem (bypass router) with 2 different ethernet cablesNOTEWether I use the EA9500 or go directly to the PC I get the same slow speeds.*NOTE 2*I do not have a cable subscrption. No splitters on the line. It goes from Pole ---> Wall Jack --> Modem.There is always significant Uncorrected errors. Attached are the Upstream and Downstream information and error logs. These are 4 days after a modem reset."
text_data %>%
  str_split("\\.(?=( |[A-Z]))")
#> [[1]]
#>  [1] " I have been a Gig subscriber for a decent amount of time"                                                                                                                                                                     
#>  [2] " When the service was originally installed I would observe 900+Mbps speeds"                                                                                                                                                    
#>  [3] " I have deployed to Kosovo and since then (about 8 months ago) my wife has told me that the internet has become Slow"                                                                                                          
#>  [4] "I consistently avg less than 286 Mbps when utilizing both speedtest.xfinity.com and fast.com as well as speedtest.net Current hardware: 1) Motorola MB8600 Modem 2) Linksys EA9500 I have not had a technician out to the site"
#>  [5] "Troubleshooting:I have replaced the Coax cable from wall to modem with 2 different new coax cables"                                                                                                                            
#>  [6] " I have replaced the ethernet from Modem to Router with 2 different new ethernet cables I have rebooted my Modem as well as attempted a factory reset"                                                                         
#>  [7] " I have connected my home PC directly to the Modem (bypass router) with 2 different ethernet cablesNOTEWether I use the EA9500 or go directly to the PC I get the same slow speeds.*NOTE 2*I do not have a cable subscrption"  
#>  [8] " No splitters on the line"                                                                                                                                                                                                     
#>  [9] " It goes from Pole ---> Wall Jack --> Modem"                                                                                                                                                                                   
#> [10] "There is always significant Uncorrected errors"                                                                                                                                                                                
#> [11] " Attached are the Upstream and Downstream information and error logs"                                                                                                                                                          
#> [12] " These are 4 days after a modem reset."

Created on 2018-08-12 by the reprex package (v0.2.0).
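The desired output in the question keeps the full stop at the end of each sentence, whereas splitting on the dot removes it. One possible way to keep it, sketched below, is to insert a newline after each sentence-ending dot and then split on that newline. The helper name `split_sentences` and the sample text are hypothetical, and the sketch assumes the input contains no newline characters of its own.

```r
# Hypothetical helper, not part of the original answer.
# Assumes the input text does not already contain "\n".
split_sentences <- function(x) {
  # Append a newline after every dot that is followed by a space or a
  # capital letter; the dot itself stays in the text.
  marked <- gsub("\\.(?=( |[A-Z]))", ".\n", x, perl = TRUE)
  # Split on the inserted newlines and drop surrounding whitespace.
  trimws(strsplit(marked, "\n", fixed = TRUE)[[1]])
}

sample_text <- " First sentence.  Second one.Third uses fast.com here."
sentences <- split_sentences(sample_text)
print(sentences)
```

The same caveat applies as before: a dot followed by anything other than a space or a capital letter (for example `speeds.*NOTE 2*` in the data above) is not treated as a sentence boundary.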