R - Group specific list items

Time: 2015-06-23 17:22:08

Tags: r

I have character data like the following:

  x= c("Clause 1 - AGREEMENT. Buyer agrees to buy, and Seller agrees to sell, the Property described below on the terms and conditions set forth in this contract.",
       "Clause 2 - Buyer. Buyer, will take title to the Property described below:",
      "Item 2.1 - Seller. Seller, is the current owner of the Property described below assignable by Buyer without Seller’s prior written consent.",
       "Clause 3 - Inclusions. The Purchase Price includes the following items: ",
       "Item 3.1 - Fixtures. If attached to the Property on the date of this Contract, the following items are included unless")

I am trying to group all the items into their clauses within the list. Basically, I want it to do this

x[grep("Clause . - ", x)]= c(x[1], paste(x[2], x[3]), paste(x[4], x[5])) 

and this

x= x[grep("Clause . - ", x)]

but dynamically. How can I do this without specifying which list items I want to combine? Thanks, everyone.

2 Answers:

Answer 0: (score: 1)

First, extract the numbers:

> nums <- gsub("^..* (\\d+\\.*\\d*) -..*$", "\\1", x, perl = TRUE)
> nums
[1] "1"   "2"   "2.1" "3"   "3.1"

Group them by dropping the decimal part:

> nums <- as.integer(nums)
> nums
[1] 1 2 2 3 3

Loop over these groups and paste their members together:

> grouped <- tapply(x, nums, paste, collapse='\n')
> cat(grouped[2])
Clause 2 - Buyer. Buyer, will take title to the Property described below:
Item 2.1 - Seller. Seller, is the current owner of the Property described below assignable by Buyer without Seller’s prior written consent.
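
The three steps above can be wrapped into one helper. The following is a minimal sketch, not part of the original answer: the function name `group_by_clause` and its `sep` argument are hypothetical, the sample strings are shortened, and it assumes every element starts with one word followed by a number of the form `N` or `N.M` and then ` - `. With `sep = " "` it should reproduce the vector that the question builds by hand.

```r
# Hypothetical helper: group elements by the integer part of their number.
group_by_clause <- function(x, sep = " ") {
  # Capture "2" or "2.1" from "Clause 2 - ..." / "Item 2.1 - ..."
  nums <- sub("^\\S+ (\\d+(\\.\\d+)?) - .*$", "\\1", x, perl = TRUE)
  # Drop the decimal part so that "2.1" falls into group 2
  grp <- as.integer(sub("\\..*$", "", nums))
  # Paste the members of each group together, in their original order;
  # as.vector() strips the names and dim that tapply() attaches
  as.vector(tapply(x, grp, paste, collapse = sep))
}

# Shortened sample data (assumption: same layout as in the question)
x <- c("Clause 1 - AGREEMENT.",
       "Clause 2 - Buyer.",
       "Item 2.1 - Seller.",
       "Clause 3 - Inclusions.",
       "Item 3.1 - Fixtures.")
res <- group_by_clause(x)
print(res)
```

This only works when the numbering itself encodes the grouping; elements without a number (for example roman-numeral sub-items) would need a position-based approach instead.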

Answer 1: (score: 0)

I solved my problem by adapting the answer provided by Zelazny. With the data:

> x= c("Clause 1 - AGREEMENT. Buyer agrees to buy",
        "Item 1.2 - Seller agrees to sell",
        "Item 1.2 - the Property described below",
        "Item 1.3 - on the terms and conditions set forth in this contract",
        "Item 1.4 - If attached to the Property on the date of this Contract",
        "Item 1.5 - the following items are included:",
        "I - property",
        "II - car",
        "III - motorcycle",
        "Clause 2 - Buyer, will take title to the Property described below:",
        "Item 2.1 - Seller. Seller, is the current owner of the Property",
        "I - this is binding contract",
        "Item 2.2 - by Buyer without Seller’s prior written consent.",
        "Clause 3 - The Purchase Price includes the following items",
        "Clause 4 - property will be transmited",
        "Clause 5 - as discribed in",
        "Each party is signing this agreement on the date stated opposite that party’s signature.",
        "city, date")

First, find the positions of the clauses:

> f= grep("Clause . - ", x)  
> f  
[1]  1 10 14 15 16
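
One caveat, offered as a side note rather than as part of the original answer: in a regular expression `.` matches exactly one character, so `"Clause . - "` would miss a two-digit heading such as `Clause 10`. Below is a sketch of a stricter pattern, assuming clause headings always begin the element with `Clause`, a whole number, and ` - `. The sample vector `y` is made up for illustration.

```r
# Made-up sample with a two-digit clause number
y <- c("Clause 9 - first", "Item 9.1 - detail", "Clause 10 - second")

loose  <- grep("Clause . - ", y)        # "." matches one character only
strict <- grep("^Clause [0-9]+ - ", y)  # one or more digits, anchored at the start

print(loose)
print(strict)
```

Anchoring with `^` also avoids matching the word "Clause" when it appears in the middle of an item's text.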

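I could not get rep to accept a list of repeat counts, so loop and repeat the preceding clause position for each of the missing items:

> nums= f
> for (i in seq_len(length(f) - 1)) {  # 1:length(f)-1 would evaluate to 0:(length(f)-1)
>    a= f[i+1]-f[i]-1 # number of items between two clauses
>    nums= c(nums, rep(f[i], times= a))
> }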
> sort(nums)  
 [1]  1  1  1  1  1  1  1  1  1 10 10 10 10 14 15 16

Add the positions of everything after the last clause:

> nums= sort(c(nums, (1+f[length(f)]):length(x)))
> nums
 [1]  1  1  1  1  1  1  1  1  1 10 10 10 10 14 15 16 17 18
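
For comparison, a similar group index can be built without a loop. This is an alternative sketch, not the answer's method: `cumsum()` over a logical vector that marks clause headings increases by one at each heading, so every following line inherits the index of the clause above it. Two differences from the result above are worth noting: groups are labelled 1, 2, 3, ... rather than by position, and any lines after the last clause are attached to that clause instead of forming their own groups. The shortened vector `y` is made up for illustration, and the sketch assumes the first element is a clause heading.

```r
# Made-up, shortened sample (assumption: first element is a clause heading)
y <- c("Clause 1 - AGREEMENT",
       "Item 1.1 - Seller agrees to sell",
       "I - property",
       "Clause 2 - Buyer",
       "Item 2.1 - Seller",
       "Clause 3 - The Purchase Price")

is_clause <- grepl("^Clause [0-9]+ - ", y)  # TRUE at each clause heading
grp <- cumsum(is_clause)                    # running count of headings seen so far
print(grp)

# Keep each clause as a character vector inside a list
clauses <- split(y, grp)
print(clauses[["1"]])

# Equivalent index via rep(), whose `times` argument accepts a vector of counts
starts <- which(is_clause)
grp_rep <- rep(seq_along(starts), times = diff(c(starts, length(y) + 1)))
stopifnot(identical(grp, grp_rep))
```

Passing `grp` to `tapply(y, grp, paste, collapse = "\n")` gives pasted strings in the same way as the loop-based index.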

Finally, group the clauses:

> grouped <- tapply(x, nums, paste, collapse='\n')  
> cat(grouped[1])  
Clause 1 - AGREEMENT. Buyer agrees to buy
Item 1.2 - Seller agrees to sell
Item 1.2 - the Property described below
Item 1.3 - on the terms and conditions set forth in this contract
Item 1.4 - If attached to the Property on the date of this Contract
Item 1.5 - the following items are included:
I - property
II - car
III - motorcycle