I have the following character data:
x= c("Clause 1 - AGREEMENT. Buyer agrees to buy, and Seller agrees to sell, the Property described below on the terms and conditions set forth in this contract.",
"Clause 2 - Buyer. Buyer, will take title to the Property described below:",
"Item 2.1 - Seller. Seller, is the current owner of the Property described below assignable by Buyer without Seller’s prior written consent.",
"Clause 3 - Inclusions. The Purchase Price includes the following items: ",
"Item 3.1 - Fixtures. If attached to the Property on the date of this Contract, the following items are included unless")
I'm trying to group every item under its clause in a list. Basically, I want to do this:
x[grep("Clause . - ", x)]= c(x[1], paste(x[2], x[3]), paste(x[4], x[5]))
and this:
x= x[grep("Clause . - ", x)]
but dynamically. How can I do this without specifying which elements to combine? Thanks, everyone.
Answer 0 (score: 1)
First, extract the numbers:
> nums <- gsub("^..* (\\d+\\.*\\d*) -..*$", "\\1", x, perl = TRUE)
> nums
[1] "1" "2" "2.1" "3" "3.1"
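The pattern above relies on the greedy `..*` backtracking to the last number that is followed by ` - `, and `\\.*` also accepts several dots in a row. Here is a minimal sketch of a tighter, anchored alternative. It assumes every line starts with a single word (`Clause`, `Item`) followed by the number, and the strings are shortened copies of the question's data. Lines that don't match, such as `"I - property"` in the second answer's data, are returned unchanged by `sub()`, so they would become `NA` in the `as.integer()` step.

```r
# Shortened copies of the question's lines; only the "Label N - " prefix matters here.
x <- c("Clause 1 - AGREEMENT. Buyer agrees to buy",
       "Clause 2 - Buyer. Buyer, will take title",
       "Item 2.1 - Seller. Seller, is the current owner",
       "Clause 3 - Inclusions. The Purchase Price includes",
       "Item 3.1 - Fixtures. If attached to the Property")

# Anchored at the start: one word, a space, an integer with an optional
# ".digits" part, then " - ". sub() keeps only the captured number.
nums <- sub("^\\w+ ([0-9]+(\\.[0-9]+)?) - .*$", "\\1", x)
print(nums)
```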
Group them by dropping the decimal part:
> nums <- as.integer(nums)
> nums
[1] 1 2 2 3 3
Apply paste to each group to join its lines:
> grouped <- tapply(x, nums, paste, collapse='\n')
> cat(grouped[2])
Clause 2 - Buyer. Buyer, will take title to the Property described below:
Item 2.1 - Seller. Seller, is the current owner of the Property described below assignable by Buyer without Seller’s prior written consent.
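The question asks for a list, but `tapply()` returns a one-dimensional array. Here is a minimal sketch of the same grouping with `split()`. It assumes the integer group numbers from the `as.integer()` step, hard-coded here so the snippet runs on its own. `split()` keeps the original order within each group. With `collapse = " "`, the result matches the vector the question builds by hand with `paste()`.

```r
x <- c("Clause 1 - AGREEMENT. Buyer agrees to buy",
       "Clause 2 - Buyer. Buyer, will take title",
       "Item 2.1 - Seller. Seller, is the current owner",
       "Clause 3 - Inclusions. The Purchase Price includes",
       "Item 3.1 - Fixtures. If attached to the Property")
nums <- c(1L, 2L, 2L, 3L, 3L)  # group numbers from as.integer() above

# One list element per clause, named "1", "2", "3".
clauses <- lapply(split(x, nums), paste, collapse = "\n")
cat(clauses[["2"]], "\n")

# A plain character vector, joined with spaces like the question's paste() calls.
merged <- unname(vapply(split(x, nums), paste, character(1), collapse = " "))
print(merged)
```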
Answer 1 (score: 0)
I solved my problem by adapting the answer provided by Zelazny. With this data:
> x= c("Clause 1 - AGREEMENT. Buyer agrees to buy",
"Item 1.2 - Seller agrees to sell",
"Item 1.2 - the Property described below",
"Item 1.3 - on the terms and conditions set forth in this contract",
"Item 1.4 - If attached to the Property on the date of this Contract",
"Item 1.5 - the following items are included:",
"I - property",
"II - car",
"III - motorcycle",
"Clause 2 - Buyer, will take title to the Property described below:",
"Item 2.1 - Seller. Seller, is the current owner of the Property",
"I - this is binding contract",
"Item 2.2 - by Buyer without Seller’s prior written consent.",
"Clause 3 - The Purchase Price includes the following items",
"Clause 4 - property will be transmited",
"Clause 5 - as discribed in",
"Each party is signing this agreement on the date stated opposite that party’s signature.",
"city, date")
First, find the positions of the clauses:
> f= grep("Clause . - ", x)
> f
[1] 1 10 14 15 16
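There is one caveat with the pattern `"Clause . - "`. The `.` matches exactly one character, so it finds clauses 1 to 9 but would miss a line starting with `Clause 10 - `. Below is a sketch of a more general pattern that is anchored at the start of the line and allows any number of digits. The example strings are made up for illustration.

```r
lines <- c("Clause 9 - Ninth clause",
           "Clause 10 - Tenth clause",
           "Item 10.1 - An item of clause 10")

# "." matches a single character, so the two-digit clause is missed.
print(grep("Clause . - ", lines))

# "^Clause [0-9]+ - " accepts clause numbers of any length.
print(grep("^Clause [0-9]+ - ", lines))
```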
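Since I couldn't get rep to take several repeat counts at once, loop over the clauses and, for each one, repeat its index once for every non-clause line that follows it:
> nums= f
> for (i in seq_len(length(f) - 1)){
+   a= f[i+1]-f[i]-1 # number of non-clause lines between clause i and clause i+1
+   nums= c(nums, rep(f[i], times= a))
+ }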
> sort(nums)
[1] 1 1 1 1 1 1 1 1 1 10 10 10 10 14 15 16
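`rep()` does accept a vector for `times` when that vector is the same length as the one being repeated, so a single call can replace the loop. Here is a sketch that assumes the clause positions `f` from the `grep()` step, hard-coded here. Each clause except the last is repeated `diff(f)` times, which covers the clause itself plus the lines up to the next clause. The last clause is kept once, giving the same values as `sort(nums)` above.

```r
f <- c(1L, 10L, 14L, 15L, 16L)  # clause positions returned by grep() above

# diff(f) is the distance from each clause to the next one, i.e. how many
# lines (the clause itself plus its items) belong to it.
nums <- c(rep(head(f, -1), times = diff(f)), tail(f, 1))
print(nums)
```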
Append the indices of all lines after the last clause, so each of them forms its own group:
> nums= sort(c(nums, (1+f[length(f)]):length(x)))
> nums
[1] 1 1 1 1 1 1 1 1 1 10 10 10 10 14 15 16 17 18
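This step has one edge case. If the last clause is also the last line, `(1+f[length(f)]):length(x)` counts downwards (for example, `17:16` gives `17 16`). `nums` then ends up longer than `x`, and `tapply()` stops with an error about unequal argument lengths. Here is a sketch of a guard using `seq_len()`; the helper name `trailing_idx` is hypothetical.

```r
# Hypothetical helper: indices of the lines after the last clause.
# seq_len() returns an empty vector when there are no such lines,
# whereas (last + 1):n would count down and return two values.
trailing_idx <- function(last, n) last + seq_len(n - last)

print(trailing_idx(16L, 18L))  # the example above: lines 17 and 18
print(trailing_idx(16L, 16L))  # last clause is the last line: nothing to add
```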
Finally, group the lines by clause:
> grouped <- tapply(x, nums, paste, collapse='\n')
> cat(grouped[1])
Clause 1 - AGREEMENT. Buyer agrees to buy
Item 1.2 - Seller agrees to sell
Item 1.2 - the Property described below
Item 1.3 - on the terms and conditions set forth in this contract
Item 1.4 - If attached to the Property on the date of this Contract
Item 1.5 - the following items are included:
I - property
II - car
III - motorcycle
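A more compact variant of the same idea is sketched below as an alternative, not as what the answer did. Running `cumsum()` over a logical vector that marks clause lines gives every line the number of the most recent clause. It differs from the loop in one respect: the last two lines (the signature sentence and `"city, date"`) are attached to clause 5 instead of forming their own groups. Any lines before the first clause would get group 0. The data is copied from above, with the curly apostrophes replaced by straight ones.

```r
x <- c("Clause 1 - AGREEMENT. Buyer agrees to buy",
       "Item 1.2 - Seller agrees to sell",
       "Item 1.2 - the Property described below",
       "Item 1.3 - on the terms and conditions set forth in this contract",
       "Item 1.4 - If attached to the Property on the date of this Contract",
       "Item 1.5 - the following items are included:",
       "I - property",
       "II - car",
       "III - motorcycle",
       "Clause 2 - Buyer, will take title to the Property described below:",
       "Item 2.1 - Seller. Seller, is the current owner of the Property",
       "I - this is binding contract",
       "Item 2.2 - by Buyer without Seller's prior written consent.",
       "Clause 3 - The Purchase Price includes the following items",
       "Clause 4 - property will be transmited",
       "Clause 5 - as discribed in",
       "Each party is signing this agreement on the date stated opposite that party's signature.",
       "city, date")

# TRUE on every line that starts a clause; cumsum() counts how many
# clauses have started so far, which is the line's clause number.
grp <- cumsum(grepl("^Clause [0-9]+ - ", x))

grouped <- tapply(x, grp, paste, collapse = "\n")
cat(grouped[1])
```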