Suppose I have a txt file containing the following text:
Type: fruits
Title: retail
Date: 2015-11-10
Country: UK
Products:
 apple,
 passion fruit,
 mango
Documents: NDA
Export: 2.10
I read this file with the readLines function.
Then I want a vector that looks like this:
x <- c("fruits", "apple", "passion fruit", "mango")
In other words, I want to extract the word after "Type:" and all the words between "Products:" and "Documents:". How can I do this? Thanks!
Answer 0 (score: 1)
As it is, the file already looks like YAML, so you can read it with the package of the same name:
library(yaml)
info <- yaml::read_yaml("your file.txt")
# strsplit - split either side of the commas
# unlist - convert to vector
# trimws - remove trailing and leading white space
out <- trimws(unlist(strsplit(info$Products, ",")))
# prepend the Type value to get the vector from the question
x <- c(info$Type, out)
The other entries are available in info as list elements under their own names, e.g. info$Type.
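A self-contained sketch of the same YAML approach, assuming the yaml package is installed. To avoid depending on a file, it parses the text with yaml::yaml.load() instead of read_yaml(). It relies on the product lines being indented by one space (as in the dput output of the next answer): that indentation is what makes YAML fold them into the single string "apple, passion fruit, mango" under Products.

```r
library(yaml)

# Same content as the file in the question; the indented product lines
# are folded by YAML into one string: "apple, passion fruit, mango"
txt <- "Type: fruits
Title: retail
Date: 2015-11-10
Country: UK
Products:
 apple,
 passion fruit,
 mango
Documents: NDA
Export: 2.10"

info <- yaml::yaml.load(txt)

# split the folded Products string on commas and trim surrounding whitespace
products <- trimws(unlist(strsplit(info$Products, ",")))

# prepend the Type value to get the vector asked for in the question
x <- c(info$Type, products)
print(x)
```

If the product lines were not indented, the text would no longer be valid YAML for this layout, so this approach depends on the file keeping that indentation.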
Answer 1 (score: 0)
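There may be a better solution, but just in case, you can try this. Suppose you read the file into a vector like this:
vec <- readLines("path\\file.txt")
so that it contains the text you posted. Then you can try the following: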
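# collapse the double spaces left over from joining the lines into single spaces
gsub("  ", " ",
     # turn the first remaining space (after "fruits") into a comma
     sub(" ", ", ",
         # delete everything except the wanted words
         gsub(".*Type:\\s*|Title.*Products:\\s*| Documents.*", "",
              # collapse the lines into one string
              paste0(vec, collapse = " "))))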
[1] "fruits, apple, passion fruit, mango"
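The result above is a single comma-separated string, while the question asks for a character vector. A base-R sketch that works on the lines directly instead of on one pasted string, assuming each product sits on its own line and the "Type:", "Products:" and "Documents:" markers start their lines. The vec defined here is the one given by dput(vec) in this answer.

```r
# Lines as returned by readLines(), taken from the dput(vec) output
vec <- c("Type: fruits", "Title: retail", "Date: 2015-11-10", "Country: UK",
         "Products:", " apple,", " passion fruit,", " mango", "Documents: NDA",
         "Export: 2.10")

# value after "Type:" on the line that starts with it
type <- sub("^Type:\\s*", "", grep("^Type:", vec, value = TRUE))

# positions of the two marker lines
start <- grep("^Products:", vec)
end   <- grep("^Documents:", vec)

# indices strictly between the markers (empty if the markers are adjacent)
between <- start + seq_len(end - start - 1)

# drop trailing commas, then surrounding whitespace
products <- trimws(sub(",\\s*$", "", vec[between]))

x <- c(type, products)
print(x)
```

Working line by line avoids the need to clean up spaces afterwards; if you prefer to keep the single-string version, strsplit(result, ",\\s*")[[1]] turns it into a vector.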
To make the code reproducible, here is the output of dput(vec):
c("Type: fruits", "Title: retail", "Date: 2015-11-10", "Country: UK",
"Products:", " apple,", " passion fruit,", " mango", "Documents: NDA",
"Export: 2.10")