I want to delete all the lines between certain headers in this sample text file.
fileConn <- file("sample.txt")
one <- "*Keyword"
two <- "*Node"
three <- "$ Node,X,Y,Z"
four <- "1,639982.78040607,4733827.5104821,0"
five <- "2,639757.59709573,4733830.43494066,0"
six <- "3,639738.81268144,4733834.3619618,0"
seven <- "*End"
writeLines(c(one, two, three, four, five, six, seven), fileConn)
close(fileConn)
sample <- readLines("sample.txt")
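What I want to do is delete all the lines/rows between "*Node" and "*End". The files I work with have a varying number of lines between these headers, so the deletion has to be based on the headers alone. I am not sure how to do this, because so far I have only deleted rows from data frames by referring to their row numbers. Any pointers?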
The expected output is:
*Keyword
*Node
*End
Answer 0 (score: 1)
readLines returns a character vector, not a data frame, so we can create the sample input more simply:
sample = c("*Keyword",
"*Node",
"$ Node,X,Y,Z",
"1,639982.78040607,4733827.5104821,0",
"2,639757.59709573,4733830.43494066,0",
"3,639738.81268144,4733834.3619618,0",
"*End")
Find the start and end headers, then use negative indexing to drop the elements between them:
node = which(sample == "*Node")
end = which(sample == "*End")
result = sample[-seq(from = node + 1, to = end - 1)]
result
# [1] "*Keyword" "*Node" "*End"
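The edge case where the two headers are adjacent is worth checking by hand. The following is only a sketch: the vector adjacent and the names node_adj, end_adj, and n_between are made up for illustration, and the seq_len() guard is one possible workaround rather than part of the answer's code.

```r
# Hypothetical input in which *End directly follows *Node
adjacent <- c("*Keyword", "*Node", "*End")
node_adj <- which(adjacent == "*Node")  # 2
end_adj <- which(adjacent == "*End")    # 3

# seq() counts downward when from > to, so this yields c(3, 2)
drop <- seq(from = node_adj + 1, to = end_adj - 1)
wrong <- adjacent[-drop]                # both headers are removed

# Count the lines strictly between the headers and only index when
# there is something to drop; x[-integer(0)] would select nothing at all
n_between <- end_adj - node_adj - 1
safe <- if (n_between > 0) {
  adjacent[-(node_adj + seq_len(n_between))]
} else {
  adjacent
}
```

Here wrong contains only "*Keyword", while safe is identical to the input.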
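The negative-indexing solution assumes there is exactly one *Node line and exactly one *End line. It also assumes there is at least one line to delete: if *End immediately follows *Node, seq(from = node + 1, to = end - 1) counts downward and removes the two headers instead. You may want a more robust solution that handles some of these special cases, for example: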
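delete_between = function(input, start, end) {
  # Search the input argument, not the global sample vector
  start_index = which(input == start)
  end_index = which(input == end)
  # || is the scalar "or", which is what an if() condition expects
  if (length(start_index) == 0 || length(end_index) == 0) {
    warning("No start or end found, returning input as-is")
    return(input)
  }
  if (length(start_index) > 1 || length(end_index) > 1) {
    stop("Multiple starts or ends found.")
  }
  # Extra guard: seq() would count downward if the end came first
  if (end_index < start_index) {
    stop("End header appears before start header.")
  }
  # Nothing to delete when the end header directly follows the start header
  if (start_index == end_index - 1) {
    return(input)
  }
  return(input[-seq(from = start_index + 1, to = end_index - 1)])
}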
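Since the question starts from a file on disk and speaks of "certain headers", a file may well contain more than one block. Below is a minimal sketch for that situation, assuming every *Node header is followed by a matching *End header. The name drop_blocks is hypothetical, and the sketch swaps the seq() approach for a cumsum()-based logical mask, which also copes with adjacent headers.

```r
# Hypothetical helper: remove the lines strictly between each start/end pair.
# Assumes every start header is followed by a matching end header.
drop_blocks <- function(lines, start = "*Node", end = "*End") {
  is_start <- lines == start
  is_end <- lines == end
  # depth is 1 from a start header up to the line before its end header
  depth <- cumsum(is_start) - cumsum(is_end)
  # Keep everything outside a block, plus the start headers themselves
  lines[depth == 0 | is_start]
}

# Round trip through a temporary file so nothing on disk is overwritten
path <- tempfile(fileext = ".txt")
writeLines(c("*Keyword",
             "*Node", "1,0,0,0", "2,0,0,0", "*End",
             "*Node", "*End",
             "*Node", "3,0,0,0", "*End"), path)
cleaned <- drop_blocks(readLines(path))
writeLines(cleaned, path)
# cleaned now holds: *Keyword, *Node, *End, *Node, *End, *Node, *End
```

The mask never builds an index sequence, so an empty block needs no special handling. It does not validate the input, though: an unmatched header silently changes which lines are kept.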