I have a text file that looks like this:
1 Hello
1.1 Hi
1.2 Hey
2 Next section
2.1 New section
3 thrid
4 last
I also have a second, similar text file:
1 Hello
My name is John. It was nice to meet you.
1.1 Hi
Hi again. My last name is Doe.
1.1.1 Bye
1.2 Hey
Greetings.
2 Next section
This is the second section. I am majoring in CS.
2.1 New Section
Welcome. I am an undergraduate student.
3 third
1. hi
2. hello
3. hey
4 last
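I would like to know how to read the headings from the first text file and use them to find the matching sections in the second file, together with everything that follows each heading up to the next one. So basically, I am trying to get something like this: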
Section   Content
1 Hello   My name is John. It was nice to meet you.
1.1 Hi    Hi again. My last name is Doe. 1.1.1 Bye
1.2 Hey   Greetings.
...and so on.
How can I go about doing this?
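Answer 0 (score: 1)
The solution below can certainly be improved, but it should give you an idea of how to approach the problem. Depending on the size and structure of the files you need to process, this approach may be workable as is, or the section detection and the speed may need further tuning.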
file1 =
"1 Hello
1.1 Hi
1.2 Hey
2 Next section
2.1 New section
3 thrid
4 last"
file2 =
"1 Hello
My name is John. It was nice to meet you.
1.1 Hi
Hi again. My last name is Doe.
1.1.1 Bye
1.2 Hey
Greetings.
2 Next section
This is the second section. I am majoring in CS.
2.1 New Section
Welcome. I am an undergraduate student.
3 third
1. hi
2. hello
3. hey
4 last"
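# Split both files into individual lines
file1 = unlist(strsplit(file1, "\n", fixed = TRUE))
file2 = unlist(strsplit(file2, "\n", fixed = TRUE))
# Line in file2 where each heading from file1 starts (case-insensitive).
# Each heading is used as a regex, so "." matches any character, and
# headings with no match in file2 (e.g. "3 thrid") are dropped.
positions = unlist(sapply(file1, function(x) grep(paste0("^", x, "$"), file2, ignore.case = TRUE)))
# A section ends on the line before the next matched heading; the last one ends with file2
positions = cbind(positions, c(positions[-1]-1, length(file2)))
# Extract the lines of each section, then drop the heading line itself
text = mapply(function(x, y) file2[x:y], positions[,1], positions[,2])
text = lapply(text, function(x) x[-1])
result = cbind(positions, text)
result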
#                 positions    text
# 1 Hello         1         2  "My name is John. It was nice to meet you."
# 1.1 Hi          3         5  Character,2
# 1.2 Hey         6         7  "Greetings."
# 2 Next section  8         9  "This is the second section. I am majoring in CS."
# 2.1 New section 10        15 Character,5
# 4 last          16        16 Character,0
# Note that the text column contains lists storing the individual lines.
# e.g. for "2.1 New section":
class(result[5, "text"])
# list
result[5, "text"]
# [[1]]
# [1] "Welcome. I am an undergraduate student." "3 third" #<< note the different spelling of third
# [3] "1. hi" "2. hello"
# [5] "3. hey"
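To get from this matrix to the two-column layout asked for in the question, the lines of each section can be pasted into a single string. The following is a minimal sketch in R and not part of the original answer: `section_table`, `headings` and `lines` are hypothetical names, and it compares headings exactly (ignoring case) with `match()` rather than building a regular expression.

```r
# Hypothetical helper: builds a Section / Content data frame from
# `headings` (lines of the first file) and `lines` (lines of the second file).
section_table <- function(headings, lines) {
  # Index in `lines` of the first line equal to each heading, ignoring case
  start <- match(tolower(headings), tolower(lines))
  found <- !is.na(start)
  if (!any(found)) {
    return(data.frame(Section = character(0), Content = character(0),
                      stringsAsFactors = FALSE))
  }
  headings <- headings[found]
  start <- start[found]
  # Keep the sections in the order in which they appear in the document
  ord <- order(start)
  headings <- headings[ord]
  start <- start[ord]
  # A section ends on the line before the next matched heading
  end <- c(start[-1] - 1, length(lines))
  # Paste the body lines (everything after the heading line) together
  content <- mapply(function(s, e) {
    if (e > s) paste(lines[(s + 1):e], collapse = " ") else ""
  }, start, end)
  data.frame(Section = headings, Content = unname(content),
             stringsAsFactors = FALSE)
}

# Shortened sample data taken from the question
headings <- c("1 Hello", "1.1 Hi", "1.2 Hey", "3 thrid")
lines <- c("1 Hello", "My name is John. It was nice to meet you.",
           "1.1 Hi", "Hi again. My last name is Doe.", "1.1.1 Bye",
           "1.2 Hey", "Greetings.")
tab <- section_table(headings, lines)
print(tab)
```

As in the answer above, a heading that has no exact counterpart in the second file (here "3 thrid") is skipped, so its text ends up in the preceding section.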
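Answer 1 (score: 0)
The answer to this question is yes, it can be done. The implementation will vary widely depending on the programming language you use for the task. A high-level overview would be: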
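// Locate the current key and the key that follows it
int start = seconddocument.indexOf(keys[x]);
int end = seconddocument.indexOf(keys[x + 1]);
// Everything from the current key up to (but not including) the next key
String matchedContent = seconddocument.substring(start, end);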
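This works until you reach the last match, because keys[x + 1] does not exist when x is the last key. In that case, either set end to the position of the last character in the document, or use the substring variant that takes only a start index.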
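The outline above, including the special case for the last key, could look roughly like this in R. This is only a sketch under assumptions: `find_from` and `split_by_keys` are hypothetical helper names, the whole second document is held in one string, the keys are expected to occur in document order, and each search starts just past the previous hit so that earlier text is not matched again.

```r
# Hypothetical helper: position of the first literal occurrence of `key`
# in `doc` at or after character `from`, or NA if there is none.
find_from <- function(doc, key, from) {
  hit <- as.integer(regexpr(key, substr(doc, from, nchar(doc)), fixed = TRUE))
  if (hit < 0L) NA_integer_ else hit + from - 1L
}

# Hypothetical helper: cuts `doc` into one piece per key. Each piece starts
# at its key and stops right before the next key; the last piece runs to
# the end of the document.
split_by_keys <- function(doc, keys) {
  if (length(keys) == 0) return(character(0))
  starts <- integer(length(keys))
  from <- 1L
  for (x in seq_along(keys)) {
    starts[x] <- find_from(doc, keys[x], from)
    if (is.na(starts[x])) stop("key not found: ", keys[x])
    from <- starts[x] + 1L
  }
  # For the last key there is no keys[x + 1], so the end of the document is used
  ends <- c(starts[-1] - 1L, nchar(doc))
  pieces <- substring(doc, starts, ends)
  names(pieces) <- keys
  pieces
}

doc <- "1 Hello\nMy name is John.\n1.1 Hi\nHi again.\n1.2 Hey\nGreetings."
keys <- c("1 Hello", "1.1 Hi", "1.2 Hey")
pieces <- split_by_keys(doc, keys)
cat(pieces[["1.1 Hi"]])
```

Each piece still begins with its own key, just as substring(start, end) would return it in the outline. Because the search is a plain substring match, a key that also occurs inside ordinary text would be picked up there, so matching whole lines is the safer choice for real files.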
HTH