I want to use Scala to reformat the contents of a text file. For example, given this sample file:
"good service"
Tom Martin (USA) 17th October 2015
4
Hi my name is
Tom.
I love boardgames.
Aircraft TXT-102
"not bad"
M Muller (Canada) 22nd September 2015
6
Hi
I
like
boardgames.
Aircraft TXT-101
Type Of Customer Couple Leisure
Cabin Flown FirstClass
Route IND to CHI
Date Flown September 2015
Seat Comfort 12345
Cabin Staff Service 12345
.
.
it should be reformatted as:
"good service"
Tom Martin (USA) 17th October 2015
4
Hi my name is Tom. I love boardgames.
Aircraft TXT-102
"not bad"
M Muller (Canada) 22nd September 2015
6
Hi I like boardgames.
Aircraft TXT-101
Type Of Customer Couple Leisure
Cabin Flown FirstClass
Route IND to CHI
Date Flown September 2015
Seat Comfort 12345
Cabin Staff Service 12345
.
.
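I have identified the pattern in my file: the multi-line string always sits between a line containing a number and a line containing tab-separated words.
For example, the multi-line content of the first block lies between 4 and Aircraft TXT-102.
The multi-line content of the second block lies between 6 and Aircraft TXT-101.
In addition, blocks are separated by two newlines.
I know that pattern matching with regular expressions might help, but I don't know how to approach this file.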
Answer 0 (score: 1)
Here is what I would do, in pseudocode:
while more lines available {
    lines_so_far = read input until a number is seen
    output(lines_so_far)
    lines_to_join = read input until "Aircraft" is seen
    output(joined lines_to_join)
}
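The regex for a line containing only a number is ^\d+$; for a line starting with "Aircraft", it is ^Aircraft\s.* (\s matches either a space or a tab). A convenient method to look at is takeWhile.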
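The pseudocode above could be sketched in Scala roughly as follows. This is only one possible reading of it, not a definitive implementation: the names Reformat, reformat, isNumber and isAircraft are made up for illustration, the marker line is assumed to start with "Aircraft" followed by whitespace as in the sample, and span is used in place of takeWhile because it returns the consumed prefix and the remaining lines in one call.

```scala
import scala.annotation.tailrec
import scala.io.Source

object Reformat {
  // A line consisting only of digits (the rating line in the sample).
  private val NumberLine = """^\d+$""".r
  // Assumption: the line that ends a multi-line block starts with "Aircraft"
  // followed by a space or tab, as in the sample file.
  private val AircraftLine = """^Aircraft\s.*""".r

  private def isNumber(line: String): Boolean =
    NumberLine.findFirstIn(line.trim).isDefined

  private def isAircraft(line: String): Boolean =
    AircraftLine.findFirstIn(line).isDefined

  @tailrec
  def reformat(lines: List[String], acc: Vector[String] = Vector.empty): Vector[String] = {
    // span behaves like takeWhile plus dropWhile: everything before the
    // number line is emitted unchanged.
    val (header, rest) = lines.span(line => !isNumber(line))
    rest match {
      case Nil => acc ++ header // no further number line: emit the remainder as-is
      case number :: afterNumber =>
        // Lines between the number and the "Aircraft" line are joined with spaces.
        val (body, tail) = afterNumber.span(line => !isAircraft(line))
        val joined = body.map(_.trim).filter(_.nonEmpty).mkString(" ")
        val out = (acc ++ header) :+ number
        // tail still starts with the "Aircraft" line; the next round emits it.
        reformat(tail, if (joined.isEmpty) out else out :+ joined)
    }
  }

  // Hypothetical entry point: the input path is passed as the first argument.
  def main(args: Array[String]): Unit = {
    val source = Source.fromFile(args(0))
    try reformat(source.getLines().toList).foreach(println)
    finally source.close()
  }
}
```

Lines that fall outside a number/"Aircraft" pair, such as the "Type Of Customer" lines, pass through untouched, because they are neither purely numeric nor part of a joined block.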