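Reformatting a text file based on a pattern in the file

Date: 2018-04-16 16:14:39

Tags: scala

I want to reformat the contents of a text file using Scala. For example, given this sample file: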

"good service"
Tom Martin (USA) 17th October 2015    
4    
Hi my name is
Tom.
I love boardgames.
Aircraft    TXT-102   

"not bad"
M Muller (Canada) 22nd September 2015
6
Hi
I
like
boardgames.
Aircraft    TXT-101
Type Of Customer    Couple Leisure
Cabin Flown FirstClass
Route   IND to CHI
Date Flown  September 2015
Seat Comfort    12345
Cabin Staff Service 12345

.
.

it should be reformatted to:

"good service"
Tom Martin (USA) 17th October 2015    
4    
Hi my name is Tom. I love boardgames.
Aircraft    TXT-102    

"not bad"
M Muller (Canada) 22nd September 2015
6
Hi I like boardgames.
Aircraft    TXT-101
Type Of Customer    Couple Leisure
Cabin Flown FirstClass
Route   IND to CHI
Date Flown  September 2015
Seat Comfort    12345
Cabin Staff Service 12345

.
.

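I have already identified the pattern in my file: the multi-line text sits between a line containing a number and a line of tab-separated words. For example, the multi-line content of the first block lies between 4 and Aircraft TXT-102, and that of the second block lies between 6 and Aircraft TXT-101. In addition, blocks are separated by two newlines.

I know that pattern matching with regular expressions might help, but I don't know how to approach this file.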

1 answer:

Answer 0 (score: 1)

What I would do, in pseudocode:

while more lines available { 
    lines_so_far = read input until a number is seen
    output(lines_so_far)
    lines_to_join = read input until "Aircraft" is seen
    output(joined lines_to_join)
}

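The regex for a line containing only a number is ^\d+$; for a line starting with "Aircraft", it is ^Aircraft\s.* (using \s rather than a literal space, since the question says the fields are tab-separated). A convenient method to look at is takeWhile.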
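
The pseudocode and the two regexes above could be turned into Scala roughly as follows. This is only a sketch under some assumptions: the names Reformat, reformat, isNumber and isAircraft are made up for illustration, number lines are trimmed before matching because the sample has trailing spaces after the rating, and span is used instead of takeWhile because it also returns the remaining lines. It further assumes that no review line consists of digits only.

```scala
import scala.annotation.tailrec

object Reformat {
  // A line holding only digits (the rating line).
  private val NumberLine = """^\d+$""".r
  // A line starting with "Aircraft" followed by a space or a tab.
  private val AircraftLine = """^Aircraft\s.*""".r

  // Trim first: the sample file has trailing spaces after the number.
  private def isNumber(line: String): Boolean =
    NumberLine.findFirstIn(line.trim).isDefined

  private def isAircraft(line: String): Boolean =
    AircraftLine.findFirstIn(line).isDefined

  def reformat(lines: List[String]): List[String] = {
    @tailrec
    def loop(rest: List[String], acc: Vector[String]): Vector[String] = {
      // Everything before the next number line is copied unchanged.
      val (head, fromNumber) = rest.span(l => !isNumber(l))
      fromNumber match {
        case Nil =>
          acc ++ head
        case number :: afterNumber =>
          // Lines between the number and the "Aircraft" line form the review text.
          val (review, tail) = afterNumber.span(l => !isAircraft(l))
          val joined =
            if (review.isEmpty) Vector.empty[String]
            else Vector(review.map(_.trim).mkString(" "))
          // The "Aircraft" line stays in tail and is copied by the next iteration.
          loop(tail, ((acc ++ head) :+ number) ++ joined)
      }
    }
    loop(lines, Vector.empty).toList
  }
}
```

To apply it to a file, the lines could be read with scala.io.Source.fromFile(path).getLines().toList, passed to Reformat.reformat, and written back out with mkString("\n"); path here stands for whatever input file is used. If a block has a number line but no following "Aircraft" line, this sketch joins everything up to the end of the input, so real data may need an extra stop condition such as a blank line.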