Interpreting this raw text - a strategy?

Asked: 2012-05-28 23:06:38

Tags: ruby parsing text language-agnostic screen-scraping

I have this raw text:

________________________________________________________________________________________________________________________________
Pos Car  Competitor/Team                Driver                   Vehicle              Cap   CL Laps     Race.Time Fastest...Lap

1     6  Jason Clements                 Jason Clements           BMW M3               3200       10     9:48.5710   3 0:57.3228*
2    42  David Skillender               David Skillender         Holden VS Commodore  6000       10     9:55.6866   2 0:57.9409 
3    37  Bruce Cook                     Bruce Cook               Ford  Escort         3759       10     9:56.4388   4 0:58.3359 
4    18  Troy Marinelli                 Troy Marinelli           Nissan  Silvia       3396       10     9:56.7758   2 0:58.4443 
5    75  Anthony Gilbertson             Anthony Gilbertson       BMW M3               3200       10    10:02.5842   3 0:58.9336 
6    26  Trent Purcell                  Trent Purcell            Mazda RX7            2354       10    10:07.6285   4 0:59.0546 
7    12  Scott Hunter                   Scott Hunter             Toyota  Corolla      2000       10    10:11.3722   5 0:59.8921 
8    91  Graeme Wilkinson               Graeme Wilkinson         Ford  Escort         2000       10    10:13.4114   5 1:00.2175 
9     7  Justin Wade                    Justin Wade              BMW M3               4000       10    10:18.2020   9 1:00.8969 
10   55  Greg Craig                     Grag Craig               Toyota  Corolla      1840       10    10:18.9956   7 1:00.7905 
11   46  Kyle Orgam-Moore               Kyle Organ-Moore         Holden VS Commodore  6000       10    10:30.0179   3 1:01.6741 
12   39  Uptiles Strathpine             Trent Spencer            BMW Mini Cooper S    1500       10    10:40.1436   2 1:02.2728 
13  177  Mark Hyde                      Mark Hyde                Ford  Escort         1993       10    10:49.5920   2 1:03.8069 
14   34  Peter Draheim                  Peter Draheim            Mazda RX3            2600       10    10:50.8159  10 1:03.4396 
15    5  Scott Douglas                  Scott Douglas            Datsun  1200         1998        9     9:48.7808   3 1:01.5371 
16   72  Paul Redman                    Paul Redman              Ford  Focus          2lt         9    10:11.3707   2 1:05.8729 
17    8  Matthew Speakman               Matthew Speakman         Toyota  Celica       1600        9    10:16.3159   3 1:05.9117 
18   74  Lucas Easton                   Lucas Easton             Toyota  Celica       1600        9    10:16.8050   6 1:06.0748 
19   77  Dean Fuller                    Dean Fuller              Mitsubishi  Sigma    2600        9    10:25.2877   3 1:07.3991 
20   16  Brett Batterby                 Brett Batterby           Toyota  Corolla      1600        9    10:29.9127   4 1:07.8420 
21   95  Ross Hurford                   Ross Hurford             Toyota  Corolla      1600        8     9:57.5297   2 1:12.2672 
DNF  13  Charles Wright                 Charles Wright           BMW 325i             2700        9     9:47.9888   7 1:03.2808 
DNF  20  Shane Satchwell                Shane Satchwell          Datsun  1200 Coupe   1998        1     1:05.9100   1 1:05.9100 

Fastest Lap Av.Speed Is 152kph, Race Av.Speed Is 148kph
R=under lap record by greatest margin, r=under lap record, *=fastest lap time
________________________________________________________________________________________________________________________________
Issue# 2 - Printed Sat May 26 15:43:31 2012                     Timing System By NATSOFT (03)63431311 www.natsoft.com.au/results
Amended 

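I need to parse it into objects with the obvious fields: position, car, driver and so on. The problem is that I don't know what strategy to use. If I split it on whitespace, I end up with a list like this: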

["1", "6", "Jason", "Clements", "Jason", "Clements", "BMW", "M3", "3200", "10", "9:48.5710", "3", "0:57.3228*"]

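You can see the problem. I can't just interpret this list by index, because a person may have only one name, or three words in a name, or the car may be described by many different words. That makes it impossible to reference the list by index alone.

What about using the offsets defined by the column names? I'm not quite sure how to use them.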

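Edit: The algorithm I'm currently using works like this:

  1. Split the text on newlines, giving a collection of lines.
  2. Find the common whitespace characters FURTHEST RIGHT on each line, i.e. the positions (indexes) at which every line contains a space.
  3. Split the lines at those common positions.
  4. Trim the resulting pieces.
  5. Several problems exist:

    If the names all have the same lengths, like so: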

    Jason Adams
    Bobby Sacka
    Jerry Louis
    

    Then it will interpret each name as two separate items: (["Jason", "Adams", "Bobby", "Sacka", "Jerry", "Louis"]).

    Whereas if they all have different lengths:

    Dominic Bou
    Bob Adams
    Jerry Seinfeld
    

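    Then it will split correctly, after the final 'd' of Seinfeld, and we get a collection of three names: (["Dominic Bou", "Bob Adams", "Jerry Seinfeld"]).

    It's also quite brittle. I'm looking for a better solution.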
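The column-detection step described in the edit above can be sketched roughly as follows. This is only an illustration of the idea, not the asker's actual code; `common_blank_columns` is a hypothetical helper name, and a line that has already ended is treated as blank at that index.

```ruby
# Return the indexes at which every line has a space (or has already ended).
def common_blank_columns(lines)
  width = lines.map(&:length).max.to_i
  (0...width).select do |i|
    lines.all? { |line| line[i].nil? || line[i] == ' ' }
  end
end

SAME_LENGTH = ['Jason Adams', 'Bobby Sacka', 'Jerry Louis']
MIXED       = ['Dominic Bou', 'Bob Adams', 'Jerry Seinfeld']

p common_blank_columns(SAME_LENGTH) # => [5]
p common_blank_columns(MIXED)       # => []
```

The first result shows the failure mode from the question: when every name has its space at the same index, that index looks like a column boundary and each name is cut in two.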

10 answers:

Answer 0 (score: 6)

This is not a good case for a regex. What you really want is to discover the format and then unpack the lines:

lines = str.split "\n"

# you know the field names so you can use them to find the column positions
fields = ['Pos', 'Car', 'Competitor/Team', 'Driver', 'Vehicle', 'Cap', 'CL Laps', 'Race.Time', 'Fastest...Lap']
header = lines.shift until header =~ /^Pos/
positions = fields.map{|f| header.index f}

# use that to construct an unpack format string
format = 1.upto(positions.length-1).map{|x| "A#{positions[x] - positions[x-1]}"}.join + 'A*'
# A4A5A31A25A21A6A12A10A* (the trailing A* captures the last column, which has no right-hand neighbour)

lines.each do |line|
  next unless line =~ /^(\d|DNF)/ # skip lines you're not interested in
  data = line.unpack(format).map{|x| x.strip}
  puts data.join(', ')
  # or better yet...
  car = Hash[fields.zip data]
  puts car['Driver']
end
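One caveat with offsets taken from the header: values that are right-aligned under their label can begin to the left of it. In the sample data, race times of ten minutes or more appear to start one character before the Race.Time label. The sketch below is a sanity check for that, under the assumption that a boundary is suspect when the characters on both sides of it are non-blank; `cut_fields` is a hypothetical helper, and the three strings are copied from the question.

```ruby
FIELDS = ['Pos', 'Car', 'Competitor/Team', 'Driver', 'Vehicle', 'Cap',
          'CL Laps', 'Race.Time', 'Fastest...Lap']

HEADER     = 'Pos Car  Competitor/Team                Driver                   Vehicle              Cap   CL Laps     Race.Time Fastest...Lap'
ROW_9_MIN  = '1     6  Jason Clements                 Jason Clements           BMW M3               3200       10     9:48.5710   3 0:57.3228*'
ROW_10_MIN = '5    75  Anthony Gilbertson             Anthony Gilbertson       BMW M3               3200       10    10:02.5842   3 0:58.9336'

# Start offset of each column, taken from the header labels.
POSITIONS = FIELDS.map { |f| HEADER.index(f) }

# Names of the fields whose start offset falls in the middle of a value,
# i.e. the characters just before and at the offset are both non-blank.
def cut_fields(line, fields, positions)
  fields.zip(positions).drop(1).select do |_, pos|
    before = line[pos - 1]
    at     = line[pos]
    before && at && before != ' ' && at != ' '
  end.map(&:first)
end

p cut_fields(ROW_9_MIN, FIELDS, POSITIONS)  # => []
p cut_fields(ROW_10_MIN, FIELDS, POSITIONS) # => ["Race.Time"]
```

If a row is flagged, one option is to move that column's start offset left by hand, which is effectively what the hard-coded ranges in some of the other answers do.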

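Answer 1 (score: 6)

http://blog.ryanwood.com/past/2009/6/12/slither-a-dsl-for-parsing-fixed-width-text-files This could solve your problem.

Here are more examples and the GitHub repository.

Hope this helps!

Answer 2 (score: 5)

I think it's easy enough to just use fixed widths on each line.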

#!/usr/bin/env ruby

# ruby parsing_winner.rb winners_list.txt 
args = ARGV
# Stop with a usage message instead of trying to open a nil path.
abort "Usage: ruby parsing_winner.rb winners_list.txt" if args.empty?
winner_file = File.open(args.shift)
array_of_race_results, array_of_race_results_array  = [], []

class RaceResult

  attr_accessor :position, :car, :team, :driver, :vehicle, :cap, :cl_laps, :race_time, :fastest, :fastest_lap
  def initialize(position, car, team, driver, vehicle, cap, cl_laps, race_time, fastest, fastest_lap)
    @position    = position 
    @car         = car 
    @team        = team  
    @driver      = driver  
    @vehicle     = vehicle  
    @cap         = cap  
    @cl_laps     = cl_laps  
    @race_time   = race_time 
    @fastest     = fastest
    @fastest_lap = fastest_lap 
  end

  def to_a
    # ["1", "6", "Jason", "Clements", "Jason", "Clements", "BMW", "M3", "3200", "10", "9:48.5710", "3", "0:57.3228*"]
    [position, car, team, driver, vehicle, cap, cl_laps, race_time, fastest, fastest_lap]
  end
end

# Pos Car  Competitor/Team                Driver                   Vehicle              Cap   CL Laps     Race.Time Fastest...Lap

# 1     6  Jason Clements                 Jason Clements           BMW M3               3200       10     9:48.5710   3 0:57.3228*
# 2    42  David Skillender               David Skillender         Holden VS Commodore  6000       10     9:55.6866   2 0:57.9409
# etc...
winner_file.each_line do |line|
  next if line[/^____/] || line[/^\w{4,}|^\s|^Pos/] || line[0..3][/\=/]
  position    = line[0..3].strip
  car         = line[4..8].strip
  team        = line[9..39].strip
  driver      = line[40..64].strip
  vehicle     = line[65..85].strip
  cap         = line[86..91].strip
  cl_laps     = line[92..101].strip
  race_time   = line[102..113].strip
  fastest     = line[114..116].strip
  fastest_lap = line[117..-1].strip
  racer = RaceResult.new(position, car, team, driver, vehicle, cap, cl_laps, race_time, fastest, fastest_lap)
  array_of_race_results << racer
  array_of_race_results_array << racer.to_a
end

puts "Race Results Objects: #{array_of_race_results}"
puts "Race Results: #{array_of_race_results_array.inspect}"

Output =>

Race Results Objects: [#<RaceResult:0x007fcc4a84b7c8 @position="1", @car="6", @team="Jason Clements", @driver="Jason Clements", @vehicle="BMW M3", @cap="3200", @cl_laps="10", @race_time="9:48.5710", @fastest="3", @fastest_lap="0:57.3228*">, #<RaceResult:0x007fcc4a84aa08 @position="2", @car="42", @team="David Skillender", @driver="David Skillender", @vehicle="Holden VS Commodore", @cap="6000", @cl_laps="10", @race_time="9:55.6866", @fastest="2", @fastest_lap="0:57.9409">, #<RaceResult:0x007fcc4a849ce8 @position="3", @car="37", @team="Bruce Cook", @driver="Bruce Cook", @vehicle="Ford  Escort", @cap="3759", @cl_laps="10", @race_time="9:56.4388", @fastest="4", @fastest_lap="0:58.3359">, #<RaceResult:0x007fcc4a8491f8 @position="4", @car="18", @team="Troy Marinelli", @driver="Troy Marinelli", @vehicle="Nissan  Silvia", @cap="3396", @cl_laps="10", @race_time="9:56.7758", @fastest="2", @fastest_lap="0:58.4443">, #<RaceResult:0x007fcc4b091ab8 @position="5", @car="75", @team="Anthony Gilbertson", @driver="Anthony Gilbertson", @vehicle="BMW M3", @cap="3200", @cl_laps="10", @race_time="10:02.5842", @fastest="3", @fastest_lap="0:58.9336">, #<RaceResult:0x007fcc4b0916a8 @position="6", @car="26", @team="Trent Purcell", @driver="Trent Purcell", @vehicle="Mazda RX7", @cap="2354", @cl_laps="10", @race_time="10:07.6285", @fastest="4", @fastest_lap="0:59.0546">, #<RaceResult:0x007fcc4b091298 @position="7", @car="12", @team="Scott Hunter", @driver="Scott Hunter", @vehicle="Toyota  Corolla", @cap="2000", @cl_laps="10", @race_time="10:11.3722", @fastest="5", @fastest_lap="0:59.8921">, #<RaceResult:0x007fcc4b090e88 @position="8", @car="91", @team="Graeme Wilkinson", @driver="Graeme Wilkinson", @vehicle="Ford  Escort", @cap="2000", @cl_laps="10", @race_time="10:13.4114", @fastest="5", @fastest_lap="1:00.2175">, #<RaceResult:0x007fcc4b090a78 @position="9", @car="7", @team="Justin Wade", @driver="Justin Wade", @vehicle="BMW M3", @cap="4000", @cl_laps="10", @race_time="10:18.2020", 
@fastest="9", @fastest_lap="1:00.8969">, #<RaceResult:0x007fcc4b090668 @position="10", @car="55", @team="Greg Craig", @driver="Grag Craig", @vehicle="Toyota  Corolla", @cap="1840", @cl_laps="10", @race_time="10:18.9956", @fastest="7", @fastest_lap="1:00.7905">, #<RaceResult:0x007fcc4b090258 @position="11", @car="46", @team="Kyle Orgam-Moore", @driver="Kyle Organ-Moore", @vehicle="Holden VS Commodore", @cap="6000", @cl_laps="10", @race_time="10:30.0179", @fastest="3", @fastest_lap="1:01.6741">, #<RaceResult:0x007fcc4b08fe48 @position="12", @car="39", @team="Uptiles Strathpine", @driver="Trent Spencer", @vehicle="BMW Mini Cooper S", @cap="1500", @cl_laps="10", @race_time="10:40.1436", @fastest="2", @fastest_lap="1:02.2728">, #<RaceResult:0x007fcc4b08fa38 @position="13", @car="177", @team="Mark Hyde", @driver="Mark Hyde", @vehicle="Ford  Escort", @cap="1993", @cl_laps="10", @race_time="10:49.5920", @fastest="2", @fastest_lap="1:03.8069">, #<RaceResult:0x007fcc4b08f628 @position="14", @car="34", @team="Peter Draheim", @driver="Peter Draheim", @vehicle="Mazda RX3", @cap="2600", @cl_laps="10", @race_time="10:50.8159", @fastest="10", @fastest_lap="1:03.4396">, #<RaceResult:0x007fcc4b08f218 @position="15", @car="5", @team="Scott Douglas", @driver="Scott Douglas", @vehicle="Datsun  1200", @cap="1998", @cl_laps="9", @race_time="9:48.7808", @fastest="3", @fastest_lap="1:01.5371">, #<RaceResult:0x007fcc4b08ee08 @position="16", @car="72", @team="Paul Redman", @driver="Paul Redman", @vehicle="Ford  Focus", @cap="2lt", @cl_laps="9", @race_time="10:11.3707", @fastest="2", @fastest_lap="1:05.8729">, #<RaceResult:0x007fcc4b08e9f8 @position="17", @car="8", @team="Matthew Speakman", @driver="Matthew Speakman", @vehicle="Toyota  Celica", @cap="1600", @cl_laps="9", @race_time="10:16.3159", @fastest="3", @fastest_lap="1:05.9117">, #<RaceResult:0x007fcc4b08e5e8 @position="18", @car="74", @team="Lucas Easton", @driver="Lucas Easton", @vehicle="Toyota  Celica", @cap="1600", @cl_laps="9", 
@race_time="10:16.8050", @fastest="6", @fastest_lap="1:06.0748">, #<RaceResult:0x007fcc4b08e1d8 @position="19", @car="77", @team="Dean Fuller", @driver="Dean Fuller", @vehicle="Mitsubishi  Sigma", @cap="2600", @cl_laps="9", @race_time="10:25.2877", @fastest="3", @fastest_lap="1:07.3991">, #<RaceResult:0x007fcc4b08ddc8 @position="20", @car="16", @team="Brett Batterby", @driver="Brett Batterby", @vehicle="Toyota  Corolla", @cap="1600", @cl_laps="9", @race_time="10:29.9127", @fastest="4", @fastest_lap="1:07.8420">, #<RaceResult:0x007fcc4a848348 @position="21", @car="95", @team="Ross Hurford", @driver="Ross Hurford", @vehicle="Toyota  Corolla", @cap="1600", @cl_laps="8", @race_time="9:57.5297", @fastest="2", @fastest_lap="1:12.2672">, #<RaceResult:0x007fcc4a847948 @position="DNF", @car="13", @team="Charles Wright", @driver="Charles Wright", @vehicle="BMW 325i", @cap="2700", @cl_laps="9", @race_time="9:47.9888", @fastest="7", @fastest_lap="1:03.2808">, #<RaceResult:0x007fcc4a847010 @position="DNF", @car="20", @team="Shane Satchwell", @driver="Shane Satchwell", @vehicle="Datsun  1200 Coupe", @cap="1998", @cl_laps="1", @race_time="1:05.9100", @fastest="1", @fastest_lap="1:05.9100">]
Race Results: [["1", "6", "Jason Clements", "Jason Clements", "BMW M3", "3200", "10", "9:48.5710", "3", "0:57.3228*"], ["2", "42", "David Skillender", "David Skillender", "Holden VS Commodore", "6000", "10", "9:55.6866", "2", "0:57.9409"], ["3", "37", "Bruce Cook", "Bruce Cook", "Ford  Escort", "3759", "10", "9:56.4388", "4", "0:58.3359"], ["4", "18", "Troy Marinelli", "Troy Marinelli", "Nissan  Silvia", "3396", "10", "9:56.7758", "2", "0:58.4443"], ["5", "75", "Anthony Gilbertson", "Anthony Gilbertson", "BMW M3", "3200", "10", "10:02.5842", "3", "0:58.9336"], ["6", "26", "Trent Purcell", "Trent Purcell", "Mazda RX7", "2354", "10", "10:07.6285", "4", "0:59.0546"], ["7", "12", "Scott Hunter", "Scott Hunter", "Toyota  Corolla", "2000", "10", "10:11.3722", "5", "0:59.8921"], ["8", "91", "Graeme Wilkinson", "Graeme Wilkinson", "Ford  Escort", "2000", "10", "10:13.4114", "5", "1:00.2175"], ["9", "7", "Justin Wade", "Justin Wade", "BMW M3", "4000", "10", "10:18.2020", "9", "1:00.8969"], ["10", "55", "Greg Craig", "Grag Craig", "Toyota  Corolla", "1840", "10", "10:18.9956", "7", "1:00.7905"], ["11", "46", "Kyle Orgam-Moore", "Kyle Organ-Moore", "Holden VS Commodore", "6000", "10", "10:30.0179", "3", "1:01.6741"], ["12", "39", "Uptiles Strathpine", "Trent Spencer", "BMW Mini Cooper S", "1500", "10", "10:40.1436", "2", "1:02.2728"], ["13", "177", "Mark Hyde", "Mark Hyde", "Ford  Escort", "1993", "10", "10:49.5920", "2", "1:03.8069"], ["14", "34", "Peter Draheim", "Peter Draheim", "Mazda RX3", "2600", "10", "10:50.8159", "10", "1:03.4396"], ["15", "5", "Scott Douglas", "Scott Douglas", "Datsun  1200", "1998", "9", "9:48.7808", "3", "1:01.5371"], ["16", "72", "Paul Redman", "Paul Redman", "Ford  Focus", "2lt", "9", "10:11.3707", "2", "1:05.8729"], ["17", "8", "Matthew Speakman", "Matthew Speakman", "Toyota  Celica", "1600", "9", "10:16.3159", "3", "1:05.9117"], ["18", "74", "Lucas Easton", "Lucas Easton", "Toyota  Celica", "1600", "9", "10:16.8050", "6", "1:06.0748"], ["19", 
"77", "Dean Fuller", "Dean Fuller", "Mitsubishi  Sigma", "2600", "9", "10:25.2877", "3", "1:07.3991"], ["20", "16", "Brett Batterby", "Brett Batterby", "Toyota  Corolla", "1600", "9", "10:29.9127", "4", "1:07.8420"], ["21", "95", "Ross Hurford", "Ross Hurford", "Toyota  Corolla", "1600", "8", "9:57.5297", "2", "1:12.2672"], ["DNF", "13", "Charles Wright", "Charles Wright", "BMW 325i", "2700", "9", "9:47.9888", "7", "1:03.2808"], ["DNF", "20", "Shane Satchwell", "Shane Satchwell", "Datsun  1200 Coupe", "1998", "1", "1:05.9100", "1", "1:05.9100"]]

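Answer 3 (score: 4)

Depending on how consistent the format is, you may be able to use a regular expression.

Here is a sample regex that works for the current data. It may need adjusting depending on the precise rules, but it gives the idea: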

^

# Pos
(\d+|DNF)
\s+

#Car
(\d+)
\s+

# Team
([\w-]+(?: [\w-]+)+)
\s+

# Driver
([\w-]+(?: [\w-]+)+)
\s+

# Vehicle
([\w-]+(?:  ?[\w-]+)+)
\s+

# Cap
(\d{4}|\dlt)
\s+

# CL Laps
(\d+)
\s+

# Race.Time
(\d+:\d+\.\d+)
\s+

# Fastest Lap
(\d+)
\s+

# Fastest Lap Time
(\d+:\d+\.\d+\*?)
\s*

$
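As a rough sketch, the pattern above could be used from Ruby with the `x` (extended) flag. Two things here are my own assumptions rather than part of the answer: the capture group names, and writing the literal spaces as `[ ]`, since extended mode ignores bare whitespace in the pattern.

```ruby
ROW = /
  ^
  (?<pos>\d+|DNF)                       \s+
  (?<car>\d+)                           \s+
  (?<team>[\w-]+(?:[ ][\w-]+)+)         \s+
  (?<driver>[\w-]+(?:[ ][\w-]+)+)       \s+
  (?<vehicle>[\w-]+(?:[ ][ ]?[\w-]+)+)  \s+
  (?<cap>\d{4}|\dlt)                    \s+
  (?<laps>\d+)                          \s+
  (?<race_time>\d+:\d+\.\d+)            \s+
  (?<fastest_lap>\d+)                   \s+
  (?<fastest_time>\d+:\d+\.\d+\*?)      \s*
  $
/x

SAMPLE = '1     6  Jason Clements                 Jason Clements           BMW M3               3200       10     9:48.5710   3 0:57.3228*'

if (m = ROW.match(SAMPLE))
  p m[:driver]       # => "Jason Clements"
  p m[:vehicle]      # => "BMW M3"
  p m[:fastest_time] # => "0:57.3228*"
end
```

Lines that do not match (the header, the rules, the footer) simply return `nil` from `match`, so they can be skipped without a separate filter.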

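Answer 4 (score: 4)

If you can verify that the whitespace consists of space characters rather than tabs, and that overlong text is always truncated to fit the column structure, then I would hard-code the slice boundaries: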

parsed = [rawLine[0:3],rawLine[4:7],rawLine[9:38], ...etc... ]

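Depending on the data source, this may be brittle (for example, if the column widths differ from run to run).

If the header row is always the same, you could extract the slice boundaries by searching for the known words in the header row.

Answer 5 (score: 4)

You can use the fixed_width gem.

Your file can be parsed with the following code: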

require 'fixed_width'
require 'pp'

FixedWidth.define :cars do |d|
  d.head do |head|
    head.trap { |line| line !~ /\d/ }
  end
  d.body do |body|
    body.trap { |line| line =~ /^(\d|DNF)/ }
    body.column :pos, 4
    body.column :car, 5
    body.column :competitor, 31
    body.column :driver, 25
    body.column :vehicle, 21
    body.column :cap, 5
    body.column :cl_laps, 11
    body.column :race_time, 11
    body.column :fast_lap_no, 4
    body.column :fast_lap_time, 10
  end
end

pp FixedWidth.parse(File.open("races.txt"), :cars)

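The trap method identifies the lines that belong to each section. I used regular expressions:

  • The head regex looks for lines that do not contain a digit.
  • The body regex looks for lines that begin with a digit or "DNF".

Each section has to include the line immediately following the last. The column definitions simply give the number of character columns to grab. The library strips the whitespace for you. If you wanted to generate a fixed-width file you could add alignment parameters, but it doesn't look like you need that.

The result is a hash that starts like this: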

{:head=>[{}, {}, {}],
 :body=>
  [{:pos=>"1",
    :car=>"6",
    :competitor=>"Jason Clements",
    :driver=>"Jason Clements",
    :vehicle=>"BMW M3",
    :cap=>"3200",
    :cl_laps=>"10",
    :race_time=>"9:48.5710",
    :fast_lap_no=>"3",
    :fast_lap_time=>"0:57.3228"},
   {:pos=>"2",
    :car=>"42",
    :competitor=>"David Skillender",
    :driver=>"David Skillender",
    :vehicle=>"Holden VS Commodore",
    :cap=>"6000",
    :cl_laps=>"10",
    :race_time=>"9:55.6866",
    :fast_lap_no=>"2",
    :fast_lap_time=>"0:57.9409"},

Answer 6 (score: 4)

OK, I got it:

Edit: I forgot to mention that this assumes you have stored the input text in the variable input_string.

# Choose a delimiter that is unlikely to occur
DELIM = '|||'

# DRY -> extend String
class String
  def split_on_spaces(min_spaces = 1)
    self.strip.gsub(/\s{#{min_spaces},}/, DELIM).split(DELIM)
  end
end

# just get the data lines
lines = input_string.split("\n")
lines = lines[2...(lines.length - 4)].delete_if { |line|
  line.empty?
}

# Grab all the entries into a nice 2-d array
entries = lines.map { |line|
  [
    line[0..8].split_on_spaces,
    line[9..85].split_on_spaces(3).map{ |string| 
      string.gsub(/\s+/, ' ')  # replace whitespace with 1 space
    },
    line[85...line.length].split_on_spaces(2)

  ].flatten
}

# BONUS

# Make nice hashes
keys = [:pos, :car, :team, :driver, :vehicle, :cap, :cl_laps, :race_time, :fastest_lap]
objects = entries.map { |entry|
  Hash[keys.zip entry]
}

Output:

entries # =>
["1", "6", "Jason Clements", "Jason Clements", "BMW M3", "3200", "10", "9:48.5710", "3 0:57.3228*"]
["2", "42", "David Skillender", "David Skillender", "Holden VS Commodore", "6000", "10", "9:55.6866", "2 0:57.9409"]
...
# all of length 9, no extra spaces

If arrays just don't cut it:

objects # =>
{:pos=>"1", :car=>"6", :team=>"Jason Clements", :driver=>"Jason Clements", :vehicle=>"BMW M3", :cap=>"3200", :cl_laps=>"10", :race_time=>"9:48.5710", :fastest_lap=>"3 0:57.3228*"}
{:pos=>"2", :car=>"42", :team=>"David Skillender", :driver=>"David Skillender", :vehicle=>"Holden VS Commodore", :cap=>"6000", :cl_laps=>"10", :race_time=>"9:55.6866", :fastest_lap=>"2 0:57.9409"}
...

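I'll leave refactoring it into suitable functions to you.

Answer 7 (score: 3)

Unless there is a clear rule for how the columns are separated, you can't really do this.

The approach you are taking is fine, assuming you know that each column value is properly aligned under its column heading.

Another approach could be to group together words that are separated by only a single space (from the text you provided, I can see that this rule holds too).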
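A minimal sketch of the single-space grouping idea, assuming that a run of two or more spaces always marks a field boundary; `split_on_gaps` is a hypothetical helper and the two rows are copied from the question.

```ruby
# Split a row wherever two or more spaces occur, so that words separated
# by a single space stay together in one field.
def split_on_gaps(line)
  line.strip.split(/ {2,}/)
end

ROW_BMW  = '1     6  Jason Clements                 Jason Clements           BMW M3               3200       10     9:48.5710   3 0:57.3228*'
ROW_FORD = '3    37  Bruce Cook                     Bruce Cook               Ford  Escort         3759       10     9:56.4388   4 0:58.3359 '

p split_on_gaps(ROW_BMW)
# => ["1", "6", "Jason Clements", "Jason Clements", "BMW M3", "3200", "10", "9:48.5710", "3 0:57.3228*"]

p split_on_gaps(ROW_FORD).length # => 10
```

The second row comes out with ten fields instead of nine because "Ford  Escort" is written with two spaces in the sample, so some vehicle names would need to be normalized or rejoined for this approach to work on every row.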

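Answer 8 (score: 2)

Assuming the spacing of the text is always the same, you could split the string by position and then strip the extra whitespace around each part. For example, in Python: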

pos=row[0:3].strip()
car=row[4:7].strip()

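and so on. Alternatively, you could define a regex to capture each part:

([[:alnum:]]+)\s+([[:digit:]]+)\s+(([[:alpha:]]+ )+)\s+(([[:alpha:]]+ )+)\s+(([[:alnum:]]* )+)\s+

and so on. (The exact syntax depends on your regex flavor.) Note that the vehicle regex needs to handle the extra spaces.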

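Answer 9 (score: 1)

I'm not going to code this up, but one approach that definitely works for the dataset above is to split each line on whitespace and then assign the elements like this: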

someArray = array of strings that were split by white space

Pos = someArray[0]
Car = someArray[1]
Competitor/Team = someArray[2] + " " + someArray[3]
Driver = someArray[4] + " " + someArray[5]
Vehicle = someArray[6] + " " + ... + " " + someArray[someArray.length - 6]
Cap = someArray[someArray.length - 5]
CL Laps = someArray[someArray.length - 4]
Race.Time = someArray[someArray.length - 3]
Fastest...Lap = someArray[someArray.length - 2] + " " + someArray[someArray.length - 1]

The vehicle part can be done with some kind of for or while loop.
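The assignment scheme above might look like this in Ruby. It is a sketch under the same assumptions as the pseudocode: team and driver are exactly two words each, and the last five tokens are always cap, laps, race time and the two fastest-lap values. `parse_row` and the hash keys are hypothetical names, and a negative-index range with `join` stands in for the loop.

```ruby
# Split on whitespace, then index from the front for the fixed leading
# fields and from the back for the fixed trailing fields; whatever is
# left in the middle is the vehicle.
def parse_row(line)
  t = line.split
  {
    pos:         t[0],
    car:         t[1],
    team:        t[2..3].join(' '),
    driver:      t[4..5].join(' '),
    vehicle:     t[6..-6].join(' '),
    cap:         t[-5],
    laps:        t[-4],
    race_time:   t[-3],
    fastest_lap: t[-2..-1].join(' ')
  }
end

row = parse_row('2    42  David Skillender               David Skillender         Holden VS Commodore  6000       10     9:55.6866   2 0:57.9409 ')
p row[:vehicle]     # => "Holden VS Commodore"
p row[:fastest_lap] # => "2 0:57.9409"
```

Note that rejoining with a single space also collapses double-spaced vehicle names such as "Ford  Escort" to "Ford Escort".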