How to split a string on spaces, except the spaces between words, using a regular expression

Time: 2016-10-05 23:00:50

Tags: ruby-on-rails ruby regex string split

I am using split to break up this string:

@output = "5 490 'Msci Italy' 'Msci Germany' 'Msci France' 'Msci Spain' 'Msci Emu' '05/01/2007' '12/01/2007' '19/01/2007' '26/01/2007' '02/02/2007' 0.2000 0.1996 0.1994 0.2001 0.1983"

I would like to get an array in this form:

@array_output = ["5", "490", "'Msci Italy'", "'Msci Germany'", "'Msci France'", "'Msci Spain'", "'Msci Emu'", "'05/01/2007'", "'12/01/2007'", "'19/01/2007'", "'26/01/2007'", "'02/02/2007'", "0.2000", "0.1996", "0.1994", "0.2001", "0.1983"]

I tried using:

@array_output = @output.split(/\s(?!\w)|\s(?=\d)/)

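This works on Rubular, but when I try to print <%= @array_output[0] %> or any other index to an html.erb page in Rails, I get nothing.

The @output string may vary in length; this is just a small sample that shows all the possible formats. The order of the formats is always the same.

I initialize @array_output with @array_output = Array.new, but that does not affect the result.

I also tried scan instead of split, but that changed nothing either.

What is wrong?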

3 Answers:

Answer 0: (score: 0)

I just tried CSV, and it works fine except that the quotes are lost. If that is acceptable to you, then:

require 'csv'

@output = "5 490 'Msci Italy' 'Msci Germany' 'Msci France' 'Msci Spain' 'Msci Emu' '05/01/2007' '12/01/2007' '19/01/2007' '26/01/2007' '02/02/2007' 0.2000 0.1996 0.1994 0.2001 0.1983"

@array_output = CSV.parse_line(@output, col_sep: " ", quote_char: "'")
#=> ["5", "490", "Msci Italy", "Msci Germany", "Msci France", "Msci Spain", "Msci Emu", "05/01/2007", "12/01/2007", "19/01/2007", "26/01/2007", "02/02/2007", "0.2000", "0.1996", "0.1994", "0.2001", "0.1983"]
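If the quotes do need to survive, one alternative to CSV is String#scan with an alternation that tries a complete quoted token first and falls back to a run of non-whitespace characters. This is only a minimal sketch and not part of the CSV approach above; the names line and tokens are illustrative, the sample string is shortened, and it assumes a quoted value never contains a single quote of its own.

```ruby
line = "5 490 'Msci Italy' 'Msci Germany' '05/01/2007' 0.2000 0.1996"

# Try a complete single-quoted token first; otherwise take a run of
# non-whitespace characters. The quotes stay part of the token.
tokens = line.scan(/'[^']*'|\S+/)

puts tokens.inspect
```

Because scan describes the tokens rather than the separators, the pattern does not have to reason about which spaces sit inside a quoted value.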

Answer 1: (score: 0)

You can use a negative lookbehind and a negative lookahead:

/(?<![a-zA-Z])\s+(?![a-zA-Z])/

output = "5 490 'Msci Italy' 'Msci Germany' 'Msci France' 'Msci Spain' 'Msci Emu' '05/01/2007' '12/01/2007' '19/01/2007' '26/01/2007' '02/02/2007' 0.2000 0.1996 0.1994 0.2001 0.1983"

output.split(/(?<![a-zA-Z])\s+(?![a-zA-Z])/).each { |e| puts e }

Output:

5
490
'Msci Italy'
'Msci Germany'
'Msci France'
'Msci Spain'
'Msci Emu'
'05/01/2007'
'12/01/2007'
'19/01/2007'
'26/01/2007'
'02/02/2007'
0.2000
0.1996
0.1994
0.2001
0.1983

Let's break this regex down:

(?<![a-zA-Z]) is a negative lookbehind: the whitespace must not be preceded by a letter

\s+ matches one or more whitespace characters

(?![a-zA-Z]) is a negative lookahead: the whitespace must not be followed by a letter
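To make the effect of the two lookarounds concrete, here is a small sketch on shortened inputs. The constant SEPARATOR and the sample strings are illustrative and not from the original data, and the character class is written as [a-zA-Z] (a range that ends in lowercase z would also cover a few punctuation characters). The second input is a hypothetical quoted value containing two adjacent numbers, which shows where this approach may break down.

```ruby
# A run of whitespace counts as a separator only when the character before it
# and the character after it are both non-letters.
SEPARATOR = /(?<![a-zA-Z])\s+(?![a-zA-Z])/

plain = "5 490 'Msci Italy' 0.2000".split(SEPARATOR)
puts plain.inspect

# Hypothetical value with two numbers inside the quotes: the space between
# "51" and "2007" has no letter on either side, so it is treated as a separator.
tricky = "5 'Area 51 2007' 0.2000".split(SEPARATOR)
puts tricky.inspect
```

In other words, the pattern relies on every space inside a quoted value touching a letter on at least one side, which holds for the sample data in the question.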

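Answer 2: (score: -1)

First, split is not the right tool; defining a pattern for split that produces the correct output would be a nightmare. Instead, this is how I would break it down: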

regex = /
(\d+)
\s+
(\d+)
\s+
('[^']+')
\s+
('[^']+')
\s+
('[^']+')
\s+
('[^']+')
\s+
('[^']+')
\s+
('[^']+')
\s+
('[^']+')
\s+
('[^']+')
\s+
('[^']+')
\s+
('[^']+')
\s+
([\d.]+)
\s+
([\d.]+)
\s+
([\d.]+)
\s+
([\d.]+)
\s+
([\d.]+)
/x
mat = regex.match("5 490 'Msci Italy' 'Msci Germany' 'Msci France' 'Msci Spain' 'Msci Emu' '05/01/2007' '12/01/2007' '19/01/2007' '26/01/2007' '02/02/2007' 0.2000 0.1996 0.1994 0.2001 0.1983")

Which results in:

require 'ap'

ap mat.captures 

# >> [
# >>   [ 0] "5",
# >>   [ 1] "490",
# >>   [ 2] "'Msci Italy'",
# >>   [ 3] "'Msci Germany'",
# >>   [ 4] "'Msci France'",
# >>   [ 5] "'Msci Spain'",
# >>   [ 6] "'Msci Emu'",
# >>   [ 7] "'05/01/2007'",
# >>   [ 8] "'12/01/2007'",
# >>   [ 9] "'19/01/2007'",
# >>   [10] "'26/01/2007'",
# >>   [11] "'02/02/2007'",
# >>   [12] "0.2000",
# >>   [13] "0.1996",
# >>   [14] "0.1994",
# >>   [15] "0.2001",
# >>   [16] "0.1983"
# >> ]

Rearranging it a little to make it easier to read:

regex = /
(\d+) \s+

(\d+) \s+

('[^']+') \s+
('[^']+') \s+
('[^']+') \s+
('[^']+') \s+
('[^']+') \s+

('[^']+') \s+
('[^']+') \s+
('[^']+') \s+
('[^']+') \s+
('[^']+') \s+

([\d.]+) \s+
([\d.]+) \s+
([\d.]+) \s+
([\d.]+) \s+
([\d.]+)
/x
mat = regex.match("5 490 'Msci Italy' 'Msci Germany' 'Msci France' 'Msci Spain' 'Msci Emu' '05/01/2007' '12/01/2007' '19/01/2007' '26/01/2007' '02/02/2007' 0.2000 0.1996 0.1994 0.2001 0.1983")

require 'ap'

ap mat.captures 

We still get:

# >> [
# >>   [ 0] "5",
# >>   [ 1] "490",
# >>   [ 2] "'Msci Italy'",
# >>   [ 3] "'Msci Germany'",
# >>   [ 4] "'Msci France'",
# >>   [ 5] "'Msci Spain'",
# >>   [ 6] "'Msci Emu'",
# >>   [ 7] "'05/01/2007'",
# >>   [ 8] "'12/01/2007'",
# >>   [ 9] "'19/01/2007'",
# >>   [10] "'26/01/2007'",
# >>   [11] "'02/02/2007'",
# >>   [12] "0.2000",
# >>   [13] "0.1996",
# >>   [14] "0.1994",
# >>   [15] "0.2001",
# >>   [16] "0.1983"
# >> ]
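The rearranged layout works because of the x (extended) flag, which makes the regex engine ignore unescaped whitespace in the pattern. The same flag also allows trailing # comments, which can document each field. The sketch below uses a deliberately shortened three-field pattern; the constant ROW and the sample input are illustrative only.

```ruby
# With the x flag, whitespace and "# ..." comments in the pattern are ignored.
ROW = /
  (\d+)      \s+   # leading integer
  ('[^']+')  \s+   # single-quoted label, quotes kept
  ([\d.]+)         # trailing decimal number
/x

captures = ROW.match("5 'Msci Italy' 0.2000").captures
puts captures.inspect
```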

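Even that is a bit messy, so you will find that people who need complex patterns or code to stay maintainable and understandable break the pattern generation down into small steps and let the language handle building the complex pattern, like this: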

SINGLE_QUOTED_PATTERN = "('[^']+')"
INTEGER_PATTERN = '(\d+)'
FLOAT_PATTERN = '([\d.]+)'
WHITE_SPACE_PATTERN = '\s+'

REGEX_STRING = [
  [INTEGER_PATTERN] * 2,
  [SINGLE_QUOTED_PATTERN] * 10,
  [FLOAT_PATTERN] * 5
].flatten.join(WHITE_SPACE_PATTERN) 

REGEX = /#{REGEX_STRING}/
# => /(\d+)\s+(\d+)\s+('[^']+')\s+('[^']+')\s+('[^']+')\s+('[^']+')\s+('[^']+')\s+('[^']+')\s+('[^']+')\s+('[^']+')\s+('[^']+')\s+('[^']+')\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)/

data = "5 490 'Msci Italy' 'Msci Germany' 'Msci France' 'Msci Spain' 'Msci Emu' '05/01/2007' '12/01/2007' '19/01/2007' '26/01/2007' '02/02/2007' 0.2000 0.1996 0.1994 0.2001 0.1983"

mat = REGEX.match(data)

Which again results in:

require 'ap'

ap mat.captures 

# >> [
# >>   [ 0] "5",
# >>   [ 1] "490",
# >>   [ 2] "'Msci Italy'",
# >>   [ 3] "'Msci Germany'",
# >>   [ 4] "'Msci France'",
# >>   [ 5] "'Msci Spain'",
# >>   [ 6] "'Msci Emu'",
# >>   [ 7] "'05/01/2007'",
# >>   [ 8] "'12/01/2007'",
# >>   [ 9] "'19/01/2007'",
# >>   [10] "'26/01/2007'",
# >>   [11] "'02/02/2007'",
# >>   [12] "0.2000",
# >>   [13] "0.1996",
# >>   [14] "0.1994",
# >>   [15] "0.2001",
# >>   [16] "0.1983"
# >> ]
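Since the question mentions that the string can vary in length, the same building-block idea can be wrapped in a method that takes the field counts as arguments. This is a sketch under assumptions: the helper name build_row_regex and its keyword arguments are hypothetical, it assumes the caller already knows how many fields of each kind to expect, and it adds \A and \z anchors so that a string with the wrong number of fields fails to match instead of matching partially.

```ruby
# Hypothetical helper: builds one anchored pattern from the three field counts.
def build_row_regex(integers:, quoted:, floats:)
  parts = ['(\d+)'] * integers +
          ["('[^']+')"] * quoted +
          ['([\d.]+)'] * floats
  /\A#{parts.join('\s+')}\z/
end

short_data = "5 490 'Msci Italy' 'Msci Germany' '05/01/2007' 0.2000 0.1996"
regex = build_row_regex(integers: 2, quoted: 3, floats: 2)

mat = regex.match(short_data)
# match returns nil when the counts do not fit, so guard before reading captures.
fields = mat ? mat.captures : []
puts fields.inspect
```

Guarding against nil matters here: calling captures directly on a failed match would raise NoMethodError.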