I'm using split to split this string:
@output = "5 490 'Msci Italy' 'Msci Germany' 'Msci France' 'Msci Spain' 'Msci Emu' '05/01/2007' '12/01/2007' '19/01/2007' '26/01/2007' '02/02/2007' 0.2000 0.1996 0.1994 0.2001 0.1983"
I want to get an array in this form:
@array_output = ["5", "490", "'Msci Italy'", "'Msci Germany'", "'Msci France'", "'Msci Spain'", "'Msci Emu'", "'05/01/2007'", "'12/01/2007'", "'19/01/2007'", "'26/01/2007'", "'02/02/2007'", "0.2000", "0.1996", "0.1994", "0.2001", "0.1983"]
I tried using:
@array_output = @output.split(/\s(?!\w)|\s(?=\d)/)
This works on Rubular, but when I try to print <%= @array_output[0] %> or any other index to an html.erb page in Rails, I get nothing.
The @output string can vary in length; this is just a small sample that shows every possible format. The order of the formats is always the same.
I initialized @array_output with @array_output = Array.new, but that doesn't change the result.
I also tried scan instead of split, but that didn't change anything either.
What's wrong?
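For reference, the pattern from the question can be checked in plain Ruby, outside Rails. A minimal sketch, where a local variable stands in for @output:

```ruby
# Plain-Ruby check of the split pattern from the question (no Rails involved).
output = "5 490 'Msci Italy' 'Msci Germany' 'Msci France' 'Msci Spain' 'Msci Emu' '05/01/2007' '12/01/2007' '19/01/2007' '26/01/2007' '02/02/2007' 0.2000 0.1996 0.1994 0.2001 0.1983"

# Split on a whitespace character that is NOT followed by a word character
# (e.g. the space before an opening quote), or on one that IS followed by a digit.
array_output = output.split(/\s(?!\w)|\s(?=\d)/)

puts array_output.length  # => 17
puts array_output[0]      # => 5
puts array_output[2]      # => 'Msci Italy'
```

If this prints the expected fields, that would suggest looking at how @array_output reaches the view rather than at the regex itself.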
Answer 0 (score: 0)
I just tried CSV, and it works, except that the quotes are lost. If that's OK with you:
require 'csv'
@output = "5 490 'Msci Italy' 'Msci Germany' 'Msci France' 'Msci Spain' 'Msci Emu' '05/01/2007' '12/01/2007' '19/01/2007' '26/01/2007' '02/02/2007' 0.2000 0.1996 0.1994 0.2001 0.1983"
@array_output = CSV.parse_line(@output, col_sep: " ", quote_char: "'")
#=> ["5", "490", "Msci Italy", "Msci Germany", "Msci France", "Msci Spain", "Msci Emu", "05/01/2007", "12/01/2007", "19/01/2007", "26/01/2007", "02/02/2007", "0.2000", "0.1996", "0.1994", "0.2001", "0.1983"]
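If the quotes should be kept, a different technique, String#scan (not what this answer uses), can match the tokens themselves instead of splitting between them. A minimal sketch, assuming quoted fields never contain an embedded single quote:

```ruby
output = "5 490 'Msci Italy' 'Msci Germany' 'Msci France' 'Msci Spain' 'Msci Emu' '05/01/2007' '12/01/2007' '19/01/2007' '26/01/2007' '02/02/2007' 0.2000 0.1996 0.1994 0.2001 0.1983"

# Each token is either a single-quoted run (returned with its quotes)
# or, failing that, a run of non-whitespace characters.
tokens = output.scan(/'[^']*'|\S+/)

p tokens.length  # => 17
p tokens[2]      # => "'Msci Italy'"
```

Unlike the CSV version, the fields come back exactly as they appear in the string. An unterminated quote would fall through to the \S+ branch and split that name into pieces.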
Answer 1 (score: 0)
You can use a negative lookbehind and a negative lookahead:
/(?<![a-zA-Z])\s+(?![a-zA-Z])/
output = "5 490 'Msci Italy' 'Msci Germany' 'Msci France' 'Msci Spain' 'Msci Emu' '05/01/2007' '12/01/2007' '19/01/2007' '26/01/2007' '02/02/2007' 0.2000 0.1996 0.1994 0.2001 0.1983"
output.split(/(?<![a-zA-Z])\s+(?![a-zA-Z])/).each { |e| puts e }
Output:
5
490
'Msci Italy'
'Msci Germany'
'Msci France'
'Msci Spain'
'Msci Emu'
'05/01/2007'
'12/01/2007'
'19/01/2007'
'26/01/2007'
'02/02/2007'
0.2000
0.1996
0.1994
0.2001
0.1983
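A self-contained version of the same lookaround split that collects the fields into an array instead of printing them. It is a sketch that relies on every space inside a quoted name having an ASCII letter on at least one side; a name such as '5 2' would be split.

```ruby
output = "5 490 'Msci Italy' 'Msci Germany' 'Msci France' 'Msci Spain' 'Msci Emu' '05/01/2007' '12/01/2007' '19/01/2007' '26/01/2007' '02/02/2007' 0.2000 0.1996 0.1994 0.2001 0.1983"

# Split on whitespace with no ASCII letter immediately before it (negative
# lookbehind) and no ASCII letter immediately after it (negative lookahead).
# Note [a-zA-Z], not [a-zA-z]: the range A-z would also include [ \ ] ^ _ and `.
fields = output.split(/(?<![a-zA-Z])\s+(?![a-zA-Z])/)

p fields.length  # => 17
p fields[3]      # => "'Msci Germany'"
```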
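Let's break this regex down:
(?<![a-zA-Z])
This is the negative lookbehind: the whitespace must not be preceded by an ASCII letter.
\s+
One or more whitespace characters.
(?![a-zA-Z])
This is the negative lookahead: the whitespace must not be followed by an ASCII letter.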
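To see the two lookarounds at work, a small sketch can test the separator against a few fragments of the sample string (SEPARATOR is just an illustrative constant name):

```ruby
SEPARATOR = /(?<![a-zA-Z])\s+(?![a-zA-Z])/

# Inside a quoted name: a letter precedes the space, so the lookbehind fails.
p "Msci Italy".match?(SEPARATOR)     # => false
# Between a closing and an opening quote: no letter on either side.
p "Italy' 'Msci".match?(SEPARATOR)   # => true
# Between two numbers: no letter on either side.
p "0.2000 0.1996".match?(SEPARATOR)  # => true
```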
Answer 2 (score: -1)
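First, split isn't the right tool; coming up with a pattern for split that reliably produces the correct output would be a nightmare. Instead, this is how I'd break it down: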
regex = /
(\d+)
\s+
(\d+)
\s+
('[^']+')
\s+
('[^']+')
\s+
('[^']+')
\s+
('[^']+')
\s+
('[^']+')
\s+
('[^']+')
\s+
('[^']+')
\s+
('[^']+')
\s+
('[^']+')
\s+
('[^']+')
\s+
([\d.]+)
\s+
([\d.]+)
\s+
([\d.]+)
\s+
([\d.]+)
\s+
([\d.]+)
/x
mat = regex.match("5 490 'Msci Italy' 'Msci Germany' 'Msci France' 'Msci Spain' 'Msci Emu' '05/01/2007' '12/01/2007' '19/01/2007' '26/01/2007' '02/02/2007' 0.2000 0.1996 0.1994 0.2001 0.1983")
The result is:
require 'ap'
ap mat.captures
# >> [
# >> [ 0] "5",
# >> [ 1] "490",
# >> [ 2] "'Msci Italy'",
# >> [ 3] "'Msci Germany'",
# >> [ 4] "'Msci France'",
# >> [ 5] "'Msci Spain'",
# >> [ 6] "'Msci Emu'",
# >> [ 7] "'05/01/2007'",
# >> [ 8] "'12/01/2007'",
# >> [ 9] "'19/01/2007'",
# >> [10] "'26/01/2007'",
# >> [11] "'02/02/2007'",
# >> [12] "0.2000",
# >> [13] "0.1996",
# >> [14] "0.1994",
# >> [15] "0.2001",
# >> [16] "0.1983"
# >> ]
Rearranged a bit for readability:
regex = /
(\d+) \s+
(\d+) \s+
('[^']+') \s+
('[^']+') \s+
('[^']+') \s+
('[^']+') \s+
('[^']+') \s+
('[^']+') \s+
('[^']+') \s+
('[^']+') \s+
('[^']+') \s+
('[^']+') \s+
([\d.]+) \s+
([\d.]+) \s+
([\d.]+) \s+
([\d.]+) \s+
([\d.]+)
/x
mat = regex.match("5 490 'Msci Italy' 'Msci Germany' 'Msci France' 'Msci Spain' 'Msci Emu' '05/01/2007' '12/01/2007' '19/01/2007' '26/01/2007' '02/02/2007' 0.2000 0.1996 0.1994 0.2001 0.1983")
require 'ap'
ap mat.captures
We still get:
# >> [
# >> [ 0] "5",
# >> [ 1] "490",
# >> [ 2] "'Msci Italy'",
# >> [ 3] "'Msci Germany'",
# >> [ 4] "'Msci France'",
# >> [ 5] "'Msci Spain'",
# >> [ 6] "'Msci Emu'",
# >> [ 7] "'05/01/2007'",
# >> [ 8] "'12/01/2007'",
# >> [ 9] "'19/01/2007'",
# >> [10] "'26/01/2007'",
# >> [11] "'02/02/2007'",
# >> [12] "0.2000",
# >> [13] "0.1996",
# >> [14] "0.1994",
# >> [15] "0.2001",
# >> [16] "0.1983"
# >> ]
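Even that is a bit messy. Where complex patterns or code need to be easy to maintain and understand, you'll find people break pattern generation into small steps and let the language assemble the complex pattern, like this: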
SINGLE_QUOTED_PATTERN = "('[^']+')"
INTEGER_PATTERN = '(\d+)'
FLOAT_PATTERN = '([\d.]+)'
WHITE_SPACE_PATTERN = '\s+'
REGEX_STRING = [
[INTEGER_PATTERN] * 2,
[SINGLE_QUOTED_PATTERN] * 10,
[FLOAT_PATTERN] * 5
].flatten.join(WHITE_SPACE_PATTERN)
REGEX = /#{REGEX_STRING}/
# => /(\d+)\s+(\d+)\s+('[^']+')\s+('[^']+')\s+('[^']+')\s+('[^']+')\s+('[^']+')\s+('[^']+')\s+('[^']+')\s+('[^']+')\s+('[^']+')\s+('[^']+')\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)/
data = "5 490 'Msci Italy' 'Msci Germany' 'Msci France' 'Msci Spain' 'Msci Emu' '05/01/2007' '12/01/2007' '19/01/2007' '26/01/2007' '02/02/2007' 0.2000 0.1996 0.1994 0.2001 0.1983"
mat = REGEX.match(data)
Again, this results in:
require 'ap'
ap mat.captures
# >> [
# >> [ 0] "5",
# >> [ 1] "490",
# >> [ 2] "'Msci Italy'",
# >> [ 3] "'Msci Germany'",
# >> [ 4] "'Msci France'",
# >> [ 5] "'Msci Spain'",
# >> [ 6] "'Msci Emu'",
# >> [ 7] "'05/01/2007'",
# >> [ 8] "'12/01/2007'",
# >> [ 9] "'19/01/2007'",
# >> [10] "'26/01/2007'",
# >> [11] "'02/02/2007'",
# >> [12] "0.2000",
# >> [13] "0.1996",
# >> [14] "0.1994",
# >> [15] "0.2001",
# >> [16] "0.1983"
# >> ]
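Since the question says the string's length can vary, one option is to turn the hard-coded counts into parameters. A sketch, assuming the caller knows how many quoted and numeric fields a given line has (the counts are not derived from the data here); build_regex and parse_fields are hypothetical names, and the \A and \z anchors are an addition not in the answer above:

```ruby
# Same building blocks as above.
SINGLE_QUOTED_PATTERN = "('[^']+')"
INTEGER_PATTERN = '(\d+)'
FLOAT_PATTERN = '([\d.]+)'
WHITE_SPACE_PATTERN = '\s+'

# Hypothetical helper: builds an anchored pattern for a line with two integers,
# then `quoted_count` quoted fields, then `float_count` numeric fields.
def build_regex(quoted_count, float_count)
  parts = [INTEGER_PATTERN] * 2 +
          [SINGLE_QUOTED_PATTERN] * quoted_count +
          [FLOAT_PATTERN] * float_count
  /\A#{parts.join(WHITE_SPACE_PATTERN)}\z/
end

# Hypothetical helper: returns the captured fields, or nil when the line does
# not fit the layout. Regexp#match returns nil on failure, so calling
# captures on its result directly would raise NoMethodError.
def parse_fields(line, quoted_count, float_count)
  mat = build_regex(quoted_count, float_count).match(line)
  mat && mat.captures
end

data = "5 490 'Msci Italy' 'Msci Germany' 'Msci France' 'Msci Spain' 'Msci Emu' '05/01/2007' '12/01/2007' '19/01/2007' '26/01/2007' '02/02/2007' 0.2000 0.1996 0.1994 0.2001 0.1983"

p parse_fields(data, 10, 5)&.length  # => 17
p parse_fields(data, 8, 5)           # => nil
```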