I have data formatted like this, as a single string:
"1. Enloe Medical Center - 2,000
2. CSU Chico - 1,805
3. Walmart Distribution Center - 1,350
4. Pacific Coast Producers (Agribusiness) - 1,200
5. Marysville School District - 1,000
6. Feather River Hospital - 865
7. Sunsweet Growers (Agriculture) - 600
8. YRC (Freight Services) - 500
9. Sierra Pacific Industries (Lumber Products) - 500
10. Colusa Casino Resort - 500"
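In a Ruby application, I want to build two arrays from it: one holding the substring between each list-number marker and the dash, and one holding the number between the dash and the newline (as integers), like this: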
labels = ["Enloe Medical Center","CSU Chico","Walmart Distribution Center","Pacific Coast Producers (Agribusiness)","Marysville School District","Feather River Hospital","Sunsweet Growers (Agriculture)","YRC (Freight Services)","Sierra Pacific Industries (Lumber Products)","Colusa Casino Resort"]
numbers = [2000, 1805, 1350, 1200, 1000, 865, 600, 500, 500, 500]
My regex skills are not great; I know how to do substitutions and matches, but I don't know where to start here. Can anyone help?
Answer 0 (score: 3)
labels, numbers = string.scan(/^\s*\d+\.\s+(.+)\s+-\s+([\d,]+)\s*$/).transpose
numbers.map!{|s| s.gsub(",", "").to_i}
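As a sketch of how the two lines above fit together, here is the same pattern run against three lines of the sample data. The method name parse_ranked_list, the variable sample, and the guard for empty input are made up for this illustration; only the regular expression comes from the answer.

```ruby
# Hypothetical helper wrapping the scan/transpose approach from the answer.
def parse_ranked_list(text)
  # The pattern has two capture groups, so scan returns one
  # [label, number] pair per matching line.
  pairs = text.scan(/^\s*\d+\.\s+(.+)\s+-\s+([\d,]+)\s*$/)
  # Assumption for this sketch: return two empty arrays when nothing matches,
  # because transposing an empty array yields no labels/numbers to unpack.
  return [[], []] if pairs.empty?
  # transpose turns [[l1, n1], [l2, n2], ...] into [[l1, l2, ...], [n1, n2, ...]].
  labels, numbers = pairs.transpose
  # Drop the thousands separators before converting to Integer.
  [labels, numbers.map { |s| s.delete(",").to_i }]
end

sample = "1. Enloe Medical Center - 2,000\n2. CSU Chico - 1,805\n6. Feather River Hospital - 865"
labels, numbers = parse_ranked_list(sample)
p labels   # => ["Enloe Medical Center", "CSU Chico", "Feather River Hospital"]
p numbers  # => [2000, 1805, 865]
```

The greedy (.+) first grabs the rest of the line and then backtracks to the last " - " on that line, which is why the label comes out without the number attached.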
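Answer 1 (score: 1)
One thing makes this easy:
/pat/m - treats a newline as a character matched by .
The other thing is grouping (see the second part below).
You write a one-line regexp that is applied to the whole string: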
r1 = /\d+\,\d+\s*$/m
str.scan r1
["2,000 ", "1,805 ", "1,350 ", "1,200 ", "1,000 "]
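$ - matches the end of a line
\d - a digit
+ - how many times: one or more
\s* - whitespace (0 or more times)
PS. Since you already know how to do substitutions, I did not convert the results to numbers.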
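On the m flag mentioned at the start of this answer: in Ruby it only changes whether . may match a newline, while ^ and $ match at line boundaries with or without it. A small illustration, using a made-up two-line constant SAMPLE:

```ruby
# Two-line sample, invented for this illustration.
SAMPLE = "2,000\n1,805"

# Without /m, "." stops at a newline, so ".+" matches one line at a time.
p SAMPLE.scan(/.+/)     # => ["2,000", "1,805"]

# With /m, "." also matches "\n", so ".+" swallows the whole string.
p SAMPLE.scan(/.+/m)    # => ["2,000\n1,805"]

# "$" matches at the end of every line regardless of the flag.
p SAMPLE.scan(/\d+$/)   # => ["000", "805"]
p SAMPLE.scan(/\d+$/m)  # => ["000", "805"]
```

Since neither r1 nor r2 contains an unescaped ., the flag makes no difference to their results.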
r2 = /\d+\.\s*([\w\s]+)\s*\-/m
str.scan(r2).flatten
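\d+ - matches a digit 1 or more times
\. - matches a literal . (you have to escape it, because an unescaped . matches any character)
\s* - whitespace, 0 or more times
[\w\s]+ - any word character or whitespace, 1 or more times
() - grouping: the parentheses mark the part you want to capture. More information: regexp ruby - capturing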
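Run against the sample data, the two patterns in this answer miss some entries: r1 requires a comma, so 865, 600 and 500 are skipped, and [\w\s]+ cannot match the parentheses in labels such as "Pacific Coast Producers (Agribusiness)". One possible adjustment, as a sketch only (the constant names and the shortened sample are made up here):

```ruby
# Shortened sample covering a label with parentheses and numbers without commas.
STR = "4. Pacific Coast Producers (Agribusiness) - 1,200\n" \
      "6. Feather River Hospital - 865\n" \
      "10. Colusa Casino Resort - 500"

# Numbers: a run of digits and commas at the end of a line,
# so "865" matches as well as "1,200".
R1 = /[\d,]+$/
# Labels: everything between "N. " and " - "; (.+?) is lazy, so it stops
# at the first dash that has whitespace on both sides.
R2 = /^\d+\.\s*(.+?)\s+-\s/

p STR.scan(R1)                                 # => ["1,200", "865", "500"]
p STR.scan(R1).map { |s| s.delete(",").to_i }  # => [1200, 865, 500]
p STR.scan(R2).flatten
# => ["Pacific Coast Producers (Agribusiness)", "Feather River Hospital", "Colusa Casino Resort"]
```

Requiring whitespace around the dash in R2 is a design choice of this sketch: it keeps a hyphen inside a label from being taken as the separator.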
Answer 2 (score: 0)
s = "1. Enloe Medical Center - 2,000
2. CSU Chico - 1,805
3. Walmart Distribution Center - 1,350
4. Pacific Coast Producers (Agribusiness) - 1,200
5. Marysville School District - 1,000
6. Feather River Hospital - 865
7. Sunsweet Growers (Agriculture) - 600
8. YRC (Freight Services) - 500
9. Sierra Pacific Industries (Lumber Products) - 500
10. Colusa Casino Resort - 500"
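# Numbers: take everything after "- ", strip the non-digits (the commas), convert to Integer
arr1 = s.each_line.map { |x|
  x.match(/- (.*)/)[1].gsub(/[^0-9]/, '').to_i
}
# Labels: the text between the "N. " marker and " - " (the dot is escaped so it matches a literal ".")
arr2 = s.each_line.map { |x|
  x.match(/\d+\. (.*) - (.*)/)[1]
}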
puts arr1
puts arr2
Answer 3 (score: 0)
str = %{1. Enloe Medical Center - 2,000
2. CSU Chico - 1,805
3. Walmart Distribution Center - 1,350
4. Pacific Coast Producers (Agribusiness) - 1,200
5. Marysville School District - 1,000
6. Feather River Hospital - 865
7. Sunsweet Growers (Agriculture) - 600
8. YRC (Freight Services) - 500
9. Sierra Pacific Industries (Lumber Products) - 500
10. Colusa Casino Resort - 500}
numbers = str.scan(/-\ (\d.*)$/).flatten.map{|s| s.gsub(",", "").to_i} # => [2000, 1805, 1350, 1200, 1000, 865, 600, 500, 500, 500]
labels = str.scan(/\d+\.\s(.*)\s-/).flatten # => ["Enloe Medical Center", "CSU Chico", "Walmart Distribution Center", "Pacific Coast Producers (Agribusiness)", "Marysville School District", "Feather River Hospital", "Sunsweet Growers (Agriculture)", "YRC (Freight Services)", "Sierra Pacific Industries (Lumber Products)", "Colusa Casino Resort"]
Answer 4 (score: 0)
You can do it like this:
rawlines = <<EOF
1. Enloe Medical Center - 2,000
2. CSU Chico - 1,805
3. Walmart Distribution Center - 1,350
4. Pacific Coast Producers (Agribusiness) - 1,200
5. Marysville School District - 1,000
6. Feather River Hospital - 865
7. Sunsweet Growers (Agriculture) - 600
8. YRC (Freight Services) - 500
9. Sierra Pacific Industries (Lumber Products) - 500
10. Colusa Casino Resort - 500
EOF
labels = []
numbers = []
rawlines.scan(/^[0-9]+\. ([^-]+) - ([1-9][0-9]{0,2}(?>,[0-9]{3})*)/) do |label, number|
  labels << label
  # Remove the thousands separators and convert to Integer, as the question asks for integers
  numbers << number.gsub(",", "").to_i
end
puts labels
puts numbers
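Note that this part of the pattern, ([1-9][0-9]{0,2}(?>,[0-9]{3})*), can be replaced with the simpler ([0-9,]+) if you do not need to check that the commas group the digits in threes.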
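The difference between the two number patterns only shows up on malformed input. A rough comparison follows; the constant names and test strings are made up, and both patterns are wrapped in \A and \z here so that each string is tested as a whole, which the pattern in the answer does not do:

```ruby
# Strict: 1 to 3 leading digits (no leading zero), then zero or more ",ddd" groups.
STRICT = /\A[1-9][0-9]{0,2}(?>,[0-9]{3})*\z/
# Loose: any run of digits and commas.
LOOSE  = /\A[0-9,]+\z/

["2,000", "865", "1,23", ",,5"].each do |s|
  puts "#{s.inspect}: strict=#{STRICT.match?(s)} loose=#{LOOSE.match?(s)}"
end
# "2,000": strict=true loose=true
# "865": strict=true loose=true
# "1,23": strict=false loose=true
# ",,5": strict=false loose=true
```

For the sample data in the question both patterns capture the same values, so the simpler one is enough there.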