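I'm using Ruby to parse a file and change the data format. I've written a regular expression with three match groups whose values I want to store temporarily in variables, but I can't get the matches stored because everything comes back nil.
Here is what I have so far.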
regex = '^"(\bhttps?://[-\w+&@#/%?=~_|$!:,.;]*[\w+&@#/%=~_|$])","(\w+|[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4})","(\w{1,30})'
begin
  file = File.new("testfile.csv", "r")
  while (line = file.gets)
    puts line
    match_array = line.scan(/regex/)
    puts $&
  end
  file.close
end
Here is some sample data I'm using for testing.
"https://mail.google.com","Master","password1","","https://mail.google.com","",""
"https://login.sf.org","monster@gmail.com","password2","https://login.sf.org","","ctl00$ctl00$ctl00$body$body$wacCenterStage$standardLogin$tbxUsername","ctl00$ctl00$ctl00$body$body$wacCenterStage$standardLogin$tbxPassword"
"http://www.facebook.com","Beast","12345678","https://login.facebook.com","","email","pass"
"http://www.own3d.tv","Earth","passWOrd3","http://www.own3d.tv","","user_name","user_password"
Thanks,
LF4
Answer 0 (score: 5)
This doesn't work:
match_array = line.scan(/regex/)
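That uses the literal text "regex" as the regular expression, so it only matches the five characters r-e-g-e-x; it does not use the string stored in your regex variable. You can either put the big ugly regex directly into scan or create a Regexp instance: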
regex = Regexp.new('^"(\bhttps?://[-\w+&@#/%?=~_|$!:,.;]*[\w+&@#/%=~_|$])","(\w+|[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4})","(\w{1,30})')
# ...
match_array = line.scan(regex)
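A minimal sketch, run against the first sample line from the question, showing both the failure and the fix. The names `pattern`, `url`, `user`, and `password` are illustrative. Using `String#match` with `MatchData#captures` to put the three groups into separate variables is an alternative I'm adding here, not something the answer itself proposes.

```ruby
# The question's pattern, kept in a plain single-quoted string.
pattern = '^"(\bhttps?://[-\w+&@#/%?=~_|$!:,.;]*[\w+&@#/%=~_|$])","(\w+|[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4})","(\w{1,30})'
regex = Regexp.new(pattern)

line = '"https://mail.google.com","Master","password1","","https://mail.google.com","",""'

# /regex/ searches for the literal text "regex", so nothing is found.
literal_hits = line.scan(/regex/)
# => []

# With the real Regexp, scan returns one array of captures per match.
match_array = line.scan(regex)
# => [["https://mail.google.com", "Master", "password1"]]

# String#match returns a MatchData object (or nil when there is no match),
# and #captures lets you assign the three groups to separate variables.
if (m = line.match(regex))
  url, user, password = m.captures
  puts url       # https://mail.google.com
  puts user      # Master
  puts password  # password1
end
```

`match` is a natural fit when each line should match at most once; `scan` is more useful when a pattern can occur several times per line.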
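You should use a CSV library (one ships with Ruby in both 1.8.7 and 1.9) to parse the CSV file, and then apply your regexes to each column. You'll run into far fewer quoting and escaping problems.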
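A rough sketch of that approach, assuming a modern Ruby (2.3+ for the `<<~` heredoc) with the `csv` library available. The anchored per-column regexes are my adaptation of the question's pattern, and `url_re`, `user_re`, `password_re`, and `records` are hypothetical names. For the real file, `CSV.foreach("testfile.csv") { |row| ... }` would take the place of `CSV.parse(data)`.

```ruby
require 'csv'

# The first three sample rows from the question (single-quoted heredoc,
# so the "$" characters are not interpolated).
data = <<~'ROWS'
  "https://mail.google.com","Master","password1","","https://mail.google.com","",""
  "https://login.sf.org","monster@gmail.com","password2","https://login.sf.org","","ctl00$ctl00$ctl00$body$body$wacCenterStage$standardLogin$tbxUsername","ctl00$ctl00$ctl00$body$body$wacCenterStage$standardLogin$tbxPassword"
  "http://www.facebook.com","Beast","12345678","https://login.facebook.com","","email","pass"
ROWS

# Per-column patterns: the CSV parser already strips the quotes and commas,
# so each regex only describes one bare field, anchored with \A and \z.
url_re      = %r{\Ahttps?://[-\w+&@#/%?=~_|$!:,.;]*[\w+&@#/%=~_|$]\z}
user_re     = /\A(?:\w+|[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4})\z/
password_re = /\A\w{1,30}\z/

records = []
CSV.parse(data) do |row|
  url, user, password = row.first(3)
  # Skip rows whose first three columns don't look like url/user/password.
  next unless url =~ url_re && user =~ user_re && password =~ password_re
  records << [url, user, password]
end

records.each { |r| puts r.join(" | ") }
```

Checking each column separately also means a field that happens to contain a comma or an escaped quote cannot throw off the match for its neighbours.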