How to split this text into a hash in Ruby

Time: 2014-12-08 17:18:19

Tags: ruby regex hash

Sorry for my bad English, I'm new here. I have this document.txt:

paul gordon,jin kazama,1277,1268,21-12,21-19
yoshimistu,the rock,2020,2092,21-9,21-23,25-27
... lot more

What I mean is: how do I split each line on the comma separator and turn it into a hash like this?

result = { 
    line_num: { name1: "paula wood", name2: "sarah carnley", m1: 1277, m2: 1268, sc1: 21, sc2: 12, sc3: 21, sc4: 19  }
}

I tried coding it like this. I'm using text2re to generate the regular expression:

doc = File.read("doc.txt")
lines = doc.split("\n")
counts = 0
example = {}
player1 = '((?:[a-z][a-z]+))(.)((?:[a-z][a-z]+))'
player2 = '((?:[a-z][a-z]+))(.)((?:[a-z][a-z]+))'
re = (player1 + player2 )
m = Regexp.new(re, Regexp::IGNORECASE)
lines.each do |line|

re1='((?:[a-z][a-z]+))' # Word 1
re2='(.)'   # Any Single Character 1
re3='((?:[a-z][a-z]+))' # Word 2
re4='(.)'   # Any Single Character 2
re5='((?:[a-z][a-z]+))' # Word 3
re6='(.)'   # Any Single Character 3
re7='((?:[a-z][a-z]+))' # Word 4

re=(re1+re2+re3+re4+re5+re6+re7)
m=Regexp.new(re,Regexp::IGNORECASE);
if m.match(line)
    word1=m.match(line)[1];
    c1=m.match(line)[2];
    word2=m.match(line)[3];
    c2=m.match(line)[4];
    word3=m.match(line)[5];
    c3=m.match(line)[6];
    word4=m.match(line)[7];
    counts += 1
    example[counts] = word1+word2
    puts example
end
end
# (/[a-z].?/)

But the output is not what I expected: 1=>"", 2=>"indahdelika", 3=>"masam", ..more

2 Answers:

Answer 0: (score: 1)

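Your data is comma-separated, so use the CSV class instead of trying to roll your own parser. If you try to split on commas yourself, there be dragons waiting for you.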
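To make that warning concrete, here is a small sketch of one such dragon. The quoted name in it is made up for illustration; the sample data in the question contains no quoted fields.

```ruby
require 'csv'

# Hypothetical line: the first name is quoted because it contains a comma.
line = '"gordon, paul",jin kazama,1277,1268,21-12,21-19'

# A naive split treats the comma inside the quotes as a separator.
naive = line.split(',')

# CSV.parse_line understands the quoting and keeps the name in one field.
parsed = CSV.parse_line(line)

puts naive.length   # 7 fields, the name is broken in two
puts parsed.length  # 6 fields
puts parsed.first   # gordon, paul
```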

I'd use:

require 'csv'

data = "paul gordon,jin kazama,1277,1268,21-12,21-19
yoshimistu,the rock,2020,2092,21-9,21-23,25-27
"

hash = {}
CSV.parse(data).each_with_index do |row, i|
  name1, name2, m1, m2, sc1_2, sc3_4 = row
  sc1, sc2 = sc1_2.split('-')
  sc3, sc4 = sc3_4.split('-')
  hash[i] = {
    name1: name1,
    name2: name2,
    m1: m1,
    m2: m2,
    sc1: sc1,
    sc2: sc2,
    sc3: sc3,
    sc4: sc4,
  }
end

Which results in:

hash
# => {0=>
#      {:name1=>"paul gordon",
#       :name2=>"jin kazama",
#       :m1=>"1277",
#       :m2=>"1268",
#       :sc1=>"21",
#       :sc2=>"12",
#       :sc3=>"21",
#       :sc4=>"19"},
#     1=>
#      {:name1=>"yoshimistu",
#       :name2=>"the rock",
#       :m1=>"2020",
#       :m2=>"2092",
#       :sc1=>"21",
#       :sc2=>"9",
#       :sc3=>"21",
#       :sc4=>"23"}}

Since you are reading from a file, adapt the above using the "Reading from a file a line at a time" example in the CSV documentation.
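A minimal sketch of that adaptation might look like the following. The method name load_matches is made up for illustration, and a temporary file stands in for document.txt so the example runs anywhere; CSV.foreach is the standard-library call that yields one parsed row at a time.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper (not part of the answer above): builds the same hash,
# but reads the file one row at a time with CSV.foreach.
def load_matches(path)
  hash = {}
  CSV.foreach(path).each_with_index do |row, i|
    # Any fields beyond the sixth are ignored, as in the code above.
    name1, name2, m1, m2, sc1_2, sc3_4 = row
    sc1, sc2 = sc1_2.split('-')
    sc3, sc4 = sc3_4.split('-')
    hash[i] = { name1: name1, name2: name2, m1: m1, m2: m2,
                sc1: sc1, sc2: sc2, sc3: sc3, sc4: sc4 }
  end
  hash
end

# A temporary file stands in for document.txt.
Tempfile.create('document') do |file|
  file.write("paul gordon,jin kazama,1277,1268,21-12,21-19\n" \
             "yoshimistu,the rock,2020,2092,21-9,21-23,25-27\n")
  file.flush
  p load_matches(file.path)
end
```

With a real file, the call would simply be load_matches("document.txt").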


If the numbers need to be integers, adjust the hash definition to:

  hash[i] = {
    name1: name1,
    name2: name2,
    m1: m1.to_i,
    m2: m2.to_i,
    sc1: sc1.to_i,
    sc2: sc2.to_i,
    sc3: sc3.to_i,
    sc4: sc4.to_i,
  }

Which results in:

# => {0=>
#      {:name1=>"paul gordon",
#       :name2=>"jin kazama",
#       :m1=>1277,
#       :m2=>1268,
#       :sc1=>21,
#       :sc2=>12,
#       :sc3=>21,
#       :sc4=>19},
#     1=>
#      {:name1=>"yoshimistu",
#       :name2=>"the rock",
#       :m1=>2020,
#       :m2=>2092,
#       :sc1=>21,
#       :sc2=>9,
#       :sc3=>21,
#       :sc4=>23}}
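As an alternative sketch for the integer conversion, CSV can convert plain numeric fields itself through its built-in :integer converter, so only the score pairs still need to_i. The method name parse_with_converter is made up for illustration.

```ruby
require 'csv'

data = "paul gordon,jin kazama,1277,1268,21-12,21-19\n" \
       "yoshimistu,the rock,2020,2092,21-9,21-23,25-27\n"

# Hypothetical helper: converters: :integer turns fields like "1277" into 1277.
# Fields such as "21-12" or "paul gordon" are not valid integers and stay strings.
def parse_with_converter(data)
  hash = {}
  CSV.parse(data, converters: :integer).each_with_index do |row, i|
    name1, name2, m1, m2, sc1_2, sc3_4 = row
    sc1, sc2 = sc1_2.split('-').map(&:to_i)
    sc3, sc4 = sc3_4.split('-').map(&:to_i)
    hash[i] = { name1: name1, name2: name2, m1: m1, m2: m2,
                sc1: sc1, sc2: sc2, sc3: sc3, sc4: sc4 }
  end
  hash
end

p parse_with_converter(data)
```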

Answer 1: (score: 0)

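Here is another way you could do it. I have made no assumptions about how many items on each line are values for :namex, :scx or :mx, or about the order of those items.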

Code

def hashify(str)
  str.lines.each_with_index.with_object({}) { |(s,i),h| h[i] = inner_hash(s) }
end

def inner_hash(s)
  n = m = sc = 0
  s.split(',').each_with_object({}) do |f,g|
    case f
    when /[a-zA-Z].*/
      g["name#{n += 1}".to_sym] = f
    when /\-/
      g["sc#{sc += 1}".to_sym], g["sc#{sc += 1}".to_sym] = f.split('-').map(&:to_i)
    else
      g["m#{m += 1}".to_sym] = f.to_i
    end
  end
end

Example

str = "paul gordon,jin kazama,1277,1268,21-12,21-19
yoshimistu,the rock,2020,2092,21-9,21-23,25-27"

hashify(str)
  #=> {0=>{:name1=>"paul gordon", :name2=>"jin kazama",
  #        :m1=>1277, :m2=>1268,
  #        :sc1=>21, :sc2=>12, :sc3=>21, :sc4=>19},
  #    1=>{:name1=>"yoshimistu", :name2=>"the rock",
  #        :m1=>2020, :m2=>2092,
  #        :sc1=>21, :sc2=>9, :sc3=>21, :sc4=>23, :sc5=>25, :sc6=>27}
  #   }
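The example above shows that the number of score fields may vary. To illustrate that the order of the fields does not matter either, here is a condensed sketch of the same classification idea. The method name classify and the shuffled input line are made up for illustration; it is a variant of inner_hash, not the code above verbatim.

```ruby
# Condensed variant of inner_hash (classify is a hypothetical name):
# each field is classified by its own content, so its position is irrelevant.
def classify(line)
  n = m = sc = 0
  line.chomp.split(',').each_with_object({}) do |field, h|
    case field
    when /[a-zA-Z]/
      # Anything containing a letter is treated as a name.
      h[:"name#{n += 1}"] = field
    when /-/
      # A dash marks a score pair; each side gets its own :scN key.
      field.split('-').each { |v| h[:"sc#{sc += 1}"] = v.to_i }
    else
      # Everything else is taken to be a plain number.
      h[:"m#{m += 1}"] = field.to_i
    end
  end
end

# Same kinds of fields as in the question, in a different order.
shuffled = "1277,paul gordon,21-12,jin kazama,1268"
p classify(shuffled)
```

The keys are numbered in the order each kind of field is encountered, so the first number on the line becomes :m1 regardless of where it appears.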