Extracting values from a long string

Time: 2015-08-08 22:41:40

Tags: ruby string

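I am having trouble extracting the ID and Answer values from a string sent by an iOS app. In the example below there are four IDs and four answers that I need to extract. This is the input, followed by the result I want: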

s = "ID:1_Answer1_ID:2_Answer2_ID:3_AnswerRandom_ID:789_Answer3.5"

IDs_array = [1, 2, 3, 789]
Answers_array = ["Answer1", "Answer2", "AnswerRandom", "Answer3.5"]

Any help or suggestions would be appreciated.

4 answers:

Answer 0: (score: 5)

ids, answers = s.scan(/ID:(\d+)_([^_]+)/).transpose

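The idea behind the regular expression is:

  1. Each ID is preceded by the literal text ID: - ID:
  2. The actual IDs are digits - (\d+)
  3. They are separated from the answers by an underscore - _
  4. The answers themselves are a run of non-underscore characters - ([^_]+)
  5. String#scan with a regex that contains groups returns an array of [id, answer] pairs, so we transpose it into two arrays - one with the ids and one with the answers. Then we use multiple assignment to unpack the outer array.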
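
The steps above can be sketched end to end as follows. The final to_i conversion is my own addition, assuming the IDs should be Integers as in the question's desired output; scan itself returns strings.

```ruby
s = "ID:1_Answer1_ID:2_Answer2_ID:3_AnswerRandom_ID:789_Answer3.5"

# Because the regex has two capture groups, scan returns one
# [id, answer] pair (both strings) per match.
pairs = s.scan(/ID:(\d+)_([^_]+)/)

# transpose turns the list of pairs into a pair of lists,
# which multiple assignment then unpacks.
ids, answers = pairs.transpose

# Assumption (not part of the answer above): convert the IDs to Integers.
ids = ids.map(&:to_i)
```

Note that for a string with no matches, scan returns an empty array, so ids and answers would both be nil after the assignment and the to_i step would need a guard.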

Answer 1: (score: 1)

Without a regex, here is my suggestion:

i = 0
names = []
ids = []
s = "ID:1_Answer1_ID:2_Answer2_ID:3_AnswerRandom_ID:789_Answer3.5"

s.split("_").each do |f|
    if i.odd?
        names.push(f)
    else
        ids.push(f.split(":")[1])       
    end
    i+=1
end
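
The same idea could probably be written without the manual counter. This is only a sketch, assuming the tokens strictly alternate between "ID:n" and an answer; each_slice(2) groups them into pairs.

```ruby
s = "ID:1_Answer1_ID:2_Answer2_ID:3_AnswerRandom_ID:789_Answer3.5"

ids = []
names = []

# Split on underscores, then walk the tokens two at a time:
# the first token of each pair is "ID:n", the second is the answer.
s.split("_").each_slice(2) do |id_token, name|
  ids << id_token.split(":")[1]  # keep the part after "ID:" (still a string)
  names << name
end
```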

Answer 2: (score: 0)

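There are many ways to do this. Here is one that uses two successive splits and no regex. I assume the string begins with ID:, since if that is not necessarily the case the question needs further clarification.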

ids, answers = s[3..-1].split('_ID:').map { |str| str.split('_') }.transpose
  #=> [["1", "2", "3", "789"],
  #    ["Answer1", "Answer2", "AnswerRandom", "Answer3.5"]] 

The steps:

t = s[3..-1]
  #=> "1_Answer1_ID:2_Answer2_ID:3_AnswerRandom_ID:789_Answer3.5" 
a = t.split('_ID:')
  #=> ["1_Answer1", "2_Answer2", "3_AnswerRandom", "789_Answer3.5"] 
b = a.map { |str| str.split('_') }
  #=> [["1", "Answer1"], ["2", "Answer2"],
  #    ["3", "AnswerRandom"], ["789", "Answer3.5"]] 
b.transpose
  #=> [["1", "2", "3", "789"],
  #    ["Answer1", "Answer2", "AnswerRandom", "Answer3.5"]] 
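
Since this approach relies on the string starting with ID:, it may be worth making that assumption explicit. A minimal sketch, where the guard and its error message are my own additions rather than part of the answer:

```ruby
s = "ID:1_Answer1_ID:2_Answer2_ID:3_AnswerRandom_ID:789_Answer3.5"

# Hypothetical guard: fail loudly if the leading "ID:" is missing,
# because s[3..-1] would otherwise silently drop three unrelated characters.
raise ArgumentError, "expected string to start with ID:" unless s.start_with?("ID:")

# Drop the leading "ID:", split into "id_answer" chunks, split each chunk,
# then transpose the pairs into two parallel arrays.
ids, answers = s[3..-1].split("_ID:").map { |str| str.split("_") }.transpose
```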

Answer 3: (score: -1)

Please clarify your question.

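Do the IDs and answers need to be matched up? Do they always come in pairs? Is the "_" character always used as the delimiter (which means the answers should be encoded)? Is the format always: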

"ID:#{id_number}_#{answer in text}" ... "_" ... *

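I will assume the answer to all of my questions is "yes", but if I am wrong, please edit your question and leave a comment - I will edit the answer.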

s = "ID:1_Answer1_ID:2_Answer2_ID:3_AnswerRandom_ID:789_Answer3.5"

answer_hash = {}

tmp = s.split('_')

answer_hash[tmp.shift[3..-1].to_i] = tmp.shift while tmp[0]

answer_hash # => {1=>"Answer1", 2=>"Answer2", 3=>"AnswerRandom", 789=>"Answer3.5"} 
answer_hash.keys # => [1, 2, 3, 789]
answer_hash.values # => ["Answer1", "Answer2", "AnswerRandom", "Answer3.5"]
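
Because the code above depends on those assumptions about the format, one option is to check them before parsing. This is a rough sketch only: the PAIR_FORMAT constant and the well_formed? helper are hypothetical names of mine, and String#match? requires Ruby 2.4 or later.

```ruby
# Hypothetical pattern: one "ID:<digits>_<answer>" pair, followed by any
# number of further pairs, each joined to the previous one by "_".
PAIR_FORMAT = /\AID:\d+_[^_]+(?:_ID:\d+_[^_]+)*\z/

# Hypothetical helper: true only if the whole string fits the assumed format.
def well_formed?(str)
  str.match?(PAIR_FORMAT)
end

s = "ID:1_Answer1_ID:2_Answer2_ID:3_AnswerRandom_ID:789_Answer3.5"

valid = well_formed?(s)
# An answer containing an unencoded "_" breaks the assumed format.
invalid = well_formed?("ID:1_Answer_with_underscore")
```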

Edit

I like @ndn's answer using Regexp... it is clearer, but it may be slower for shorter strings.

Here are benchmarks from my machine - they show that the performance difference shows up mainly for shorter ID strings:

require 'benchmark'

s = "ID:1_Answer1_ID:2_Answer2_ID:3_AnswerRandom_ID:789_Answer3.5"

puts Benchmark.measure {100_000.times {answer_hash = {}; tmp = s.split('_'); answer_hash[tmp.shift[3..-1].to_i] = tmp.shift while tmp[0] } }

# ### Short string using string.split
# =>   0.280000   0.000000   0.280000 (  0.286917)

puts Benchmark.measure {100_000.times {ids, answers = *s.scan(/(?<=ID:)(\d+)_([^_]+)/).transpose } }

# ### Short string using string.scan Regexp
# =>  0.590000   0.000000   0.590000 (  0.595052)



s = []
100.times {|i| s << ("ID:#{i}_Answer#{i}") }
s = s.join('_')



puts Benchmark.measure {100_000.times {answer_hash = {}; tmp = s.split('_'); answer_hash[tmp.shift[3..-1].to_i] = tmp.shift while tmp[0] } }

# ### Medium string using string.split
# =>  7.180000   0.010000   7.190000 (  7.213266)


puts Benchmark.measure {100_000.times {ids, answers = *s.scan(/(?<=ID:)(\d+)_([^_]+)/).transpose } }

# ### Medium string using string.scan Regexp
# =>  8.860000   0.020000   8.880000 (  8.888352)



s = []
1000.times {|i| s << ("ID:#{i}_Answer#{i}") }
s = s.join('_')


puts Benchmark.measure {1000.times {answer_hash = {}; tmp = s.split('_'); answer_hash[tmp.shift[3..-1].to_i] = tmp.shift while tmp[0] } }

# ### Long string using string.split (shorter benchmark)
# =>  0.690000   0.000000   0.690000 (  0.693698)


puts Benchmark.measure {1000.times {ids, answers = *s.scan(/(?<=ID:)(\d+)_([^_]+)/).transpose } }

# ### Long string using string.scan Regexp (shorter benchmark)
# =>  0.900000   0.000000   0.900000 (  0.901358)
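
For anyone who wants to repeat the comparison, the two approaches could be wrapped in lambdas and run side by side with Benchmark.bm from the standard library. This is a sketch under my own assumptions: the lambda names, the 100-pair input and the iteration count are arbitrary choices, the lambda calls add a small overhead of their own, and timings will differ by machine and Ruby version.

```ruby
require 'benchmark'

# Build a test string of 100 "ID:n_Answern" pairs joined by underscores.
s = Array.new(100) { |i| "ID:#{i}_Answer#{i}" }.join('_')

# split-based approach: consume tokens two at a time into a hash.
split_version = lambda do
  answer_hash = {}
  tmp = s.split('_')
  answer_hash[tmp.shift[3..-1].to_i] = tmp.shift while tmp[0]
  answer_hash
end

# scan-based approach: capture [id, answer] pairs and transpose them.
scan_version = lambda do
  s.scan(/(?<=ID:)(\d+)_([^_]+)/).transpose
end

n = 1_000  # arbitrary iteration count
Benchmark.bm(6) do |x|
  x.report("split:") { n.times { split_version.call } }
  x.report("scan:")  { n.times { scan_version.call } }
end
```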