I have an XML tag like this:
<cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
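How can I get the value of ms level (1)? Please don't suggest Nokogiri or REXML. I want to learn how to parse the information while reading the file line by line. Thanks.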
Answer 0: (score: 0)
str = '<cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>'
md = str.match(/
  name="(.*?)"   # name=", then any characters (.), as few as possible (*?), captured in group 1, followed by...
  \s*            # whitespace (\s), 0 or more times (*), followed by...
  value="(\d+)"  # value=", then a digit (\d), 1 or more times (+), captured in group 2
/x)  # x flag: ignore whitespace in the pattern, so the regex can span multiple lines and carry comments like these
if md
  name, value = md.captures  # captures => an Array holding the match for each parenthesized group
  puts "#{name}(#{value})"
end
--output:--
ms level(1)
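Since the question is about reading a file line by line, here is a minimal sketch of how the same idea might be applied to a stream of lines. The helper name `extract_ms_levels`, the constant `MS_LEVEL`, and the sample input are hypothetical; the sketch assumes each cvParam tag sits on a single line and that `name` comes directly before `value`, as in the question.

```ruby
require 'stringio'

# Matches only cvParam attributes whose name is "ms level"; group 1 is the value.
# Assumption: name="..." is immediately followed by value="..." on the same line.
MS_LEVEL = /name="ms level"\s+value="(\d+)"/

# Hypothetical helper: walks any IO-like object line by line and
# collects every ms level value it finds, converted to an Integer.
def extract_ms_levels(io)
  levels = []
  io.each_line do |line|
    md = line.match(MS_LEVEL)
    levels << md[1].to_i if md
  end
  levels
end

# Made-up sample input standing in for a real file.
sample = StringIO.new(<<~XML)
  <spectrum index="0">
    <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
  </spectrum>
  <spectrum index="1">
    <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
  </spectrum>
XML

p extract_ms_levels(sample)  # prints [1, 2]
```

With a real file, the same helper could be called as `File.open(path) { |f| extract_ms_levels(f) }`, which reads one line at a time instead of loading the whole file into memory.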
Answer 1: (score: 0)
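This answer is for other readers who have not ruled out Nokogiri prematurely. You can still process the file line by line, parsing each line as a DocumentFragment.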
require 'nokogiri'
line = '<cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>'
fragment = Nokogiri::HTML.fragment(line)
cvparam = fragment.first_element_child
# Node#[] returns the attribute's value as a String
p [cvparam['name'], cvparam['value']]
#=> ["ms level", "1"]