I have a file with the following contents:
this is test line 1
this is testing purpose
<public>
am inside of public
doing lot of stuffs and priting result here
</public>
<public>
am inside of another public
doing another set of stuffs and priting here
</public>
I want to split this file into three separate sections.
I tried using take_while and drop_while:
File.open(filename).each_line.take_while do |l|
!l.include?('</public>')
end.drop_while do |l|
!l.include?('<public>')
end.drop(1)
But it only extracts the first <public> ... </public> section.
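To see why only the first block comes back, it can help to run the same chain on an in-memory array. This is a small sketch, not part of the original question: the array stands in for the file's lines, and the helper name first_public_block is made up for illustration.

```ruby
# Hypothetical helper (the name is illustrative) that mirrors the chain above.
def first_public_block(lines)
  lines
    .take_while { |l| !l.include?('</public>') } # stops for good at the first closing tag
    .drop_while { |l| !l.include?('<public>') }  # skips the lines before the opening tag
    .drop(1)                                     # drops the '<public>' line itself
end

lines = [
  'this is test line 1',
  '<public>',
  'am inside of public',
  '</public>',
  '<public>',
  'am inside of another public',
  '</public>'
]

p first_public_block(lines) # => ["am inside of public"]
```

take_while never looks past the first line that fails its condition, so the second block is already gone before drop_while runs.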
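In some cases the order may change: for example, the public sections may come first, with the remaining content at the end or in the middle. If the content is in the same order as the template above, I can use the approach below: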
File.read(filename).split(/<\/?public>/)
.map(&:strip)
.reject(&:empty?)
I got this answer from "Split lines using tags that appear multiple times in file".
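The limitation of the split approach is easier to see on two small inputs. The sketch below wraps the same chain in a hypothetical helper, split_on_public_tags, and uses made-up sample strings rather than the real file.

```ruby
# Same chain as above, applied to a string instead of a file.
# The helper name and the sample strings are illustrative only.
def split_on_public_tags(text)
  text.split(/<\/?public>/).map(&:strip).reject(&:empty?)
end

template_order = "outside\n<public>\ninside one\n</public>\n<public>\ninside two\n</public>\n"
public_first   = "<public>\ninside one\n</public>\noutside\n<public>\ninside two\n</public>\n"

p split_on_public_tags(template_order) # => ["outside", "inside one", "inside two"]
p split_on_public_tags(public_first)   # => ["inside one", "outside", "inside two"]
```

Both calls return a flat array of three strings, so nothing in the result says which pieces came from inside a <public> block. The position of the outside text is only known when the layout is fixed.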
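But I am looking for a more generic approach that handles the data whatever the order.
I am looking for a better solution. Any suggestions would be appreciated.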
Answer 0 (score: 1)
Consider this:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<root>
this is test line 1
<public>
am inside of public
</public>
<public>
am inside of another public
</public>
</root>
EOT
text_inside_public_tags = doc.search('public').map(&:text)
# => ["\n" +
# "am inside of public\n", "\n" +
# "am inside of another public\n"]
doc.search('public').each(&:remove)
text_outside_public_tags = doc.at('root').text
# => "\n" +
# "this is test line 1 \n" +
# "\n" +
# "\n"
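If adding an XML parser is not an option, a similar inside/outside split can be approximated with plain regular expressions. This is a different technique from the Nokogiri code above, not a drop-in equivalent: the sketch assumes that <public> blocks are never nested and that every opening tag has a matching closing tag. The helper name split_public is made up for illustration.

```ruby
# Hypothetical helper: returns [texts inside <public> blocks, lines outside them].
# Assumes blocks are never nested and every <public> has a matching </public>.
def split_public(text)
  block   = /<public>(.*?)<\/public>/m # /m lets . match newlines; .*? is non-greedy
  inside  = text.scan(block).flatten.map(&:strip)
  outside = text.gsub(block, '').lines.map(&:strip).reject(&:empty?)
  [inside, outside]
end

text = "this is test line 1\n<public>\nam inside of public\n</public>\n" \
       "<public>\nam inside of another public\n</public>\n"

inside, outside = split_public(text)
p inside  # => ["am inside of public", "am inside of another public"]
p outside # => ["this is test line 1"]
```

Because the blocks are matched wherever they occur, the result does not depend on whether the outside lines come first, last, or in between.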
Answer 1 (score: 1)
You can use Ruby's flip-flop operator here.
Code
def dissect(str)
arr = str.lines.map(&:strip)
grp, ungrp = [], []
arr.each { |line| line=='<public>'..line=='</public>' ? (grp << line) : ungrp << line }
[grp.slice_when { |s,t| s == '</public>' && t == '<public>' }.
map { |a| a[1..-2] },
ungrp]
end
The last statement of the method, which builds the array the method returns, could be replaced with the following.
b = grp.count('<public>').times.with_object([]) do |_,a|
ndx = grp.index('</public>')
a << grp[1..ndx-1]
grp = grp[ndx+1..-1] if ndx < grp.size-1
end
[b, ungrp]
Example
str =<<-EOS
this is test line 1
this is testing purpose
<public>
am inside of public
doing lot of stuffs and printing result here
</public>
let's stick another line here
<public>
am inside of another public
doing another set of stuffs and printing here
</public>
and another line here
EOS
grouped, ungrouped = dissect(str)
#=> [
# [ ["am inside of public",
# "doing lot of stuffs and printing result here"],
# ["am inside of another public",
# "doing another set of stuffs and printing here"]
# ],
# [
# "this is test line 1",
# "this is testing purpose",
# "let's stick another line here",
# "and another line here"]
# ]
grouped
#=> [ ["am inside of public",
# "doing lot of stuffs and printing result here"],
# ["am inside of another public",
# "doing another set of stuffs and printing here"]
# ]
ungrouped
#=> ["this is test line 1",
# "this is testing purpose",
# "let's stick another line here",
# "and another line here"]
Explanation
For the example above, the steps are as follows.
arr = str.lines.map(&:strip)
#=> ["this is test line 1", "this is testing purpose", "<public>",
# "am inside of public", "doing lot of stuffs and printing result here",
# "</public>", "let's stick another line here", "<public>",
# "am inside of another public", "doing another set of stuffs and printing here",
# "</public>", "and another line here"]
ungrp, grp = [], []
arr.each { |line| line=='<public>'..line=='</public>' ? (grp << line) : ungrp << line }
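The flip-flop returns false until line=='<public>' is true. It then returns true, and keeps returning true up to and including the line for which line=='</public>' is true. After that it returns false until it again encounters a line for which line=='<public>' is true, and so on.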
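The on/off behaviour can be checked in isolation by recording what the flip-flop yields for each line. This is a minimal sketch with a made-up helper name, flip_flop_flags, and a shortened input.

```ruby
# Hypothetical helper: maps each line to the value the flip-flop yields for it.
def flip_flop_flags(lines)
  lines.map do |line|
    # A range literal in a condition is a flip-flop: it switches on when the
    # left test is true and switches off after the right test is true.
    if (line == '<public>')..(line == '</public>')
      true
    else
      false
    end
  end
end

p flip_flop_flags(['a', '<public>', 'b', '</public>', 'c'])
# => [false, true, true, true, false]
```

Both tag lines are reported as true, which is why grp still contains the <public> and </public> lines and the method later trims them with a[1..-2].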
ungrp
#=> <returns the value of 'ungrouped' in the example>
grp
#=> ["<public>",
# "am inside of public",
# "doing lot of stuffs and printing result here",
# "</public>",
# "<public>",
# "am inside of another public",
# "doing another set of stuffs and printing here",
# "</public>"]
enum = grp.slice_when { |s,t| s == '</public>' && t == '<public>' }
#=> #<Enumerator: #<Enumerator::Generator:0x00000
See Enumerable#slice_when, which made its debut in Ruby v2.2.
We can see the elements this enumerator generates by converting it to an array.
enum.to_a
#=> [["<public>", "am inside of public",
# "doing lot of stuffs and printing result here", "</public>"],
# ["<public>", "am inside of another public",
# "doing another set of stuffs and printing here", "</public>"]]
Finally,
enum.map { |a| a[1..-2] }
#=> <returns the array 'grouped' in the example>