Parsing and structuring a text file

Time: 2016-11-15 08:01:37

Tags: ruby parsing

I need help, and I'm using Ruby. I have a text file that contains:

Head 1
a 10
b 14
c 15
d 16
e 17
f 88
Head 4
r 32
t 55
s 79
r 22
t 88
y 53
o 78
p 90
m 44
Head 53
y 22
b 33
Head 33
z 11
d 66
v 88
b 69
Head 32
n 88
m 89
b 88

I want to parse this file and restructure it. This is the data I want to get:

Head 1, f 88
Head 4, t 88
Head 33, v 88
Head 32, n 88
Head 32, b 88

How can I write code like this in Ruby?

I think the first step is to put all the lines into an array:

lines = Array.new
File.foreach('C:/file/file.txt') { |line| lines << line } # reads line by line, then closes the file

But what should I do next?

Thanks!

2 answers:

Answer 0 (score: 1)

If the answer to @mudasobwa's question "Do you want to grab everything that has the value 88?" is yes, here is a solution:

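lines = File.readlines("file.txt") # read every line; the file is closed afterwards
lines.map!(&:chomp)                # remove line breaks

current_head = ""
res = []

lines.each do |line|
  case line
  when /\AHead \d+\z/  # a header line: remember it
    current_head = line
  when /\A\w 88\z/     # a one-character key followed by 88
    res << "#{current_head}, #{line}"
  end
end

puts res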
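If you would rather keep the data structured (each header mapped to its "88" lines) instead of building strings right away, the same loop can fill a Hash. This is a sketch, not part of the answer: the heredoc `text` (a subset of the question's data), the name `groups`, and the anchored regexes are assumptions.

```ruby
# Hypothetical sample input: a subset of the question's data.
text = <<~DATA
  Head 1
  a 10
  f 88
  Head 53
  y 22
  Head 32
  n 88
  m 89
  b 88
DATA

current_head = nil
# Each new head gets its own empty array the first time it is used as a key.
groups = Hash.new { |hash, head| hash[head] = [] }

text.each_line(chomp: true) do |line|
  case line
  when /\AHead \d+\z/
    current_head = line                          # remember the latest header
  when /\A\w 88\z/
    groups[current_head] << line if current_head # file the line under that header
  end
end

# Flatten back into the "Head n, x 88" strings the question asks for.
res = groups.flat_map { |head, vals| vals.map { |v| "#{head}, #{v}" } }
puts res
```

Heads with no matching line ("Head 53" here) never become keys, because the Hash only creates an entry when a line is appended. To read the real file, replace the heredoc with `File.read("file.txt")`.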

Answer 1 (score: 1)

I have written your data to a file named 'temp'.

First, define a regular expression that extracts the lines of interest from the file.


r = /
    Head\s+\d+        # match 'Head', >= 1 whitespace chars, >= 1 digits
    |                 # or
    [[:lower:]]+\s+88 # match >= 1 lower-case letters, >= 1 whitespace chars, '88'
    /xm               # free-spacing (x) and multi-line (m) modes
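To see what `r` picks out before running it over the whole file, it can be tried on a short string. A minimal sketch, assuming a few sample lines taken from the question (the variable `sample` is hypothetical):

```ruby
# Same pattern as above, repeated so the snippet runs on its own
r = /
    Head\s+\d+        # 'Head', whitespace, one or more digits
    |                 # or
    [[:lower:]]+\s+88 # lower-case letters, whitespace, '88'
    /xm

sample = "Head 4\nr 32\nt 88\ny 53\nHead 53\ny 22\n"
matches = sample.scan(r) # no capture groups, so scan returns the matched strings
p matches                # => ["Head 4", "t 88", "Head 53"]
```

Every "Head n" line is kept even when no "88" line follows it ("Head 53" here); the later `slice_before`/`reject` steps take care of that.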

Now perform the following operations on the file.

File.read('temp').scan(r).
                  slice_before { |line| line.start_with?('Head ') }.
                  reject { |a| a.size == 1 }.
                  flat_map { |head, *rest| [head].product(rest) }.
                  map { |a| "%s, %s" % a }
  #=> ["Head 1, f 88", "Head 4, t 88", "Head 33, v 88",
  #    "Head 32, n 88", "Head 32, b 88"]

The steps are as follows.

a = File.read('temp').scan(r)
  #=> ["Head 1", "f 88", "Head 4", "t 88", "Head 53", "Head 33",
  #    "v 88", "Head 32", "n 88", "b 88"]
b = a.slice_before { |line| line.start_with?('Head') }
  #=> #<Enumerator: #<Enumerator::Generator:0x007ffd218387b0>:each>

We can see the elements generated by the enumerator b by converting it to an array.

b.to_a
  #=> [["Head 1", "f 88"], ["Head 4", "t 88"], ["Head 53"],
  #    ["Head 33", "v 88"], ["Head 32", "n 88", "b 88"]]

Now remove all arrays of size 1 from b. These are the heads, such as "Head 53", that have no "88" line.

c = b.reject { |a| a.size == 1 }
  #=> [["Head 1", "f 88"], ["Head 4", "t 88"], ["Head 33", "v 88"],
  #    ["Head 32", "n 88", "b 88"]]

Next, we use Enumerable#flat_map and Array#product to pair each "Head" with every "88" line that follows it (before the next "Head" or the end of the file).

d = c.flat_map { |head, *rest| [head].product(rest) }
  #=> [["Head 1", "f 88"], ["Head 4", "t 88"], ["Head 33", "v 88"],
  #    ["Head 32", "n 88"], ["Head 32", "b 88"]]

Finally, convert each element of d to a string.

d.map { |a| "%s, %s" % a }
  #=> ["Head 1, f 88", "Head 4, t 88", "Head 33, v 88",
  #    "Head 32, n 88", "Head 32, b 88"]
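To try the whole pipeline without creating the 'temp' file, the same steps can be wrapped in a method that takes the text as a string. This is a sketch: the method name `heads_with_88` and the constant `HEAD_OR_88` are hypothetical, and the intermediate variables replace the one-expression chain only for readability. The heredoc is the question's data, so the result should equal the question's desired output.

```ruby
# Same regex as in the answer
HEAD_OR_88 = /
    Head\s+\d+        # 'Head', whitespace, one or more digits
    |                 # or
    [[:lower:]]+\s+88 # lower-case letters, whitespace, '88'
    /xm

def heads_with_88(text)
  # 1. keep only the "Head n" lines and the "x 88" lines
  matches = text.scan(HEAD_OR_88)
  # 2. start a new group at every "Head" line
  groups = matches.slice_before { |line| line.start_with?('Head ') }
  # 3. drop heads that have no "88" line (groups of size 1)
  groups = groups.reject { |group| group.size == 1 }
  # 4. pair each head with each of its "88" lines, then format as strings
  groups.flat_map { |head, *rest| [head].product(rest) }
        .map { |pair| "%s, %s" % pair }
end

data = <<~DATA
  Head 1
  a 10
  b 14
  c 15
  d 16
  e 17
  f 88
  Head 4
  r 32
  t 55
  s 79
  r 22
  t 88
  y 53
  o 78
  p 90
  m 44
  Head 53
  y 22
  b 33
  Head 33
  z 11
  d 66
  v 88
  b 69
  Head 32
  n 88
  m 89
  b 88
DATA

result = heads_with_88(data)
puts result
```

With the real file, call `heads_with_88(File.read('temp'))` instead.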