Get records with duplicate dates from a CSV file

Time: 2015-07-04 21:40:20

Tags: ruby csv

I have a CSV file containing records in the following format:

HERMES; 1981-04-11

I want to extract only the records whose date is not unique. I tried a nested loop, but I get all of the records back.

This is what the output looks like:

HERMES 1981-04-11
HERMES 1981-04-11
HERMES 1981-04-11
MARCIO 1954-03-04
MARCIO 1954-03-04
LILIAN 1970-04-19
KLEBER 1967-12-14
RAIMUNDO 1981-04-11
RAIMUNDO 1981-04-11
RAIMUNDO 1981-04-11
FRANCISCO 1924-03-28
RUI 0002-11-30
MARIA 1954-03-04
MARIA 1954-03-04
MANOEL 1968-03-24
JOANNA 1981-04-11
JOANNA 1981-04-11
JOANNA 1981-04-11

1 Answer:

Answer 0: (score: 1)

If I understand correctly, you want to print the rows that have a duplicate date:

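require 'csv'

# Read every row; fields are separated by ';'.
all_rows = CSV.read('test.csv', col_sep: ';')

# Keep only the rows whose date (second column) occurs more than once.
all_rows.select do |row|
  all_rows.count { |srow| row[1].strip == srow[1].strip } > 1
end.each { |row| puts "#{row[0]} #{row[1].strip}" }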

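Here we read all the rows, splitting each one on ;. We then select only the rows whose second field (the date) appears in more than one row, and finally we print the selected rows.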
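The select/count approach above rescans the whole file for every row, which is fine for small files but grows quadratically. As a possible alternative, here is a minimal sketch that groups the rows by date once and then filters. The helper name rows_with_duplicate_dates and the inline sample data are assumptions made for illustration (they are not part of the original answer), and the sketch assumes the same "NAME; DATE" layout with ; as the separator.

```ruby
require 'csv'

# Hypothetical helper (not from the original answer): returns the rows whose
# date, taken from the second column, appears in more than one row.
def rows_with_duplicate_dates(rows)
  # Bucket rows by their stripped date so " 1981-04-11" and "1981-04-11" match.
  groups = rows.group_by { |row| row[1].to_s.strip }
  # Keep the rows in file order, but only those whose bucket holds several rows.
  rows.select { |row| groups[row[1].to_s.strip].size > 1 }
end

# Sample data in the question's "NAME; DATE" format (inline instead of test.csv).
sample = <<~DATA
  HERMES; 1981-04-11
  MARCIO; 1954-03-04
  LILIAN; 1970-04-19
  RAIMUNDO; 1981-04-11
  MARIA; 1954-03-04
DATA

rows = CSV.parse(sample, col_sep: ';')
rows_with_duplicate_dates(rows).each do |row|
  puts "#{row[0]} #{row[1].strip}"
end
```

With this sample, LILIAN is the only record with a unique date, so it is the only one left out. To run it against a real file, CSV.read('test.csv', col_sep: ';') can be passed to the helper in place of the parsed sample.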