Question

I need a little help. I have this kind of data file:

0 0    # <--- Group 1 -- 1 house (0) and 1 room (0)

0 0    # <--- Group 2 -- 2 houses (0;1) and 3,2 rooms (0,1,2;0,1)
0 1
0 2    
1 0    # <--- house 2 in Group 2, with the first room (0)
1 1    # <--- house 2 in Group 2, with the second room (1)

0 0    # <--- Group 3
0 1    # <--- house 1 in Group 3, with the second room (1)
0 2

0 0    # <--- Group 4
1 0    # <--- house 2 in Group 4, with one room only (0)
2 0
3 0    # <--- house 4 in Group 4, with one room only (0)

0 0    # <--- Group 5

0 0    # <--- Group 6

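A few cases need to be answered:

The example contains groups; a group is a block of lines separated from the next block by a blank line, so here we have 6 groups. We need to determine the following:

Get the actual (ordinal) number of the group (with the counter starting at, say, 1)

If column 1 = 0 and column 2 = 0 and the next line is blank, the desired output for the example above would be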

1
5
6
If column 1 = 0 and column 2 can vary and the next line is blank, the desired output for the example above would be

3
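... and so on. How can this be generalized so that the conditions can be set up front? Depending on the column values within a group, many cases are possible.

If it helps, we can picture it like this: the first column refers to the houses on a street, and the second column refers to the rooms inside a house. Now I want to find all the streets in a city that meet some requirement, e.g.

Take the streets that have two houses with different numbers of rooms, where the first house has 3 rooms and the second house has 2 rooms. We then get the output 2, because only that group in the file satisfies the requirement.

Important note: 0 0 means there is one house with one room.

Correction: a group consisting of just one house with one room is always written as 0 0, as in Groups 1, 5 and 6. Remember that the second column is the room number: 0 means "1 room", 1 means "2 rooms", and so on. It is just a counter that starts at 0 rather than 1; sorry if that is a bit confusing.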

Answer 1

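I'm not sure what your expected output is, but I have converted/decoded your number pattern into a meaningful group/house/room format. Any further "query" can then be run against this content.

See below: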

kent$  cat file
0 0

0 0
0 1
0 2
1 0
1 1

0 0
0 1
0 2

0 0
1 0
2 0
3 0

0 0

0 0

awk:

kent$  awk 'BEGIN{RS=""} 
        { print "\ngroup "++g; 
        delete a;
        for(i=1;i<=NF;i++) if(i%2) a[$i]++;
        for(x in a) printf "House#: %s , Room(s): %s \n", x, a[x]; }' file

We get the output:

group 1
House#: 0 , Room(s): 1 

group 2
House#: 0 , Room(s): 3 
House#: 1 , Room(s): 2 

group 3
House#: 0 , Room(s): 3 

group 4
House#: 0 , Room(s): 1 
House#: 1 , Room(s): 1 
House#: 2 , Room(s): 1 
House#: 3 , Room(s): 1 

group 5
House#: 0 , Room(s): 1 

group 6
House#: 0 , Room(s): 1

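Note that the generated format can be changed to suit your "filter" or "query".

Update

OP's comment:

I need to know, for example, the numbers of the groups that have 1 house with one room. In the above case the output would be: 1,5,6

As I said, depending on your query criteria, we can adjust the awk output for the next step. Now I change the awk to: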

awk 'BEGIN{RS=""} 
        {print "";  gid=++g; 
        delete a;
        for(i=1;i<=NF;i++) if(i%2) a[$i]++;
        for(x in a) printf "%s %s %s\n", gid,x, a[x]; }' file

This outputs text in the format groupIdx houseIdx numberOfRooms, with a blank line between groups. We save that text to a file named decoded.txt.

So your query can be run on this text:

kent$  awk 'BEGIN{RS="\n\n"}{if (NF==3 && $3==1)print $1}' decoded.txt
1
5
6

The last awk line above prints the group number if the number of rooms ($3) is 1 and the group block contains only one line (NF==3).
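For readers who prefer to see both steps in one place, here is a minimal Python sketch of the same decode-then-query idea. It is not part of the awk answer: the names SAMPLE, decode and single_house_one_room are made up for illustration, and the sample string is the question's data with the trailing comments removed.

```python
import re
from collections import Counter

# Sample input from the question, without the trailing "# <---" comments.
SAMPLE = ("0 0\n\n0 0\n0 1\n0 2\n1 0\n1 1\n\n0 0\n0 1\n0 2\n\n"
          "0 0\n1 0\n2 0\n3 0\n\n0 0\n\n0 0\n")

def decode(text):
    """Return (group_idx, house_idx, number_of_rooms) triples."""
    rows = []
    # Blank lines separate groups, similar to awk's RS="" paragraph mode.
    blocks = re.split(r"\n\s*\n", text.strip())
    for gid, block in enumerate(blocks, start=1):
        # One input line per room, so counting column 1 gives rooms per house.
        rooms = Counter(int(line.split()[0]) for line in block.splitlines())
        for house in sorted(rooms):
            rows.append((gid, house, rooms[house]))
    return rows

def single_house_one_room(rows):
    """Groups with exactly one decoded row whose room count is 1."""
    rows_per_group = Counter(gid for gid, _, _ in rows)
    return [gid for gid, _, n in rows if rows_per_group[gid] == 1 and n == 1]

print(single_house_one_room(decode(SAMPLE)))  # [1, 5, 6]
```

Keeping the decoded triples as an intermediate result mirrors the decoded.txt file: other queries can reuse them without re-parsing the raw pattern.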

Answer 2

I would start by defining a House class and a Group class:

class House:
    def __init__(self, rooms):
        self.rooms = rooms


class Group:
    def __init__(self, index, houses):
        self.index = index
        # houses maps each house number to the number of rooms in that house.
        self.houses = [House(houses[house_nr]) for house_nr in sorted(houses)]

    def __str__(self):
        return 'Group {}'.format(self.index)

    def __repr__(self):
        return 'Group {}'.format(self.index)

Then parse the data into this hierarchical structure:

import collections

with open('in.txt') as f:
    groups = []

    # Variable to accumulate current group.
    group = collections.defaultdict(int)

    i = 1
    for line in f:
        if not line.strip():
            # Empty line found, create a new group.
            groups.append(Group(i, group))
            # Reset accumulator.
            group = collections.defaultdict(int)
            i += 1
            continue

        house_nr, room_nr = line.split()
        # Key by int so that sorted() orders houses numerically (2 before 10).
        group[int(house_nr)] += 1
    # Create the last group at EOF
    groups.append(Group(i, group))

Then you can do things like this:

found = filter(
    lambda g:
        len(g.houses) == 1 and # Group contains one house
        g.houses[0].rooms == 1, # First house contains one room
    groups)
print(list(found)) # Prints [Group 1, Group 5, Group 6]

found = filter(
    lambda g:
        len(g.houses) == 2 and # Group contains two houses
        g.houses[0].rooms == 3 and # First house contains three rooms
        g.houses[1].rooms == 2, # Second house contains two rooms
    groups)
print(list(found)) # Prints [Group 2]
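Since the question asks how to set the conditions up front, the two filters above could be folded into one helper. What follows is only a sketch under my own assumptions, not part of the answer's classes: parse_groups and find_groups are hypothetical names, and a pattern is a list with one entry per house, where each entry is either an exact room count or a predicate.

```python
from collections import Counter

def parse_groups(lines):
    """Return one list of room counts per group, ordered by house number."""
    groups, current = [], Counter()
    for line in lines:
        if not line.strip():
            # Blank line: close the current group (ignore repeated blanks).
            if current:
                groups.append([current[h] for h in sorted(current)])
                current = Counter()
            continue
        house_nr, _room_nr = line.split()
        current[int(house_nr)] += 1  # one line per room
    if current:
        groups.append([current[h] for h in sorted(current)])
    return groups

def find_groups(groups, pattern):
    """Return 1-based numbers of groups whose houses match the pattern."""
    def matches(rooms, want):
        return want(rooms) if callable(want) else rooms == want
    return [
        idx
        for idx, houses in enumerate(groups, start=1)
        if len(houses) == len(pattern)
        and all(matches(r, w) for r, w in zip(houses, pattern))
    ]

# The question's sample data, without the trailing comments.
data = ("0 0\n\n0 0\n0 1\n0 2\n1 0\n1 1\n\n0 0\n0 1\n0 2\n\n"
        "0 0\n1 0\n2 0\n3 0\n\n0 0\n\n0 0\n").splitlines()
groups = parse_groups(data)
print(find_groups(groups, [1]))                 # [1, 5, 6]
print(find_groups(groups, [3, 2]))              # [2]
print(find_groups(groups, [lambda n: n > 1]))   # [3]
```

The last call reads "one house whose room count can vary" as "one house with more than one room", which is my interpretation of the question's second case.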

Answer 3

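A Perl solution. It converts the input to the following format:

The first column is the group number; the second column lists the number of rooms (minus 1) of every house in the group, sorted. To search for groups that have two different houses with 2 and 3 rooms, just grep '|1 2$'; to search for groups that have only one house with one room, grep '|0$'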

#!/usr/bin/perl
#-*- cperl -*-

#use Data::Dumper;

use warnings;
use strict;

sub report {
    print join ' ', sort {$a <=> $b} @_;
    print "\n";
}

my $group = 1;
my @last = (0);
print '1|';
my @houses = ();
while (<>) {
    if (/^$/) { # group end
        report(@houses, $last[1]);
        undef @houses;
        print ++$group, '|';
        @last = (0);
    } else {
        my @tuple = split;
        if ($tuple[0] != $last[0]) { # new house
            push @houses, $last[1];
        }
        @last = @tuple;
    }
}

report(@houses, $last[1]);

This relies on the fact that, for each house, only the last line matters.
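To make the "last line per house" idea concrete, here is a small Python sketch of the same transformation. It is an illustration rather than a port of the Perl script: the function name signatures is made up, and it assumes (as in the sample data) that the rooms of a house are listed in ascending order, so the last line of a house carries its highest room index.

```python
def signatures(lines):
    """Return strings like '2|1 2': group number, then sorted last room indices."""
    sigs, last_room = [], {}

    def flush():
        # Emit the finished group, if any, and reset the accumulator.
        if last_room:
            rooms = " ".join(str(v) for v in sorted(last_room.values()))
            sigs.append("{}|{}".format(len(sigs) + 1, rooms))
            last_room.clear()

    for line in lines:
        if not line.strip():
            flush()  # a blank line ends the group
        else:
            house, room = map(int, line.split())
            last_room[house] = room  # later lines overwrite earlier ones
    flush()  # last group at end of input
    return sigs

# The question's sample data, without the trailing comments.
data = ("0 0\n\n0 0\n0 1\n0 2\n1 0\n1 1\n\n0 0\n0 1\n0 2\n\n"
        "0 0\n1 0\n2 0\n3 0\n\n0 0\n\n0 0\n").splitlines()
sigs = signatures(data)
print(sigs)  # ['1|0', '2|1 2', '3|2', '4|0 0 0 0', '5|0', '6|0']
print([s for s in sigs if s.endswith("|1 2")])  # ['2|1 2']
```

The endswith check plays the role of grep '|1 2$' on the generated lines.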

Pattern decoding
