Question

I have the following tab-delimited file:

Oslo      5
Montreal  4
Berlin    7
London    7
...

From this data, I am trying to build a symmetric table of all-vs-all subtractions, producing a table like this:

          Oslo      Montreal  Berlin    London
          --------- --------- --------- ---------
Oslo              0        -1         2         2
Montreal          1         0         3         3
Berlin           -2        -3         0         0
London           -2        -3         0         0

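The output should be a tab-delimited file.

I have been trying to do this with R and Perl, in which I have only basic experience, but I have not managed it in either. In Perl I tried using hashes to do the subtraction, but got nowhere. I think Python should have a good solution for this, but I have never written a Python script; I am just starting out. I searched Google with several different keyword combinations, and the only similar case I found was in another language: Creating a symmetric matrix

Can you help me? It would be much appreciated!

PS: Since my question may be too basic, you could at least suggest which language (R, Perl or Python), function, package, or even better keywords I should use, so that I can keep trying to solve it myself.

I tried this to get the all-vs-all subtraction, but I am definitely lost here: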

#!/usr/bin/perl
use diagnostics;
use warnings;

print "file:\t";
$arq1 = <STDIN>;
open (MYFILE, $arq1);
my %hash;
while (my $line=<MYFILE>) {
    chomp($line);
    (my $city,my $value) = split /\t/, $line;
    $hash{$city} = $value;
}

my %hash2;
while (my $line=<MYFILE>) {
    chomp($line);
    (my $city,my $value) = split /\t/, $line;
    $hash2{$city} = $value;
}

my @diff;
foreach my $key (keys %hash) {
    @diff = $hash{$key} - $hash2{$key};
}

print "difference @diff\n";

Answer 1

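You read to the end of the file, then try to keep reading from there. The second loop terminates without a single pass. The solution is to eliminate the second loop entirely, since there is no point in creating two identical hashes.

The second problem is that you only output one row of data. You will need nested loops (a column loop inside the row loop).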

#!/usr/bin/perl
use strict;
use warnings;
use feature qw( say );

my @cities;
my %temps;
while (<>) {
   chomp;
   my ($city, $temp) = split /\t/;
   push @cities, $city;
   $temps{$city} = $temp;
}

say join "\t", "", @cities;

for my $city_y (@cities) {
   my @diffs;
   for my $city_x (@cities) {
      push @diffs, $temps{$city_x} - $temps{$city_y};
   }

   say join "\t", $city_y, @diffs;
}

If you are feeling a bit adventurous, map is a better fit for the inner loop.

for my $city_y (@cities) {
   say join "\t", $city_y, map { $temps{$_} - $temps{$city_y} } @cities;
}
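Since the question also mentions Python, here is a minimal sketch of the same nested-loop idea in that language. It is not part of the original answer; the function names read_pairs and difference_rows are made up for illustration, and it assumes every line is exactly "city<TAB>integer". The sample data is read from an in-memory string rather than a real file.

```python
import io


def read_pairs(stream):
    """Read 'city<TAB>value' lines, preserving the order in the file."""
    pairs = []
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        city, value = line.split("\t")
        pairs.append((city, int(value)))
    return pairs


def difference_rows(pairs):
    """Return the table as lists of strings: header row, then one row per city."""
    cities = [city for city, _ in pairs]
    rows = [[""] + cities]
    for row_city, row_value in pairs:        # row loop
        diffs = [str(col_value - row_value)  # column loop: column minus row
                 for _, col_value in pairs]
        rows.append([row_city] + diffs)
    return rows


# Hypothetical sample input matching the question's data.
sample = io.StringIO("Oslo\t5\nMontreal\t4\nBerlin\t7\nLondon\t7\n")
for row in difference_rows(read_pairs(sample)):
    print("\t".join(row))
```

To use it on a real file, the StringIO object could be replaced by an open file handle.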

Answer 2

Here is a solution in R. Perhaps not the cleanest one, but it works:

library(dplyr)
library(magrittr)

df <- data.frame(city = c("Oslo","Paris","Londres","Lima","Lyon","Memphis","Ouagadougou"),
                 pop = runif(7, min = 5000, max = 10000))

result <- data.frame(matrix(nrow = nrow(df), ncol = nrow(df)))
names(result) <- df$city
row.names(result) <- df$city

for(city in df$city) {
  tmp <- df$pop - df$pop[df$city == city]
  result[,as.character(city)] <- tmp
}

The next three lines convert the row names into a regular column:

result$city <- row.names(result)
row.names(result) <- 1:nrow(result)
result2 <- result %>% dplyr::select(city, everything())

Answer 3

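This is how I would do it in Perl. I hope you can learn from the example. It uses a few classic Perl idioms that make Perl a convenient language for this kind of task.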

#!/usr/bin/perl

use strict;
use warnings;

# Read temperature data from @ARGV files in "city<tab>temperature"
# format into a hash of city => temperature.
my %temp;
while (<>) {
    /^(.+)\t(-?\d+)\s*$/ # captures $1=city, $2=temp; enforces format
        or die "Bad data at line $.: $_";
    $temp{$1} = $2;
}

# Sort city names for rows and columns.
my @city = sort keys %temp;

# A little convenience function for printing.
sub tabulate { print join("\t", @_), "\n" }

# Print column header row.
tabulate('', @city);

# Print table body.
for my $row (@city) {
    tabulate($row, map { $temp{$_} - $temp{$row} } @city);
}

exit(0);
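For comparison, the two ideas in this script (rejecting malformed lines with a regular expression, and sorting the city names) could be sketched in Python roughly as follows. This is an illustration rather than part of the answer: parse_lines and sorted_table are hypothetical names, and the pattern is simply carried over from the Perl version.

```python
import re

# Same pattern as the Perl script: city, a tab, an optionally negative integer.
LINE_RE = re.compile(r"^(.+)\t(-?\d+)\s*$")


def parse_lines(lines):
    """Return {city: temperature}; raise ValueError on a malformed line."""
    temps = {}
    for lineno, line in enumerate(lines, start=1):
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"Bad data at line {lineno}: {line!r}")
        temps[match.group(1)] = int(match.group(2))
    return temps


def sorted_table(temps):
    """Header row plus one row per city, with cities in sorted order."""
    cities = sorted(temps)
    rows = [[""] + cities]
    for row in cities:
        rows.append([row] + [temps[col] - temps[row] for col in cities])
    return rows


for row in sorted_table(parse_lines(["Oslo\t5\n", "Montreal\t4\n"])):
    print("\t".join(str(cell) for cell in row))
```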

Answer 4

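In R, read in the data (shown reproducibly in the Note at the end), create from it a numeric vector country whose names are the city names, and then use outer to create the desired matrix.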

country <- setNames(DF[[2]], DF[[1]])
-outer(country, country, "-")

giving:

         Oslo Montreal Berlin London
Oslo        0       -1      2      2
Montreal    1        0      3      3
Berlin     -2       -3      0      0
London     -2       -3      0      0
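To make explicit what outer computes here, the following is a small pure-Python sketch, not part of the answer. The helper outer is written for illustration only and merely mimics the idea of R's function: entry [i][j] is the operation applied to the i-th element of the first vector and the j-th element of the second.

```python
from operator import sub


def outer(xs, ys, op):
    """Return the table whose entry [i][j] is op(xs[i], ys[j])."""
    return [[op(x, y) for y in ys] for x in xs]


values = {"Oslo": 5, "Montreal": 4, "Berlin": 7, "London": 7}
nums = list(values.values())

# outer(..., sub) gives row minus column; negating each entry gives
# column minus row, which is the orientation of the table in the question.
table = [[-d for d in row] for row in outer(nums, nums, sub)]

for city, row in zip(values, table):
    print(city, row)
```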

Note

Lines <- "Oslo      5
Montreal  4
Berlin    7
London    7"

# DF <- read.table("myfile")
DF <- read.table(text = Lines, as.is = TRUE, strip.white = TRUE)

Answer 5

This will not give you the signs you want, but the dist() function in R can get you started.

x <- c(5,4,7,7)
names <- c("Oslo", "Montreal", "Berlin", "London")
names(x) <- names
dist(x, upper=TRUE, diag = TRUE)

         Oslo Montreal Berlin London
Oslo        0        1      2      2
Montreal    1        0      3      3
Berlin      2        3      0      0
London      2        3      0      0
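The relationship between this unsigned table and the signed one in the question can be checked with a short Python sketch (an illustration under the question's sample data, not part of the answer). The signed table is strictly antisymmetric, entry [i][j] being the negative of entry [j][i], and taking absolute values yields the symmetric distances shown above.

```python
temps = {"Oslo": 5, "Montreal": 4, "Berlin": 7, "London": 7}
cities = list(temps)

# Signed differences: column value minus row value, as in the question.
signed = [[temps[c] - temps[r] for c in cities] for r in cities]

# Dropping the sign gives the symmetric distance table.
unsigned = [[abs(d) for d in row] for row in signed]

n = len(cities)
is_antisymmetric = all(signed[i][j] == -signed[j][i]
                       for i in range(n) for j in range(n))
is_symmetric = all(unsigned[i][j] == unsigned[j][i]
                   for i in range(n) for j in range(n))
print(is_antisymmetric, is_symmetric)
```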

The second answer here also appears among the answers to Signed distance matrix in R.

Creating a symmetric matrix filled with cross subtractions
