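I have two files, each with 700 fields, and 699 of the 700 fields have matching headers. I want to reorder the fields so that they are in the same order in both files (though which order doesn't matter). For example, given: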
File1:
FRUIT MSMC1 MSMC24 MSMC2 MSMC10
Apple 1 2 3 2
Pear 2 1 4 5
File2:
VEG MSMC24 MSMC1 MSMC2 MSMC10
Onion 2 1 3 2
Radish 0 3 9 3
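I want both files to start with the one field whose header differs between them, followed by the remaining fields in the same order in both files. One possible result is: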
File1:
FRUIT MSMC1 MSMC2 MSMC10 MSMC24
Apple 1 3 2 2
Pear 2 4 5 1
File2:
VEG MSMC1 MSMC2 MSMC10 MSMC24
Onion 1 3 2 2
Radish 3 9 3 0
Answer 0 (score: 1)
Using data.table, this should help.
First, read the files:
library(data.table)
dt1 <- fread("file1.csv")
dt2 <- fread("file2.csv")
Then get the field names (without the first column) and the fields common to both files:
ndt1 <- names(dt1)[-1]
ndt2 <- names(dt2)[-1]
common <- intersect(ndt1, ndt2)
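Now you can apply the new column order with setcolorder(), which reorders the columns in place:
setcolorder(dt1, c(names(dt1)[1], setdiff(ndt1, common), common))
setcolorder(dt2, c(names(dt2)[1], setdiff(ndt2, common), common))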
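The same ordering rule (first column, then columns not shared with the other file, then the shared columns in one fixed order) can be sketched in Python on plain lists, for anyone who wants to check the logic outside R. This assumes each file has already been split into a header list and rows of fields; the helper names common_columns, new_column_order and reorder are hypothetical and not part of data.table.

```python
# Hypothetical helpers mirroring the R answer's ordering rule on header lists.
def common_columns(header1, header2):
    """Shared non-key columns, in header1's order (like R's intersect())."""
    others = set(header2[1:])
    return [name for name in header1[1:] if name in others]


def new_column_order(header, common):
    """Key column first, then columns not shared, then the shared columns."""
    shared = set(common)
    unique = [name for name in header[1:] if name not in shared]
    return [header[0]] + unique + common


def reorder(header, rows, order):
    """Rearrange every row (a list of fields) into the given column order."""
    idx = [header.index(name) for name in order]
    return [[row[i] for i in idx] for row in rows]


h1 = "FRUIT MSMC1 MSMC24 MSMC2 MSMC10".split()
h2 = "VEG MSMC24 MSMC1 MSMC2 MSMC10".split()
common = common_columns(h1, h2)
order2 = new_column_order(h2, common)
print(reorder(h2, [h2, "Onion 2 1 3 2".split()], order2))
# [['VEG', 'MSMC1', 'MSMC24', 'MSMC2', 'MSMC10'], ['Onion', '1', '2', '3', '2']]
```

Using the same common list for both files is what guarantees the shared columns come out in the same order.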
Answer 1 (score: 1)
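A Perl solution that leaves the first file as it is and writes out the second file with its columns arranged in the same order as the first file's. It reads the 2 files given on the command line (after the script name).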
Update: added the map $_ // (), phrase to allow the second file to be a subset of the first, in answer to the question "How would these answers be modified if one file is a subset of the other (not all the columns in file1 are in file2)?" - theo4786
The script:
#!/usr/bin/perl
use strict;
use warnings;
# commandline: perl script_name.pl fruits.csv veg.csv
my (undef, @fruit_hdrs) = split ' ', <> and close ARGV;
my @veg_hdrs;
while (<>) {
my ($name, @cols) = split;
# only executes for the first line (header line) of second file
@veg_hdrs = @cols unless @veg_hdrs;
my %line;
@line{ @veg_hdrs } = @cols;
print join(" ", $name, map $_ // (), @line{ @fruit_hdrs } ), "\n";
}
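For readers more at home in Python, the same idea can be sketched as below, assuming the first file's header line and the second file's lines are already available as strings. The function name reorder_like is hypothetical; the "if h in row" filter plays the role of map $_ // () by skipping columns the second file does not have.

```python
# Hypothetical Python analogue of the Perl script above.
def reorder_like(first_header_line, second_lines):
    _, *target = first_header_line.split()  # first file's column names, key dropped
    header = None
    out = []
    for line in second_lines:
        if not line.strip():  # skip blank lines
            continue
        name, *cols = line.split()
        if header is None:  # the first line of the second file is its header
            header = cols
        row = dict(zip(header, cols))  # like @line{ @veg_hdrs } = @cols
        # like map $_ // (), ...: leave out names the second file lacks
        values = [row[h] for h in target if h in row]
        out.append(" ".join([name] + values))
    return out


veg = ["VEG MSMC24 MSMC1 MSMC2 MSMC10", "Onion 2 1 3 2", "Radish 0 3 9 3"]
for line in reorder_like("FRUIT MSMC1 MSMC24 MSMC2 MSMC10", veg):
    print(line)
```

As in the Perl version, the header line of the second file goes through the same lookup as the data lines, so it comes out reordered too.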
Answer 2 (score: 0)
In Perl, the tool for this job is a hash slice.
You can access several hash values at once as @hash{@keys}.
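For comparison, Python has no hash-slice syntax. However, the two ways a slice is used in the script below (assigning a list of values to a list of keys, and reading values back in a chosen key order) can be sketched with zip() and a list comprehension. The variable names here are illustrative only.

```python
from operator import itemgetter

keys = ["MSMC1", "MSMC24", "MSMC2", "MSMC10"]
values = ["1", "2", "3", "2"]

# Like @row{@keys} = @values: pair each key with the value at the same position
row = dict(zip(keys, values))

# Like @row{@wanted}: read several values back in a chosen key order.
# sorted() compares strings, so MSMC10 sorts before MSMC2 (as Perl's sort does)
wanted = sorted(keys)
picked = [row[k] for k in wanted]

# itemgetter() does the same lookup; with two or more keys it returns a tuple
picked_too = list(itemgetter(*wanted)(row))

print(wanted)  # ['MSMC1', 'MSMC10', 'MSMC2', 'MSMC24']
print(picked)  # ['1', '2', '3', '2']
```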
Something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my @headers;
my $type;
my @rows;
#iterate data - would do this with a normal 'open'
while ( <DATA> ) {
#set headers if the leading word is all upper case
if ( m/^[A-Z]+\s/ ) {
#separate out type (VEG/FRUIT) from the other headings.
chomp ( ( $type, @headers ) = split );
#print for debugging
print Dumper \@headers;
}
else {
#create a hash to store this row.
my %this_row;
#split the row on whitespace, capturing name and ordered fields by header row.
( my $name, @this_row{@headers} ) = split;
#insert name and type into the hash
$this_row{name} = $name;
$this_row{type} = $type;
#print for debugging
print Dumper \%this_row;
#store it in @rows
push ( @rows, \%this_row );
}
}
#print output:
#header line
print join ("\t", "name", "type", @headers ),"\n";
#iterate rows, extract ordered by _last_ set of headers.
foreach my $row ( @rows ) {
print join ( "\t", $row->{name}, $row->{type}, @{$row}{@headers} ),"\n";
}
__DATA__
FRUIT MSMC1 MSMC24 MSMC2 MSMC10
Apple 1 2 3 2
Pear 2 1 4 5
VEG MSMC24 MSMC1 MSMC2 MSMC10
Onion 2 1 3 2
Radish 0 3 9 3
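Note: I've used Data::Dumper for diagnostics. Those lines can be removed, but I've left them in because they illustrate what's going on.
This also reads from <DATA>. Normally you would open a filehandle, or just use while ( <> ) { to read STDIN or the files named on the command line.
The output order is based on the last header line "seen"; you can of course sort or reorder it.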
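If you need to handle mismatched columns, this breaks down: a column missing from a row comes out undefined (with "uninitialized value" warnings under use warnings), and columns that appear only in an earlier header line are dropped. In that case we can break out a map to fill in any blanks, and use a hash of headers to make sure we capture all of them.
E.g.: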
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my @headers;
my %headers_combined;
my $type;
my @rows;
#iterate data - would do this with a normal 'open'
while ( <DATA> ) {
#set headers if the leading word is all upper case
if ( m/^[A-Z]+\s/ ) {
#separate out type (VEG/FRUIT) from the other headings.
chomp ( ( $type, @headers ) = split );
#add to hash of headers, to preserve uniques
$headers_combined{$_}++ for @headers;
#print for debugging
print Dumper \@headers;
}
else {
#create a hash to store this row.
my %this_row;
#split the row on whitespace, capturing name and ordered fields by header row.
( my $name, @this_row{@headers} ) = split;
#insert name and type into the hash
$this_row{name} = $name;
$this_row{type} = $type;
#print for debugging
print Dumper \%this_row;
#store it in @rows
push ( @rows, \%this_row );
}
}
#print output:
#header line
#note - extract keys from hash, not the @headers array.
#sort is needed to order them, because default is unordered.
print join ("\t", "name", "type", sort keys %headers_combined ),"\n";
#iterate rows, extracting fields in the sorted order of all headers seen.
foreach my $row ( @rows ) {
print join ( "\t", $row->{name}, $row->{type}, map { $row->{$_} // '' } sort keys %headers_combined ),"\n";
}
__DATA__
FRUIT MSMC1 MSMC24 MSMC2 MSMC10 OTHER
Apple 1 2 3 2 x
Pear 2 1 4 5 y
VEG MSMC24 MSMC1 MSMC2 MSMC10 NOTHING
Onion 2 1 3 2 p
Radish 0 3 9 3 z
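Here, map { $row->{$_} // '' } sort keys %headers_combined takes all the keys of the hash, returns them in sorted order, and then extracts each key's value from the row, or supplies an empty string if it is undefined. (That's what // does.)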
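The same union-of-headers approach can be sketched in Python, assuming the input arrives as a list of lines in the same layout as the __DATA__ section: a header line whose first word is all upper case, followed by its data lines. The function name combine is hypothetical. Here, row.get(c, "") plays the role of $row->{$_} // ''.

```python
import re


def combine(lines):
    headers_combined = set()  # every column name seen in any header line
    rows = []
    kind, headers = None, []
    for line in lines:
        if re.match(r"[A-Z]+\s", line):  # header line, same test as the Perl
            kind, *headers = line.split()
            headers_combined.update(headers)
        elif line.strip():  # data line
            name, *values = line.split()
            row = dict(zip(headers, values))
            row["name"], row["type"] = name, kind
            rows.append(row)
    cols = sorted(headers_combined)  # plain string sort, like Perl's sort
    out = ["\t".join(["name", "type"] + cols)]
    for row in rows:
        # a column this row never had becomes an empty string
        fields = [row["name"], row["type"]] + [row.get(c, "") for c in cols]
        out.append("\t".join(fields))
    return out


data = """FRUIT MSMC1 MSMC24 MSMC2 MSMC10 OTHER
Apple 1 2 3 2 x
Pear 2 1 4 5 y
VEG MSMC24 MSMC1 MSMC2 MSMC10 NOTHING
Onion 2 1 3 2 p
Radish 0 3 9 3 z""".splitlines()

for line in combine(data):
    print(line)
```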
Answer 3 (score: 0)
This will reorder the fields in file2 to match the order in file1:
$ cat tst.awk
FNR==1 {
fileNr++
for (i=2;i<=NF;i++) {
name2nr[fileNr,$i] = i
nr2name[fileNr,i] = $i
}
}
fileNr==2 {
printf "%s", $1
for (i=2;i<=NF;i++) {
# print the field whose file2 header matches file1's i-th header
printf "%s%s", OFS, $(name2nr[2,nr2name[1,i]])
}
print ""
}
$ awk -f tst.awk file1 file2
VEG MSMC1 MSMC24 MSMC2 MSMC10
Onion 1 2 3 2
Radish 3 0 9 3
With GNU awk, you can delete the fileNr++ line and use ARGIND instead of fileNr everywhere else.
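The position lookup in the awk script can also be sketched in Python. For each column name in file1's header, find where that name sits in file2's header and take the field from that position. The function reorder_fields is hypothetical and assumes every file1 column also exists in file2.

```python
def reorder_fields(header1, header2, fields):
    """Return fields (one split line of file2) rearranged into file1's order."""
    # column name -> position in file2's header (awk: name2nr[2, name])
    pos2 = {name: i for i, name in enumerate(header2)}
    # keep the key field, then walk file1's names in order (awk: nr2name[1, i])
    return [fields[0]] + [fields[pos2[name]] for name in header1[1:]]


h1 = "FRUIT MSMC1 MSMC24 MSMC2 MSMC10".split()
h2 = "VEG MSMC24 MSMC1 MSMC2 MSMC10".split()
for line in ["VEG MSMC24 MSMC1 MSMC2 MSMC10", "Onion 2 1 3 2", "Radish 0 3 9 3"]:
    print(" ".join(reorder_fields(h1, h2, line.split())))
```

Because the lookup goes from file1's names into file2's positions, it also handles reorderings that are not simple swaps. For example, reorder_fields(["K", "A", "B", "C"], ["K", "B", "C", "A"], ["r", "b", "c", "a"]) gives ["r", "a", "b", "c"].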