I have a results.csv file that contains names in the following layout:
name1, 2(random number)
name5, 3
and a sample.txt with the following structure:
record_seperator
name1
foo
bar
record_seperator
name2
bla
bluh
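I want to look up every name from results.csv in sample.txt and, if it is found, write the matching record to a file. I tried to build an array from the first file and search it, but I can't get the syntax right. It has to run in a bash script. If someone has a better idea than awk, that is fine too, but I have no admin rights on the machine where it should run. The real CSV file contains 10,000 names and sample.txt contains 4.5 million records. I am a complete beginner in awk, so please explain generously. This is my current attempt, which doesn't work, and I don't know why: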
#!/bin/bash
awk 'BEGIN{
while (getline < "results.csv")
{
split($0,name,",");
nameArr[k]=name[1];
}
{
RS="record_seperator"
FS="\n"
for (key in nameArr)
{
print nameArr[key]
print $2
if ($2==nameArr[key])
NR > 1
{
#extract file by Record separator and name from line2
print RS $0 > $2 ".txt"
}
}
}
}' sample.txt
Edit: My expected output would be two files:
name1.txt
record_seperator
name1
foo
bar
name2.txt
record_seperator
name2
bla
bluh
Answer 0 (score: 1)
Here is one. Since no expected output was given, it just outputs the original records:
$ awk '
NR==FNR { # process first file
a[$1]=RS $0 # hash the whole record with first field (name) as key
next # process next record in the first file
} # after this line second file processing
$1 in a { # if first field value (name) is found in hash a
f=$1 ".txt" # generate filename
print a[$1] > f # output the whole record
close(f) # preserving fds
}' RS="record_seperator\n" sample RS="\n" FS="," results # file order and related vars
Only one match:
$ cat name1.txt
record_seperator
name1
foo
bar
Tested on gawk and mawk; the original awk behaves strangely.
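The trailing arguments of the command above mix file names with var=value assignments. awk applies each assignment just before it opens the file that follows it, which is how the two inputs can use different separators. A minimal sketch of that mechanism alone, using two made-up files (comma.txt and colon.txt are assumptions for illustration, not part of the question):

```shell
# Sketch: two throwaway files with different field delimiters.
tmp=$(mktemp -d)
printf 'a,1\nb,2\n' > "$tmp/comma.txt"
printf 'c:3\nd:4\n' > "$tmp/colon.txt"

# FS=, is applied before comma.txt is read, FS=: before colon.txt is read,
# so $1 is the text before the delimiter in both files.
first_fields=$(awk '{ print $1 }' FS=, "$tmp/comma.txt" FS=: "$tmp/colon.txt")
echo "$first_fields"
```

The same rule applies to RS, which is why the order of files and assignments in the answer matters.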
Answer 1 (score: 0)
Something like this (untested):
IN
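Since the record separator comes before the record, it needs to be delayed by one.
Use the built-in line/record iterator instead of working around it.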
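A minimal sketch of the idea described above, not the original code of this answer: read sample.txt one line at a time with the default RS, and emit the separator only once the name on the following line is known to be wanted. The variable names (wanted, pending, out) and the extra name3 record are assumptions added for illustration, and results.csv uses name2 instead of name5 so that two files are produced.

```shell
#!/bin/bash
# Sketch only: sample data recreated from the question in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'name1, 2' 'name2, 3' > results.csv
printf '%s\n' record_seperator name1 foo bar \
              record_seperator name2 bla bluh \
              record_seperator name3 baz > sample.txt

awk -F, '
NR == FNR { wanted[$1]; next }          # first file: remember each name
$0 == "record_seperator" {              # a new record starts here
    if (out != "") close(out)           # done with the previous output file
    out = ""; pending = 1               # the name is on the following line
    next
}
pending {                               # line right after the separator
    pending = 0
    if ($0 in wanted) {
        out = $0 ".txt"
        print "record_seperator" > out  # emit the delayed separator
    }
}
out != "" { print > out }               # copy the record body, name included
' results.csv sample.txt
```

Because no multi-character RS is involved, this should also run in awk implementations that only honour the first character of RS. If a name occurred in more than one record, `>` would truncate the file when it is reopened after close(), so `>>` would be needed in that case.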
Answer 2 (score: 0)
(Following @Tiw's lead, I also changed name5 to name2 in the results file to get the expected output)
$ cat a.awk
# collect the result names into an array
NR == FNR {a[$1]; next}
# skip the first (empty) sample record caused by initial record separator
FNR == 1 { next }
# If found, output sample record into the appropriate file
$1 in a {
f = ($1 ".txt")
printf "record_seperator\n%s", $0 > f
}
Run it with gawk, which supports a multi-character RS:
$ gawk -f a.awk FS="," results.csv FS="\n" RS="record_seperator\n" sample.txt
Check the results:
$ cat name1.txt
record_seperator
name1
foo
bar
$ cat name2.txt
record_seperator
name2
bla
bluh
Answer 3 (score: 0)
The errors in your code:
#!/bin/bash
awk 'BEGIN{
while (getline < "results.csv")
{
split($0,name,",");
nameArr[k]=name[1]; ## <-- k is never set, so you overwrite nameArr[""] again and again.
}
{
RS="record_seperator"
FS="\n"
for (key in nameArr) ## <-- only one key ("") exists; it is never going to equal $2
{
print nameArr[key]
print $2
if ($2==nameArr[key])
NR > 1
{
#extract file by Record separator and name from line2
print RS $0 > $2 ".txt"
}
}
}
}' sample.txt
Your example also shows:
name1, 2(random number)
name5, 3 ## <-- name5 here, not name2 !
After changing name5 to name2 and updating your own code:
#!/bin/bash
awk 'BEGIN{
while ( (getline line< "results.csv") > 0 ) { # Avoid an infinite loop when a read error is encountered.
split(line,name,",");
nameArr[name[1]]; # No value is needed; referencing the element once creates the key (name[1]).
}
RS="record_seperator";
FS="\n";
}
$2 in nameArr {
print RS $0; # You can add `> ($2 ".txt")` later yourself.
}' sample.txt
Output:
record_seperator
name1
foo
bar
record_seperator
name2
bla
bluh
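The redirection mentioned in the comment can be added roughly as follows. This is a sketch under assumptions: it recreates the sample data from the question in a scratch directory, sets RS to "record_seperator\n" (a multi-character RS, which gawk and mawk accept but POSIX does not require) so that the name lands in $1 rather than $2, and uses printf instead of print so that no extra newline is appended after each record. The variable name out is made up for illustration.

```shell
#!/bin/bash
# Sketch only: sample data recreated from the question in a scratch directory.
work=$(mktemp -d)
cd "$work" || exit 1

printf '%s\n' 'name1, 2(random number)' 'name2, 3' > results.csv
printf '%s\n' record_seperator name1 foo bar \
              record_seperator name2 bla bluh > sample.txt

awk 'BEGIN {
    # Load the wanted names; stop on end of file (0) or read error (-1).
    while ((getline line < "results.csv") > 0) {
        split(line, name, ",")
        nameArr[name[1]]              # referencing the element creates the key
    }
    close("results.csv")
    RS = "record_seperator\n"         # assumption: awk supports multi-char RS
    FS = "\n"                         # one field per line of the record
}
$1 in nameArr {
    out = $1 ".txt"
    printf "%s%s", RS, $0 > out       # $0 already ends with a newline
    close(out)                        # keep the number of open files small
}' sample.txt
```

Closing each file right after writing matters at the scale described in the question, since up to 10,000 output files could otherwise be open at once.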