Question

I have 2 files, refer.txt and parse.txt.

refer.txt contains the following:

julie,remo,rob,whitney,james

parse.txt contains:

remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,whitney/hello/1.0,julie/hello/2.0,julie/hello/3.0,rob/hello/4.0,james/hello/6.0

Now my output.txt should list the entries from parse.txt in the order specified in refer.txt.

For example, output.txt should be:

julie/hello/2.0,julie/hello/3.0,remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,rob/hello/4.0,whitney/hello/1.0,james/hello/6.0

I have tried the following code:

sort -nru refer.txt parse.txt

but with no luck.

Please help me. TIA

Answer 1

You can do this with gnu-awk:

awk -F/ -v RS=',|\n' 'FNR==NR{a[$1] = (a[$1])? a[$1] "," $0 : $0 ; next}
              {s = (s)? s "," a[$1] : a[$1]} END{print s}' parse.txt refer.txt

Output:

julie/hello/2.0,julie/hello/3.0,remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,rob/hello/4.0,whitney/hello/1.0,james/hello/6.0

Explanation:

-F/                          # Use field separator as /
-v RS=',|\n'                 # Use record separator as comma or newline
NR == FNR {                  # While processing parse.txt
a[$1]=(a[$1])?a[$1] ","$0:$0 # create an array with 1st field as key and value as all the 
                             # records with keys julie, remo, rob etc.
}
{                            # while processing the second file refer.txt
  s = (s)?s "," a[$1]:a[$1]  # aggregate all values by reading key from 2nd file
}
END {print s }               # print all the values

Answer 2

Pure native bash (4.x):

# read each file into an array
IFS=, read -r -a values <parse.txt
IFS=, read -r -a ordering <refer.txt

# create a map from content before "/" to comma-separated full values in preserved order
declare -A kv=( )
for value in "${values[@]}"; do
  key=${value%%/*}
  if [[ ${kv[$key]} ]]; then
    kv[$key]+=",$value" # already exists, comma-separate
  else
    kv[$key]="$value"
  fi
done

# go through refer list, putting full value into "out" array for each entry
out=( )
for value in "${ordering[@]}"; do
  out+=( "${kv[$value]}" )
done

# print "out" array in comma-separated form
IFS=,
printf '%s\n' "${out[*]}" >output.txt

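If you get more output fields than input fields, you are probably trying to run this with bash 3.x. Associative array support is required for correct operation, so that will not work.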
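
To turn that silent misbehaviour into an explicit error, a guard along these lines could be placed at the top of the script. This is only a sketch: `require_bash4` is a hypothetical helper name, and it relies on the `BASH_VERSION` variable that bash sets (for example `5.1.16(1)-release`).

```shell
#!/usr/bin/env bash
# require_bash4 (hypothetical helper): succeed only when running under
# bash 4 or newer, where 'declare -A' is available.
require_bash4() {
    version=${BASH_VERSION:-0}      # unset when the shell is not bash
    major=${version%%.*}            # keep only the major version number
    if [ "$major" -lt 4 ]; then
        echo "bash 4.0 or newer is required for 'declare -A'" >&2
        return 1
    fi
}
```

Starting the script with `require_bash4 || exit 1` stops it before any array is declared.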

Answer 3

Command

while read line; do
  grep -w "^$line" <(tr , "\n" < parse.txt)
done < <(tr , "\n" < refer.txt) | paste -s -d , -

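Key points

For both files, the tr command converts commas into newlines (without actually changing the files themselves). This is useful because while read and grep assume your records are separated by newlines rather than commas.
while read reads each name from refer.txt (i.e. julie, remo, etc.), and grep then retrieves the lines from parse.txt that contain that name.
The ^ in the regular expression ensures that matching is performed only from the start of the string and not in the middle (thanks to @CharlesDuffy's comment below), and the -w grep option allows only whole-word matches. For example, this ensures that "rob" only matches "rob/..." and not a longer name that starts with "rob", nor a name such as "throb/..." that merely contains it.
The final paste command joins the results with commas. Removing this command prints each result on its own line.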
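
The effect of ^ together with -w can be checked in isolation with a small sketch. The helper name `entries_for` and the sample names `robert` and `throb` are illustrative assumptions, not part of the answer.

```shell
#!/bin/sh
# entries_for NAME FILE  (hypothetical helper)
# Prints, one per line, the comma-separated entries of FILE whose
# leading word is exactly NAME.
entries_for() {
    tr ',' '\n' <"$2" | grep -w "^$1"
}

# Example, assuming sample.txt contains: rob/hello/4.0,robert/hi/1.0,throb/x/2.0
#   entries_for rob sample.txt
# prints only rob/hello/4.0: "robert" fails the whole-word test and
# "throb" fails the ^ anchor.
```

Note that -w treats characters such as "-" and "." as word boundaries, so anchoring on the slash with `grep "^$1/"` is a stricter variant when names may contain them.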

Answer 4

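# number each name by its position in refer.txt
tr , '\n' <refer.txt | cat -n >person_id.txt  # 'cat -n' is not POSIX; use sed and paste instead

# one file per person: the file name is the key, the content is the id
while read -r person_id person_key
do
    printf '%s\n' "$person_id" >"$person_key"
done <person_id.txt

# prefix each entry with its key: "remo/hello/1.0" becomes "remo remo/hello/1.0"
tr , '\n' <parse.txt | sed 's|^\([^/]*\)\(/.*\)$|\1 \1\2|' >person_data.txt

# look up each key's id and write "id data" lines
while read -r foreign_key person_data
do
    person_id="$(cat "$foreign_key")"
    printf '%s %s\n' "$person_id" "$person_data" >>merge.txt
done <person_data.txt

# numeric sort on the leading person_id
sort -n merge.txt >output.txt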

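This is the textbook data-processing approach: a person ID table and a person data table, merged on a common key field, which here is the person's first name: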

[person_key] [person_id]
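- the person ID table: a unique, sortable 'id' for each person (the line number in this case, since that is the required sort order), plus a key for each person (their first name)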

[person_key] [person_data]
- the person data table: the data for each person, indexed by 'person_key'

[person_id] [person_data]
- the merge of the 'person_id' table and the 'person_data' table on 'person_key', which can then be sorted on person_id to give the output as required

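The trick is to implement an associative array using files, with the file name as the key (here 'person_key') and the content as the value. [Essentially a random-access file implemented using the file system.]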
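
The file-as-associative-array trick can be sketched on its own as two small helpers. The names `kv_append` and `kv_get` are hypothetical, and the sketch assumes keys contain no "/" so that each key is a valid file name.

```shell
#!/bin/sh
# A directory plays the role of the array: each file name is a key,
# and the file content is the value (one line per stored item).

# kv_append DIR KEY VALUE: add VALUE as a new line under KEY
kv_append() {
    mkdir -p "$1" && printf '%s\n' "$3" >>"$1/$2"
}

# kv_get DIR KEY: print every value stored under KEY, or nothing if absent
kv_get() {
    if [ -f "$1/$2" ]; then
        cat "$1/$2"
    fi
}
```

For example, after `kv_append ./person_data julie julie/hello/2.0`, calling `kv_get ./person_data julie` prints that entry back.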

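This effectively adds a step to the otherwise simple, but not very efficient, task of grepping parse.txt with each value from refer.txt. Which of the two is more efficient, I am not sure.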

NB: the code above is unlikely to work out of the box.

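NBB: on reflection, a better approach is probably to use the file system to create a random-access file of parse.txt (essentially an index), and then to treat refer.txt as a batch file: submit it as a job that prints, from the parse.txt random-access file, the data for each name read in turn from refer.txt: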

# Assumes person_data.txt holds one parse.txt entry per line
# and refer_data.txt holds one name per line.

# 1) index data file on required field
mkdir -p ./person_data
while read -r data
do
    key="$(printf '%s\n' "$data" | cut -d/ -f1)"  # text before the first "/"
    printf '%s\n' "$data" >>./person_data/"$key"
done <person_data.txt

# 2) run batch job
while read -r key
do
    cat ./person_data/"$key"
done <refer_data.txt

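That said, using egrep is probably just as rigorous a solution, at least for small datasets, and I would certainly use that approach for the specific problem posed. (Or maybe not! The above may well prove faster as well as more robust.)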
