awk: How do I append non-matching fields from other records to the current record in the same field?

Date: 2015-11-25 22:53:20

Tags: awk

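I want to append the non-matching fields from other records to the field of the current record.

The first field of each record is a group ID. Each person is matched with every person who is not in their own group. All possible matches are needed.

For example, given names.db: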

1 Nikola Tesla
1 Pierre-Simon Laplace
1 Oliver Heaviside
2 James Watson
2 Francis Crick
3 Kanye West
4 Michael Faraday
4 Lord Rayleigh

becomes:

Nikola Tesla -> James Watson
Nikola Tesla -> Francis Crick
Nikola Tesla -> Kanye West
Nikola Tesla -> Michael Faraday
Nikola Tesla -> Lord Rayleigh

Pierre-Simon Laplace -> James Watson
Pierre-Simon Laplace -> Francis Crick
Pierre-Simon Laplace -> Kanye West
Pierre-Simon Laplace -> Michael Faraday
Pierre-Simon Laplace -> Lord Rayleigh

Oliver Heaviside -> James Watson
Oliver Heaviside -> Francis Crick
Oliver Heaviside -> Kanye West
Oliver Heaviside -> Michael Faraday
Oliver Heaviside -> Lord Rayleigh

James Watson -> Nikola Tesla
James Watson -> Pierre-Simon Laplace
James Watson -> Oliver Heaviside
James Watson -> Kanye West
James Watson -> Michael Faraday
James Watson -> Lord Rayleigh

Francis Crick -> Nikola Tesla
Francis Crick -> Pierre-Simon Laplace
Francis Crick -> Oliver Heaviside
Francis Crick -> Kanye West
Francis Crick -> Michael Faraday
Francis Crick -> Lord Rayleigh

Kanye West -> Pierre-Simon Laplace
Kanye West -> James Watson
Kanye West -> Oliver Heaviside
Kanye West -> Francis Crick
Kanye West -> Michael Faraday
Kanye West -> Nikola Tesla
Kanye West -> Lord Rayleigh

Michael Faraday -> Nikola Tesla
Michael Faraday -> Pierre-Simon Laplace
Michael Faraday -> Oliver Heaviside
Michael Faraday -> James Watson
Michael Faraday -> Francis Crick
Michael Faraday -> Kanye West

Lord Rayleigh -> Nikola Tesla
Lord Rayleigh -> Pierre-Simon Laplace
Lord Rayleigh -> Oliver Heaviside
Lord Rayleigh -> James Watson
Lord Rayleigh -> Francis Crick
Lord Rayleigh -> Kanye West

2 answers:

Answer 0 (score: 1)

I see what you mean.

Try this:

awk '{b=$1;sub($1" ","");a[$0]=b}END{for(i in a){for(j in a){if(i!=j&&a[i]!=a[j])print i" -> "j}print ""}}' file
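Two things to keep in mind about the one-liner above: `for (i in a)` visits array keys in an unspecified order, so the output may not follow the input order, and because names are used as array keys, two people with the same name would collapse into one entry. The following is a minimal sketch of an order-preserving variant, not the answerer's method. The function name `cross_pairs` and the `sample` variable are made up for illustration, and it assumes exactly one space separates the group ID from the name.

```shell
# Sample data copied from names.db in the question.
sample='1 Nikola Tesla
1 Pierre-Simon Laplace
1 Oliver Heaviside
2 James Watson
2 Francis Crick
3 Kanye West
4 Michael Faraday
4 Lord Rayleigh'

# cross_pairs is a hypothetical helper name. It reads the files given as
# arguments, or standard input when called without arguments.
cross_pairs() {
  awk '
    {
      id[NR]   = $1                               # group ID of this record
      name[NR] = substr($0, index($0, " ") + 1)   # text after the first space
    }
    END {
      # Index by record number so the output follows the input order.
      for (i = 1; i <= NR; i++) {
        for (j = 1; j <= NR; j++)
          if (id[i] != id[j])
            print name[i] " -> " name[j]
        print ""                                  # blank line between people
      }
    }
  ' "$@"
}

printf '%s\n' "$sample" | cross_pairs
```

With a real file the call would be `cross_pairs names.db`. Storing the records by `NR` costs two arrays instead of one, but it keeps duplicates and makes the ordering deterministic.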

Answer 1 (score: 0)

A non-awk solution:

$ join -t' ' -j 9 names{,} \
     | sed -r '/([1-9]).*\1/d;s/[1-9]//;s/[1-9]/-->/'

  Nikola Tesla --> James Watson
  Nikola Tesla --> Francis Crick
  Nikola Tesla --> Kanye West
  Nikola Tesla --> Michael Faraday
  Nikola Tesla --> Lord Rayleigh
  Pierre-Simon Laplace --> James Watson
  Pierre-Simon Laplace --> Francis Crick
  Pierre-Simon Laplace --> Kanye West
  Pierre-Simon Laplace --> Michael Faraday
  Pierre-Simon Laplace --> Lord Rayleigh
  Oliver Heaviside --> James Watson
  Oliver Heaviside --> Francis Crick
  ...
  Michael Faraday --> Francis Crick
  Michael Faraday --> Kanye West
  Lord Rayleigh --> Nikola Tesla
  Lord Rayleigh --> Pierre-Simon Laplace
  Lord Rayleigh --> Oliver Heaviside
  Lord Rayleigh --> James Watson
  Lord Rayleigh --> Francis Crick
  Lord Rayleigh --> Kanye West

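Explanation: create the cross product, delete the lines where the two numbers match, remove the first number, and replace the second number with an arrow. Of course, all of this can be done in awk, but I tried something different for a change.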
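
The four steps in the explanation can be seen one at a time on a smaller input. This is only a sketch under a few assumptions: GNU `join` and `sed` are available, the three-line sample and the temporary file are made up for illustration, and `sed -E` is used as a more widely supported spelling of `-r`. Note that the `[1-9]` patterns only work as intended when group IDs are single digits and names contain no digits.

```shell
# Hypothetical three-line sample standing in for the names file.
names=$(mktemp)
printf '%s\n' '1 Nikola Tesla' '2 James Watson' '2 Francis Crick' > "$names"

# Step 1: field 9 does not exist, so every line gets an empty join key
# and join pairs each line with every line: the cross product.
cross=$(join -t' ' -j 9 "$names" "$names")

# Step 2: delete lines in which the same digit appears twice (same group).
# Step 3: remove the first digit.
# Step 4: replace the second digit with an arrow.
result=$(printf '%s\n' "$cross" \
  | sed -E '/([1-9]).*\1/d;s/[1-9]//;s/[1-9]/-->/')

printf '%s\n' "$result"
rm -f "$names"
```

The leading spaces in the output come from the empty join key and the removed first digit, which is why the answer's output is indented.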