Hi, say I have this file, file1.csv, with 2 columns, a and b, both of which are 22-character strings. It looks like this:
hWcYwgRKOD77hfm1oKE0IA,5HleiJXMsFkGEsr8Jqr3Ug
hWcYwgRKOD77hfm1oKE0IA,rCDlYd2WHJuiT05sYGxaVA
65q0c2Iw03B8eSuHHTETHw,G40NUD0/op+13yjzBw+hrw
65q0c2Iw03B8eSuHHTETHw,1u8UW/cQ4i1vbSF9wvzu3w
...
I want to convert columns a and b to consecutive integers, like this:
1,1
1,2
2,3
2,4
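Does anyone know how I can do this? I am using Ubuntu 12.04, by the way.
Also, suppose I have another file, file2.csv, with columns a' and b'. Is there a way to do the same thing to file2, so that if "hWcYwgRKOD77hfm1oKE0IA" is 1 in file1, then "hWcYwgRKOD77hfm1oKE0IA" is also 1 in file2 wherever it appears? The same goes for columns b and b'. I would like a separate output for each of the two files: result1.csv and result2.csv.
Answer 0 (score: 2)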
awk -F, -v OFS=, '{ if ($1 in a) { $1 = a[$1] } else { $1 = a[$1] = ++x }
if ($2 in b) { $2 = b[$2] } else { $2 = b[$2] = ++y } } 1' file
Or, perhaps simpler but possibly less efficient:
awk -F, -v OFS=, '!($1 in a) { a[$1] = ++x } { $1 = a[$1] }
!($2 in b) { b[$2] = ++y } { $2 = b[$2] } 1' file
Or, generalized to any number of columns:
awk -F, -v OFS=, '{ for (i = 1; i <= NF; ++i)
if ((i, $i) in a) { $i = a[i, $i] }
else { $i = a[i, $i] = ++x[i] } } 1' file
This is also equivalent to:
awk -F, -v OFS=, '{ for (i = 1; i <= NF; ++i) {
if (!((i, $i) in a)) a[i, $i] = ++x[i]
$i = a[i, $i] } } 1' file
Output:
1,1
1,2
2,3
2,4
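To try the any-number-of-columns variant end to end, here is a minimal, self-contained sketch. It assumes the four sample rows from the question and writes them to a scratch directory; the array names `id` and `count` and the output name `ids.csv` are illustrative choices, standing in for `a` and `x` above.

```shell
#!/bin/sh
# Scratch directory so the demo does not touch any real file1.csv.
work=$(mktemp -d)

# The four sample rows from the question.
cat > "$work/file1.csv" <<'EOF'
hWcYwgRKOD77hfm1oKE0IA,5HleiJXMsFkGEsr8Jqr3Ug
hWcYwgRKOD77hfm1oKE0IA,rCDlYd2WHJuiT05sYGxaVA
65q0c2Iw03B8eSuHHTETHw,G40NUD0/op+13yjzBw+hrw
65q0c2Iw03B8eSuHHTETHw,1u8UW/cQ4i1vbSF9wvzu3w
EOF

# id[i, value] is the integer assigned to `value` in column i.
# count[i] is the number of distinct values seen so far in column i.
awk -F, -v OFS=, '{
    for (i = 1; i <= NF; ++i) {
        if (!((i, $i) in id)) id[i, $i] = ++count[i]   # first sighting: next integer
        $i = id[i, $i]                                 # replace the string with its integer
    }
    print
}' "$work/file1.csv" > "$work/ids.csv"

cat "$work/ids.csv"
```

The `(i, $i)` subscript joins the column number and the value with awk's `SUBSEP`, so the same string appearing in two different columns is numbered independently in each.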
To apply this to two files, so that both share the same mapping, try:
awk -F, -v OFS=, '{ if ($1 in a) { $1 = a[$1] } else { $1 = a[$1] = ++x }
if ($2 in b) { $2 = b[$2] } else { $2 = b[$2] = ++y }
print > ("result_" FILENAME) }' file1 file2
To keep the original columns and append the assigned integers:
awk -F, -v OFS=, '!($1 in a) { a[$1] = ++x } !($2 in b) { b[$2] = ++y }
{ print $1, $2, a[$1], b[$2] }' file
Output:
hWcYwgRKOD77hfm1oKE0IA,5HleiJXMsFkGEsr8Jqr3Ug,1,1
hWcYwgRKOD77hfm1oKE0IA,rCDlYd2WHJuiT05sYGxaVA,1,2
65q0c2Iw03B8eSuHHTETHw,G40NUD0/op+13yjzBw+hrw,2,3
65q0c2Iw03B8eSuHHTETHw,1u8UW/cQ4i1vbSF9wvzu3w,2,4
Per-file version:
awk -F, -v OFS=, '!($1 in a) { a[$1] = ++x } !($2 in b) { b[$2] = ++y }
{ print $1, $2, a[$1], b[$2] > ("result_" FILENAME) }' file1 file2
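The commands above name their outputs after the input files (`result_file1`, `result_file2`). If the outputs should be called result1.csv and result2.csv as the question asks, one possible sketch is to count input files with `FNR == 1` and build the name from that counter. The contents of file2.csv below are made up for the demo, and the variables `n`, `out`, and `dir` are illustrative names, not part of the commands above.

```shell
#!/bin/sh
# Scratch directory so the demo does not touch any real files.
work=$(mktemp -d)

# file1.csv: the sample rows from the question.
cat > "$work/file1.csv" <<'EOF'
hWcYwgRKOD77hfm1oKE0IA,5HleiJXMsFkGEsr8Jqr3Ug
hWcYwgRKOD77hfm1oKE0IA,rCDlYd2WHJuiT05sYGxaVA
65q0c2Iw03B8eSuHHTETHw,G40NUD0/op+13yjzBw+hrw
65q0c2Iw03B8eSuHHTETHw,1u8UW/cQ4i1vbSF9wvzu3w
EOF

# file2.csv: hypothetical rows invented for this demo. They reuse two keys and
# one value from file1.csv and introduce one new key and two new values.
cat > "$work/file2.csv" <<'EOF'
65q0c2Iw03B8eSuHHTETHw,5HleiJXMsFkGEsr8Jqr3Ug
hypotheticalNewKeyAAAA,hypotheticalNewValBBBB
hWcYwgRKOD77hfm1oKE0IA,hypotheticalNewValCCCC
EOF

# The arrays a and b persist across both inputs, so a string seen in
# file1.csv keeps the same integer when it shows up again in file2.csv.
awk -F, -v OFS=, -v dir="$work" '
    FNR == 1   { n++; out = dir "/result" n ".csv" }  # first line of each file: pick its output
    !($1 in a) { a[$1] = ++x }                        # new value in column 1
    !($2 in b) { b[$2] = ++y }                        # new value in column 2
               { print a[$1], b[$2] > out }
' "$work/file1.csv" "$work/file2.csv"

cat "$work/result1.csv"
echo "---"
cat "$work/result2.csv"
```

Because the output name is held in a plain variable, the redirection needs no parentheses. One caveat of this sketch: an empty input file never triggers `FNR == 1`, so the numbering of the later output files would shift by one.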