I have the following CSV file:
data.csv
Chart #,Ticker,Industry,Last Price,Multiple
2,AFL,Accident & Health Insurance,60.9,0.82
3,UNM,Accident & Health Insurance,32.97,1.52
4,CNO,Accident & Health Insurance,19.33,2.59
2,OMC,Advertising Agencies,71.71,0.7
3,IPG,Advertising Agencies,21.24,2.35
4,ADS,Advertising Agencies,278.18,0.18
2,UPS,Air Delivery & Freight Services,103.8,0.48
3,FDX,Air Delivery & Freight Services,152.11,0.33
4,EXPD,Air Delivery & Freight Services,50.725,0.99
5,CHRW,Air Delivery & Freight Services,72.3,0.69
6,FWRD,Air Delivery & Freight Services,42.86,1.17
I would like to use Awk, or whichever Linux command-line tool is best suited, to make the data in the file look like this:
output.txt
Accident & Health Insurance
2*0.82,3*1.52,4*2.59
Advertising Agencies
2*0.7,3*2.35,4*0.18
Air Delivery & Freight Services
2*0.48,3*0.33,4*0.99,5*0.69,6*1.17
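Basically, I want to take every "Chart #" and "multiply" it by its Multiple (written out as Chart #*Multiple, not evaluated), print the "Industry" on one line, then all of that industry's chart entries on the next line separated by commas, then a blank third line... and so on through the whole list.
Can someone point me in the right direction on how to do this? Is Awk the best tool for this task, or do I have to write a bash script to handle it?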
Answer 0 (score: 4)
awk -F, 'NR>1{a[$3]=a[$3]?a[$3]","$1"*"$NF:$1"*"$NF}END{for(i in a)print i"\n"a[i]}' filename
Air Delivery & Freight Services
2*0.48,3*0.33,4*0.99,5*0.69,6*1.17
Advertising Agencies
2*0.7,3*2.35,4*0.18
Accident & Health Insurance
2*0.82,3*1.52,4*2.59
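Note that the order in which `for (i in a)` visits the array is unspecified in awk, which is why the industries above come out in a different order than in the input. If the input order matters, one possible variant (a minimal sketch, not part of the original answer) records each industry the first time it is seen. The function name `group_by_industry` and the array names `rec` and `order` are made up for illustration.

```shell
# Hypothetical helper: group "Chart #*Multiple" pairs by industry,
# printing industries in the order they first appear in the input.
# Reads the files given as arguments, or stdin if none are given.
group_by_industry() {
    awk -F, '
    NR == 1 { next }                        # skip the header row
    !($3 in rec) {                          # first time this industry is seen
        order[++n] = $3                     # remember its position
        rec[$3] = $1 "*" $NF                # start its list
        next
    }
    { rec[$3] = rec[$3] "," $1 "*" $NF }    # append to an existing list
    END {
        for (i = 1; i <= n; i++)
            print order[i] "\n" rec[order[i]]
    }
    ' "$@"
}
```

Calling it as `group_by_industry data.csv > output.txt` should list the industries in file order. Unlike an approach that compares each row with the previous one, it does not require rows of the same industry to be adjacent, at the cost of holding every group in memory until the end.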
Answer 1 (score: 4)
$ awk -F, -v OFS='\n' -v ORS='\n\n' '
NR==1 { next }
(NR>2) && ($3!=prevKey) { print prevKey, prevRec; prevRec="" }
{ prevKey=$3; prevRec=(prevRec==""?"":prevRec",") $1"*"$NF }
END { print prevKey, prevRec }
' file
Accident & Health Insurance
2*0.82,3*1.52,4*2.59
Advertising Agencies
2*0.7,3*2.35,4*0.18
Air Delivery & Freight Services
2*0.48,3*0.33,4*0.99,5*0.69,6*1.17
The functional differences between the above and @A-Ray's answer are: