I have a set of CSV files. For each file I need the following.
Example file (values1.csv):
Item, avg, max
TT, 3, 5
DD, 3, 6
ZZ, 6, 8
UU, 3, 3
JJ, 1, 5
Example of the predefined order (order.csv). I need some of the avg values and some of the max values:
DD_avg
ZZ_avg
ZZ_max
TT_avg
TT_max
UU_avg
JJ_avg
Output:
file_name, DD_avg, ZZ_avg, ZZ_max, TT_avg, TT_max, UU_avg, JJ_avg
values1.csv, 3, 6, 8, 3, 5, 3, 1
values2.csv, ...................
values3.csv, ...................
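Is this doable with AWK (or any other Linux command)? My AWK skills are very limited and I don't know how to approach this case. I would appreciate some help and guidance.
Edit: real data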
cat values1.csv
item,avg,max
System/CPU/User/percent,4.8,
System/Memory/Used/bytes,57300000000,
System/Filesystem/^data/Used/bytes,859000000,
System/Disk/disk/Reads/count/sec,37.8,730
System/Disk/disk/Writes/Utilization/percent,7.24,
System/Disk/disk/Reads/bytes/sec,849000,42100000
System/Disk/disk/Writes,0.0026,
System/Disk/disk/Writes/bytes/sec,520000,33500000
System/Disk/disk/Writes/count/sec,46.2,903
System/Disk/disk/Utilization/percent,22.4,
System/Disk/disk/Reads/Utilization/percent,15.2,
cat order.csv
System/CPU/User/percent_avg
System/Memory/Used/bytes_avg
System/Filesystem/^data/Used/bytes_avg
System/Disk/disk/Reads/count/sec_avg
System/Disk/disk/Writes/count/sec_avg
System/Disk/disk/Reads/count/sec_max
System/Disk/disk/Writes/count/sec_max
System/Disk/disk/Reads/bytes/sec_avg
System/Disk/disk/Writes/bytes/sec_avg
System/Disk/disk/Writes/Utilization/percent_avg
System/Disk/disk/Reads/Utilization/percent_avg
Answer 0 (score: 3)
Using GNU awk for ARGIND:
$ cat tst.awk
BEGIN { FS=", *"; OFS=", " }
NR==FNR {
    colNames[++numCols] = $0
    next
}
{
    val[ARGIND,$1"_avg"] = $2
    val[ARGIND,$1"_max"] = $3
}
END {
    printf "file_name"
    for (colNr=1; colNr<=numCols; colNr++) {
        printf "%s%s", OFS, colNames[colNr]
    }
    print ""
    for (fileNr=2; fileNr<=ARGIND; fileNr++) {
        printf "%s", ARGV[fileNr]
        for (colNr=1; colNr<=numCols; colNr++) {
            printf "%s%s", OFS, val[fileNr,colNames[colNr]]
        }
        print ""
    }
}
$ gawk -f tst.awk order.csv values1.csv
file_name, DD_avg, ZZ_avg, ZZ_max, TT_avg, TT_max, UU_avg, JJ_avg
values1.csv, 3, 6, 8, 3, 5, 3, 1
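With other awks, just add the line FNR==1{++ARGIND} after the BEGIN line. If memory is a concern, you can reduce usage by printing from gawk's ENDFILE statement instead of END, and there are other options; let us know if that is a problem.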
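The memory remark can be illustrated outside awk as well. What follows is a minimal Python sketch, not part of the answer above: the function name row_for_file and the in-line sample data are assumptions made for illustration. It builds one output row from a single values file and keeps nothing from earlier files, which is the idea behind printing at ENDFILE rather than collecting everything for END.

```python
import re

def row_for_file(name, lines, col_names):
    """Build one output row from the lines of a single values file.

    Only this file's values are held in memory; the caller prints the
    row and moves on to the next file.
    """
    val = {}
    for line in lines[1:]:                      # skip the header line
        fields = re.split(r", *", line.rstrip("\n"))
        if len(fields) < 3:
            continue                            # ignore malformed lines
        val[fields[0] + "_avg"] = fields[1]
        val[fields[0] + "_max"] = fields[2]
    # Missing entries become empty fields, as in the awk version.
    return ", ".join([name] + [val.get(c, "") for c in col_names])

order = ["DD_avg", "ZZ_avg", "ZZ_max", "TT_avg", "TT_max", "UU_avg", "JJ_avg"]
sample = ["Item, avg, max\n", "TT, 3, 5\n", "DD, 3, 6\n", "ZZ, 6, 8\n",
          "UU, 3, 3\n", "JJ, 1, 5\n"]
print(row_for_file("values1.csv", sample, order))
# prints: values1.csv, 3, 6, 8, 3, 5, 3, 1
```

In a real run the caller would open each values file in turn, pass its lines to the function, print the row, and discard the file's data.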
Answer 1 (score: 2)
akshay@db-3325:/tmp$ cat order
DD_avg
ZZ_avg
ZZ_max
TT_avg
TT_max
UU_avg
JJ_avg
akshay@db-3325:/tmp$ cat values
Item, avg, max
TT, 3, 5
DD, 3, 6
ZZ, 6, 8
UU, 3, 3
JJ, 1, 5
akshay@db-3325:/tmp$ cat values1
Item, avg, max
TT, 1, 3
DD, 2, 4
akshay@db-3325:/tmp$ awk 'BEGIN{FS=OFS=","}FNR==NR{o[oh[FNR]=$1];next}function p(){s="";for(i=1; i in oh; i++){ if(!hp){ hr=(hr?hr OFS:"") oh[i] } s = (s ? s OFS:"")o[oh[i]]; o[oh[i]]="" } if(!hp){print "filename",hr; hp=1} print pf,s}k && FNR==1{p()}{gsub(/ /,""); for(i=2; i<=NF; i++){if(FNR==1){ h[i]=$i }else{ k = $1"_"h[i]; if(k in o)o[k]=$i } } pf=FILENAME }END{p()}' order values values1
filename,DD_avg,ZZ_avg,ZZ_max,TT_avg,TT_max,UU_avg,JJ_avg
values,3,6,8,3,5,3,1
values1,2,,,1,3,,
Formatted for better readability:
awk '
BEGIN{
    FS=OFS=","
}
FNR==NR{
    o[oh[FNR]=$1];
    next
}
function p(){
    s="";
    for(i=1; i in oh; i++){
        if(!hp){hr=(hr?hr OFS:"") oh[i]}
        s = (s ? s OFS:"")o[oh[i]];
        o[oh[i]]=""
    }
    if(!hp){ print "filename",hr; hp=1}
    print pf,s
}
k && FNR==1{ p() }
{
    gsub(/ /,"");
    for(i=2; i<=NF; i++)
    {
        if(FNR==1){
            h[i]=$i
        }
        else{
            k = $1"_"h[i];
            if(k in o)o[k]=$i
        }
    }
    pf=FILENAME
}
END{
    p()
}
' order values values1
Answer 2 (score: 2)
awk to the rescue!
awk -F_ -v OFS=', ' '
  NR==FNR {h[++c]=$1; t[c]=$2; next}
  FNR==1  {if(!data) {
             printf "%s", "file_name";
             for(i=1;i<=c;i++) printf "%s", OFS h[i]"_"t[i];
             print ""}
           else pr()
           fn=FILENAME}     # remember which file the following rows belong to
  FNR>1   {avg[$1]=$2; max[$1]=$3; data=1}
  END     {pr()}
  function pr() {
     printf "%s", fn;
     for(i=1;i<=c;i++) printf "%s", OFS (t[i]=="avg"?avg[h[i]]:max[h[i]])
     print ""
     split("", avg); split("", max)   # clear values before the next file
  }' order.csv FS=', *' values1.csv
file_name, DD_avg, ZZ_avg, ZZ_max, TT_avg, TT_max, UU_avg, JJ_avg
values1.csv, 3, 6, 8, 3, 5, 3, 1
Shown above for values1.csv.
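Answer 3 (score: 1)
This looks like a job for Python. At least if you want to parse CSV properly (quoted fields, multi-line fields, fields containing commas, etc.), handle missing columns gracefully, and support a variable number of columns per file, a different column order in each file, a different subset of columns per file, and so on.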
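To make the point about proper CSV parsing concrete, here is a small Python 3 sketch. The sample text is made up for illustration and does not come from the question's data. It compares the standard csv module with a plain split on commas for a quoted field that contains a comma and for a field that spans two lines.

```python
import csv
import io

# Hypothetical sample: one field contains a comma, another a line break.
text = 'item,avg,max\n"Disk, primary",37.8,730\n"multi\nline",1,2\n'

rows = list(csv.reader(io.StringIO(text)))
naive = text.splitlines()[1].split(",")

print(rows[1])   # ['Disk, primary', '37.8', '730']
print(rows[2])   # ['multi\nline', '1', '2']
print(naive)     # ['"Disk', ' primary"', '37.8', '730']
```

The csv module returns three fields per record in both cases, while the plain split yields four pieces for the first data line and would break the second record in two.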
Here is a Python 2/3 script that reads the column selection and order from the first file, given as the first argument of the script, and then the "values files" from the remaining arguments. The selected rows and columns (in order) are printed to standard output (so you can redirect them to a file). To handle odd field values better (multi-line fields, for example), you would want to use csv.writer instead of print.
The script:
#!/usr/bin/python
import sys
import csv
from collections import defaultdict
with open(sys.argv[1], 'r') as csvfile:
    # AA_avg, BB_max lines -> [['AA', 'avg'], ['BB', 'max']]
    order = list(csv.reader(csvfile, delimiter='_'))

# output header
print(','.join(["file_name"] + ["{}_{}".format(*o) for o in order]))

for filename in sys.argv[2:]:
    with open(filename, 'r') as csvfile:
        # read all values in a 2D associative map
        reader = csv.DictReader(csvfile, skipinitialspace=True)
        values = defaultdict(dict)
        for row in reader:
            item = row[reader.fieldnames[0]]
            for field in reader.fieldnames[1:]:
                values[item][field] = row[field]
    # select and print only the ones from order list; a missing item or
    # field becomes 'N/A' so the columns stay aligned with the header
    line = [filename] + [values[item].get(field, 'N/A') for item, field in order]
    print(','.join(line))
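The answer suggests csv.writer as the more robust way to emit the rows. The following Python 3 sketch shows one way that could look. It is not the author's code: the helper name write_table and the sample values are assumptions, and an in-memory buffer stands in for standard output so the result can be inspected.

```python
import csv
import io

def write_table(out, order, values_by_file):
    """Write the header and one row per file using csv.writer.

    order is a list of (item, field) pairs; values_by_file is a list of
    (filename, {item: {field: value}}) pairs. Both shapes are assumptions
    chosen to mirror the script above.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["file_name"] +
                    ["{}_{}".format(item, field) for item, field in order])
    for filename, values in values_by_file:
        writer.writerow([filename] +
                        [values.get(item, {}).get(field, "N/A")
                         for item, field in order])

order = [("DD", "avg"), ("ZZ", "max"), ("Odd, item", "avg")]
values = {"DD": {"avg": "3", "max": "6"},
          "ZZ": {"avg": "6", "max": "8"},
          "Odd, item": {"avg": "1"}}

buf = io.StringIO()
write_table(buf, order, [("values1.csv", values)])
print(buf.getvalue(), end="")
# file_name,DD_avg,ZZ_max,"Odd, item_avg"
# values1.csv,3,8,1
```

Passing sys.stdout as the first argument would print the table directly. The writer quotes any field that contains the delimiter, which a plain ','.join does not.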