我有一个像这样的R汇总表:
employee salary startdate
John Doe :1 Min. :21000 Min. :2007-03-14
Jolie Hope:1 1st Qu.:22200 1st Qu.:2007-09-18
Peter Gynn:1 Median :23400 Median :2008-03-25
Mean :23733 Mean :2008-10-02
3rd Qu.:25100 3rd Qu.:2009-07-13
Max. :26800 Max. :2010-11-01
我需要生成一个输出csv文件,如下所示:
employee,,salary,,startdate,,
John Doe,1,Min.,21000,Min.,2007-03-14
Jolie Hope,1,1st Qu.,22200,1st Qu.,2007-09-18
Peter Gynn,1,Median,23400,Median,2008-03-25
,,Mean,23733,Mean,2008-10-02
,,3rd Qu.,25100,3rd Qu.,2009-07-13
,,Max.,26800,Max.,2010-11-01
所以在excel中它看起来像这样:
但是,将字段拆分一个或多个空格
是不够的 awk -F "[ ]+" '{ print $3 }'
它适用于标题,但不适用于其余行:
salary
Doe
Hope:1
Gynn:1
:23733
Qu.:25100
:26800
使用awk(也许是sed)可以解决这个问题吗?
答案 0 :(得分:3)
sed '1 {
s/^[[:space:]]*\([^[:space:]]\{1,\}\)[[:space:]]\{1,\}\([^[:space:]]\{1,\}\)[[:space:]]\{1,\}[[:space:]]\{1,\}\([^[:space:]]\{1,\}\)/\1,,\2,,\3,/
b
}
s/[[:space:]]\{1,\}:/:/g
/^[[:space:]]*\([^:]\{1,\}\):\([^[:space:]]*\)[[:space:]]*\([^:]\{1,\}\):\([^[:space:]]*\)[[:space:]]*\([^:]\{1,\}\):\(.[^[:space:]]*\)/ {
s//\1,\2,\3,\4,\5,\6/
b
}
/^[[:space:]]*\([^:]\{1,\}\):\([^[:space:]]*\)[[:space:]]*\([^:]\{1,\}\):\([^[:space:]]*\)/ {
s//,,\1,\2,\3,\4/
b
}
' YourFile
如果你需要在这个ArachnoRegEx中适应一点,那么只需要一个乐趣
awk在这种情况下更有趣,主要是为了稍后添加的任何改编,但如果你只能访问sed ......
答案 1 :(得分:1)
这对于FIELDWIDTHS等使用GNU awk,并且在标题始终填充所有字段之后依赖于第一行输入。它包含仅:
s作为输出字段的位置,如果您想使用此解决方案,我希望您可以弄清楚如何跳过这些位置:
$ cat tst.awk
BEGIN { OFS="," }
NR==1 {
for (i=1;i<=NF;i++) {
printf "%s%s", $i, (i<NF?OFS OFS OFS:ORS)
}
next
}
NR==2 {
tail = $0
while ( match(tail,/([^:]+):(\S+(\s+|$))/,a) ) {
FIELDWIDTHS = FIELDWIDTHS length(a[1]) " 1 " length(a[2]) " "
tail = substr(tail,RSTART+RLENGTH)
}
$0 = $0
}
{
for (i=1;i<=NF;i++) {
gsub(/^\s+|\s+$/,"",$i)
}
print
}
$ awk -f tst.awk file
employee,,,salary,,,startdate
John Doe,:,1,Min.,:,21000,Min.,:,2007-03-14
Jolie Hope,:,1,1st Qu.,:,22200,1st Qu.,:,2007-09-18
Peter Gynn,:,1,Median,:,23400,Median,:,2008-03-25
,,,Mean,:,23733,Mean,:,2008-10-02
,,,3rd Qu.,:,25100,3rd Qu.,:,2009-07-13
,,,Max.,:,26800,Max.,:,2010-11-01