I have a sample file containing the following data:
No|Name|sal
1|abc|4500
2|gkdjkh|554
3|fgh
cvb|678
4|tyu|789
5|ghl
tyu|5677
6|yyui
tyui
uui|780
7|tpo|567
I need the output to look like this:
No|Name|sal
1|abc|4500
2|gkdjkh|554
3|fgh cvb|678
4|tyu|789
5|ghl tyu|5677
6|yyui tyui uui|780
7|tpo|567
Answer 0 (score: 0)
In my tests, Perl handled this well, and better than sed:
$ perl -pe 's/^[0-9]+[|]/\0$&/g; s/\n/ /g; s/^\0/\n/g' file
No|Name|sal
1|abc|4500
2|gkdjkh|554
3|fgh cvb|678
4|tyu|789
5|ghl tyu|5677
6|yyui tyui uui|780
7|tpo|567
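The same idea (a line that starts with a number and a pipe begins a new record, and anything else continues the previous one) can also be sketched in portable awk. This is only a minimal sketch, not the one-liner above: the function name join_by_record_start is hypothetical, and it assumes that the first line is a header and that every record starts with digits followed by |.

```shell
# Hypothetical helper: join continuation lines onto the record they belong to.
# Assumptions: line 1 is a header; a new record starts with digits followed by "|".
join_by_record_start() {
    awk '
        NR == 1 { print; next }                  # header passes through unchanged
        /^[0-9]+[|]/ {                           # this line starts a new record
            if (buf != "") print buf             # flush the previous record
            buf = $0
            next
        }
        { buf = (buf == "" ? $0 : buf " " $0) }  # continuation: append with a space
        END { if (buf != "") print buf }         # flush the final record
    ' "$@"
}
```

Called as join_by_record_start file, or with the data on standard input. Unlike the Perl one-liner, it leaves no trailing space at the end of each line.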
Answer 1 (score: 0)
An awk solution (based on processing each next line of the input file):
rearrange_fields.awk script:
#!/bin/awk -f
BEGIN { FS = "|" }
{
    if (NR == 1) { print $0 }    # print the first header line as is
    else {
        if (NF == 3) { print $0 }
        else {
            while ((getline nl) > 0) {    # processing each next line
                if (nl !~ /^[0-9]+\|/) {  # if it's not a regular line (with starting order digit i.e. `1|`)
                    if (prepend) {
                        $0 = prepend" "$0  # prepend the last partial line if exists
                        prepend = ""       # use it only once, so it is not prepended again on the next pass
                    }
                    $0 = $0" "nl;                # append to previous line
                    gsub(/[[:space:]]+/," ",$0)  # remove redundant spaces
                }
                else {
                    if (nl !~ /.+\|.+\|.+/) {  # if a loop ends up with line which starts with order number
                                               # but hasn't enough fields
                        prepend = nl
                        print $0
                    }
                    else {
                        prepend = ""
                        print $0 RS nl    # next line is a regular valid line
                    }
                    break
                }
            }
        }
    }
}
Usage:
awk -f rearrange_fields.awk yourfile
Answer 2 (score: 0)
A gawk-only solution, using a regular expression for RS and gawk's built-in RT variable. (For a different number of fields, change the {2} to one less than the number of fields.)
$ gawk -v RS="[^|]+([|][^|]+){2}\n" '{ gsub("\n", " ", RT); print RT}' f
No|Name|sal
1|abc|4500
2|gkdjkh|554
3|fgh cvb|678
4|tyu|789
5|ghl tyu|5677
6|yyui tyui uui|780
7|tpo|567
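To adapt the RS regex to a different field count, the pattern can be generated rather than edited by hand. A small sketch: make_rs is a hypothetical helper name, and the field count is an argument supplied by the caller.

```shell
# Hypothetical helper: print the RS regex for records with N pipe-separated fields.
# The repetition count is N - 1, one less than the number of fields.
make_rs() {
    # "\\n" in the format prints a literal backslash followed by n, which awk
    # later interprets as a newline when it processes the -v assignment.
    printf '[^|]+([|][^|]+){%d}\\n' "$(( $1 - 1 ))"
}
```

With three fields, gawk -v RS="$(make_rs 3)" '{ gsub("\n", " ", RT); print RT }' f passes the same RS value as the command in this answer.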
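Answer 3 (score: 0)
awk is well suited to this problem, but I found a solution with sed and grep.
The hard part is handling the lines that have no | delimiter. You can join these lines to the previous line (\d008 and \r are characters that do not occur in the input):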
sed 's/^[^|]*$/\d008&\d008/' inputfile | tr '\n' '\r' |
sed -r "s/\r\d008([^\d008]*)\d008/\1/g" |
tr '\r' '\n'
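Now we can join all lines into a single string (replacing \n with the \r marker that the following grep needs) and extract the desired substrings. Use -P so that grep understands the special character \r.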
sed 's/^[^|]*$/\d008&\d008/' inputfile | tr '\n' '\r' |
sed -r "s/\r\d008([^\d008]*)\d008/\1/g" |
grep -Po "([^|]*\|){2}[^|\r]*" |
tr -d '\r'
The solution above is too slow for the OP (and also complicated), but it is still much faster than a while loop:
while IFS= read -r line; do
    # process header, determine nr of pipes
    if [ -z "${slashes}" ]; then
        slashes=${line//[^|]}
        n_slashes=${#slashes}
        printf "%s\n" "${line}"
        lastslashes=0
        continue
    fi
    # You have to print previous line when you have the required fields
    # and the next line has new fields
    new_slashes=${line//[^|]}
    n_new_slashes=${#new_slashes}
    if (( n_new_slashes + lastslashes > n_slashes )); then
        printf "%s\n" "${last}"
        last="${line}"
        lastslashes=${n_new_slashes}
    else
        # Append new line to last one, separated by a space
        # (no leading space while last is still empty)
        last="${last:+${last} }${line}"
        ((lastslashes+=n_new_slashes))
    fi
done < inputfile
printf "%s\n" "${last}"
The prototype above can serve as inspiration for an awk solution:
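awk -F '|' 'NR==1 {
    nfields=NF
    lastfields=1    # the empty buffer counts as one (empty) field
    print
    next
}
NF+lastfields-1 > nfields { print last; last=$0; lastfields=NF; next }
{ lastfields+=NF-1 }    # joining merges two fields into one, so subtract 1
{ last = (last == "" ? $0 : last " " $0) }    # join with a space, as in the desired output
END { print last }
' inputfile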
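Whichever variant is used, a quick sanity check on the result is to confirm that every line has the same number of fields as the header. A minimal sketch, assuming | as the delimiter; check_field_count is a hypothetical helper name, not part of any answer above.

```shell
# Hypothetical helper: report lines whose field count differs from the header.
# Exits 0 when every line matches and 1 otherwise.
check_field_count() {
    awk -F '|' '
        NR == 1 { want = NF; next }          # the header defines the expected count
        NF != want {
            printf "line %d: %d fields, expected %d\n", NR, NF, want
            bad = 1
        }
        END { exit bad + 0 }                 # "+ 0" forces a numeric exit status
    ' "$@"
}
```

For example, piping the joined output into check_field_count prints nothing and exits 0 when all records were reassembled correctly.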