I have a text file whose lines start with a 9-digit college code and end with a 5-digit course code.
512161000 EN5121 K. K. Jorge Institute of Engineering Education and Research, Nashik 61220 Mechanical Engineering [Second Shift] XOPENH 1 116 16978
517261123 EN5172 R. C. Rustom Institute of Technology, Shirpur 61220 Mechanical Engineering [Second Shift] YOPENH 1 100 29555
617561234 EN6175 abc xyz Education Trust, abc xyz College of Engineering,
Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017
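Some entries contain a line break, as in the third example above. I need to merge lines 3 and 4 into a single line, like lines 1 and 2, so that I can easily use commands such as grep and awk.
Update
Kevin's answer does not seem to work.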
cat todel.txt
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,
Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
cat todel.txt | perl -ne 'chomp; if (/^\d{9}/) { print "\n$_" } else { print "$_\n" }'
Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531ege of,
Answer 0 (score: 1)
Assuming your data is in "file.txt", here is a scan that puts the lines back together:
cat file.txt | perl -ne 'chomp; if (/^\d{9}/) { print "\n$_" } else { print "$_\n" }'
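This assumes that all valid records start with a 9-digit number. The "chomp" removes the newline up front, and the pattern then decides where newlines should appear in the output.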
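The same start-of-record idea can be sketched in Python. This is only an illustration, not the answerer's code: the names `RECORD_START` and `join_by_record_start` are made up here, and the sketch assumes every record begins with nine digits. Unlike "chomp", which by default strips only "\n", it also strips a trailing "\r"; a leftover carriage return on CRLF input may explain the overwritten output shown in the update.

```python
import re

# Assumption: a new record always begins with nine digits.
RECORD_START = re.compile(r"\d{9}")


def join_by_record_start(lines):
    """Glue every continuation line onto the record that precedes it."""
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # handles both LF and CRLF endings
        if not line:
            continue  # ignore blank lines
        if RECORD_START.match(line) or not records:
            records.append(line)   # start of a new record
        else:
            records[-1] += line    # continuation of the current record
    return records


if __name__ == "__main__":
    import sys
    for record in join_by_record_start(sys.stdin):
        print(record)
```

Saved under a hypothetical name such as join.py, it could be run as `python join.py < todel.txt`.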
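Answer 1 (score: 1)
Regarding the split lines: this sed script assumes that there is at least one space after the leading digits (on the first line of a split record) and a space before the trailing digits (on the last line of a split record), and that each split record is broken only once.
It has been modified to accept input with either Windows CRLF newlines or *nix LF newlines. Note, however, that the output always uses the *nix \n.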
sed -nr 's/\r?$// # allow for \r\n (CRLF) newlines
/^([0-9]{9}) .* ([0-9]{5})$/{p;b}
/^([0-9]{9}) /{h;b}
/ ([0-9]{5})$/{x;G; s/\n//; p}'
Or shorter, but probably less readable:
sed -nr 's/\r?$//; /^([0-9]{9}) /{/ ([0-9]{5})$/{p;b};h;b};/ ([0-9]{5})$/{x;G; s/\n//; p}'
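I do expect the first one to be faster, because the most common case (a complete line) needs only a single regex test, whereas the second (shorter) script needs two regex tests for that same case.
This is the output I get, using GNU sed 4.2.1: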
512161000 EN5121 K. K. Jorge Institute of Engineering Education and Research, Nashik 61220 Mechanical Engineering [Second Shift] XOPENH 1 116 16978
517261123 EN5172 R. C. Rustom Institute of Technology, Shirpur 61220 Mechanical Engineering [Second Shift] YOPENH 1 100 29555
617561234 EN6175 abc xyz Education Trust, abc xyz College of Engineering,Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
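For readers less familiar with sed's hold space, the three rules of the script can be sketched as a small Python state machine. This is an illustrative approximation under the same assumptions the answer states (one split per record, a space after the leading digits and before the trailing digits); the names `FULL`, `HEAD`, `TAIL` and `join_split_records` are hypothetical.

```python
import re

FULL = re.compile(r"^\d{9} .* \d{5}$")  # complete record on one line
HEAD = re.compile(r"^\d{9} ")           # first half of a split record
TAIL = re.compile(r" \d{5}$")           # second half of a split record


def join_split_records(lines):
    """Emit complete records and rejoin records split across two lines."""
    held = ""  # plays the role of sed's hold space
    out = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # accept LF or CRLF input
        if FULL.match(line):
            out.append(line)
        elif HEAD.match(line):
            held = line            # remember the first half
        elif TAIL.search(line):
            out.append(held + line)
            held = ""
        # anything else is dropped, much as sed -n prints nothing for it
    return out
```

Testing the complete-line pattern first mirrors the ordering of the longer sed script, where the common case is decided by a single match.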
Answer 2 (score: 1)
This might work for you:
sed ':a;$!N;/ [0-9]\{5\}\n[0-9]\{9\} /!s/\n//;ta;P;D' file
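Explanation:
":a" sets a label, and "$!N" appends the next input line to the pattern space unless the current line is the last one. If the pattern space does not contain a record boundary (a space and five digits, a newline, then nine digits and a space), "s/\n//" deletes the newline, and "ta" jumps back to ":a" whenever that substitution succeeded. "P;D" then prints and deletes the first line of the pattern space and restarts the cycle with whatever remains.
Edit:
Test data: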
cat <<\! >/tmp/codel.txt
> 112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,
> Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
> !
sed ':a;$!N;/\s[0-9]\{5\}\n[0-9]\{9\}\s/!s/\n//;ta;P;D' /tmp/codel.txt
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
sed ':a;$!N;/\s[0-9]\{5\}\n[0-9]\{9\}\s/!s/\n//;ta;P;D' /tmp/{codel.txt,codel.txt,codel.txt}
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
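The boundary test used by this sed command can also be sketched in Python with lookarounds: a newline is kept only when it sits between a trailing 5-digit field and a leading 9-digit field, and is removed otherwise. The names `NOT_A_BOUNDARY` and `join_unless_boundary` are hypothetical, and the sketch assumes the whole file fits in memory.

```python
import re

# A newline is NOT a record boundary if it is not preceded by
# whitespace + 5 digits, or not followed by 9 digits + whitespace.
NOT_A_BOUNDARY = re.compile(r"(?<!\s\d{5})\n|\n(?!\d{9}\s)")


def join_unless_boundary(text):
    """Delete every newline that does not separate two records."""
    # Normalise CRLF and drop trailing newlines so the last one
    # is not mistaken for a mid-record line break.
    text = text.replace("\r\n", "\n").rstrip("\n")
    joined = NOT_A_BOUNDARY.sub("", text)
    return joined.split("\n") if joined else []
```

Like the sed version, this also rejoins records broken across more than two lines, since every non-boundary newline is removed.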
Answer 3 (score: 0)
Perhaps try removing every newline that comes right after a comma, like this:
perl -i -pe 's/,\n/,/g' file.txt
Or, if you want to keep any whitespace that follows the comma:
perl -i -pe 's/(,\s*)\n/$1/g' file.txt
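A rough Python equivalent of the comma-based substitution is shown below. It rests on the assumption that a record is only ever broken right after a comma, which holds for the sample data but may not hold in general; the function name `join_after_comma` is made up for this sketch.

```python
import re


def join_after_comma(text):
    """Remove a line break (LF or CRLF) that directly follows a comma,
    keeping the comma and any spaces or tabs after it."""
    return re.sub(r"(,[ \t]*)\r?\n", r"\1", text)
```

Unlike `perl -i`, this returns the new text instead of editing the file in place, so the caller decides where to write the result.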
Answer 4 (score: 0)
Try this:
sed '/^[0-9]\{9\}/{h;};/^[0-9]\{9\}/!{x;G;s/\n//g;}' test | grep -E '[0-9]{5}$'
Answer 5 (score: 0)
awk '! ($1 ~ /^[[:digit:]]/) {$0 = save " " $0} $1 ~ /^[[:digit:]]/ {save = $0} $NF ~ /[[:digit:]]$/ {print}' inputfile
Answer 6 (score: 0)
cat todel.txt |awk 'BEGIN {i=0} {first[i]=$1; lines[i++] = $0;} END {for (x=0; x<i; x++) { if ( x==(i - 1) || (first[x + 1] ~ /^[0-9]+$/ && length(first[x + 1])==9) ) {printf("%s: %s\n", x, lines[x]);} else {printf("%s: %s%s\n", x, lines[x], lines[x + 1]); x++;} } }'
Answer 7 (score: 0)
This works for the included data set, assuming valid records end with five digits:
use Modern::Perl;
my $data = do{local $/; <DATA>};
$data =~ s/([^\d]{5})\n/$1 /sg;
say $data;
__DATA__
512161000 EN5121 K. K. Jorge Institute of Engineering Education and Research, Nashik 61220 Mechanical Engineering [Second Shift] XOPENH 1 116 16978
517261123 EN5172 R. C. Rustom Institute of Technology, Shirpur 61220 Mechanical Engineering [Second Shift] YOPENH 1 100 29555
617561234 EN6175 abc xyz Education Trust, abc xyz College of Engineering,
Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,
Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
Output:
512161000 EN5121 K. K. Jorge Institute of Engineering Education and Research, Nashik 61220 Mechanical Engineering [Second Shift] XOPENH 1 116 16978
517261123 EN5172 R. C. Rustom Institute of Technology, Shirpur 61220 Mechanical Engineering [Second Shift] YOPENH 1 100 29555
617561234 EN6175 abc xyz Education Trust, abc xyz College of Engineering, Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of, Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531