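I am looking for a way to remove certain characters from strings that match a regular expression pattern. I store text containing newlines in a tab-delimited file that should have one record per line, and I am trying to replace all of those newlines with spaces. Newlines do not occur in the last column (a short column holding alphanumeric keys).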
The way to solve the problem, IMHO, would be to replace every instance of \n within the following pattern:
[^\t]*\t[^\t]*
My solution so far uses three steps. First, replace the "good" \n with a special string that does not occur in the rest of the text (e.g. a long number), using s/\([^\t]*\t{x}[^\t]*\)\n/\1#12398754987235649876234#/g, where x is one less than the number of columns expected in my file. Then replace every remaining \n with a space, and finally turn the special string back into \n.
But I have quite a few gigabytes of text files, and I am looking for a way to do this in a single sed step.
Example input:
foo \t Each multiplex has screens allocated \n
to each studio. \t abc \n
bar \t The screens need filling. \t bcd \n
123 \t Studios have to create product to fill \n
their screen, and the amount of good product is limited. \t cde \n
Output:
foo \t Each multiplex has screens allocated to each studio. \t abc \n
bar \t The screens need filling. \t bcd \n
123 \t Studios have to create product to fill their screen, and the amount of good product is limited. \t cde \n
Answer 0 (score: 1)
Using awk
cat file
foo Each multiplex has screens allocated
to each studio.
bar The screens need filling.
123 Studios have to create product to fill
their screen, and the amount of good product is limited.
A line that contains a tab (\t) starts a new record; any line without one is joined to the line before it with a space.
awk 'NR>1 {s=/\t/?"\n":" "}{printf s"%s",$0} END {print ""}' file
foo Each multiplex has screens allocated to each studio.
bar The screens need filling.
123 Studios have to create product to fill their screen, and the amount of good product is limited.
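The one-liner above relies on continuation lines containing no tab. In the question's own sample the continuation line still carries the last column, so it does contain one. A minimal sketch for that case, assuming every record has a fixed number of columns, counts tabs and only ends a record once the expected number has been seen. The function name join_by_columns and the cols variable are made up for illustration and are not part of the original answer.

```shell
# Hypothetical helper: join physical lines until a record holds cols-1 tabs.
# Usage: join_by_columns COLS [FILE...]
join_by_columns() {
  cols=$1
  shift
  awk -v cols="$cols" '
    {
      # Count the tabs on this line; rewriting each tab as itself leaves the text unchanged.
      tabs += gsub(/\t/, "&")
      # Append the line to the pending record, with a space in place of the lost newline.
      buf = (pending ? buf " " : "") $0
      pending = 1
      # A record is complete once it holds cols-1 tabs (assumes a fixed column count).
      if (tabs >= cols - 1) {
        print buf
        pending = 0
        tabs = 0
      }
    }
    # Flush an incomplete trailing record instead of silently dropping it.
    END { if (pending) print buf }
  ' "$@"
}
```

For a three-column file this would be called as join_by_columns 3 file. It reads the input once and keeps only the current record in memory, which should matter for files of several gigabytes.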
Answer 1 (score: 1)
This might work for you (GNU sed):
sed -r ':a;$!N;s/\n([^\t]+)$/\1/;ta;P;D' file
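Read two lines into the pattern space (PS). If the last line does not contain a tab, remove the newline, read in the next line, and repeat. If the line does contain a tab, print the first line, delete it, and repeat.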
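Note that the substitution above deletes the newline outright, so the two halves are glued together with no separator unless the first one already ends in a space. A hedged variant that inserts a space instead, as the question asks, might look like this. The wrapper name join_tabless_lines is hypothetical, and, like the original, it assumes continuation lines contain no tab.

```shell
# Hypothetical wrapper around the same N/P/D loop; the replacement is " \1" rather than "\1".
# A literal tab is spliced into the bracket expression so the script does not depend on
# sed understanding \t, and -E replaces the GNU-only spelling -r.
join_tabless_lines() {
  tab=$(printf '\t')
  sed -E \
    -e ':a' \
    -e '$!N' \
    -e "s/\n([^${tab}]+)\$/ \1/" \
    -e 'ta' \
    -e 'P' \
    -e 'D' \
    "$@"
}
```

Called as join_tabless_lines file, it prints each record once the next line containing a tab has been read.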
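Answer 2 (score: 0)
Using sed to process lines ahead of the current one is always tricky because of its limitations: a small number of buffers, no non-greedy quantifiers, no look-ahead, and so on. Still, here is one approach. It is commented, but I know it is not easy to follow.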
sed -n '
## Label "a"
:a;
## Enter this section once the buffer holds two tabs, that is, once
## the first line of the next record has been appended.
/\t.*\t/ {
## Loop to remove all newlines but the last one, because it is
## next line with a tab that I dont want to print now.
:b;
/\n[^\n]*\n/ {
s/\n/ /;
bb
};
## Print until newline (all joined lines) and delete them
P;
D;
};
## Append next line to buffer and repeat loop.
N;
$! ba;
## Special case for last line, remove extra newlines and print.
s/\n/ /g;
p
' infile
Assuming infile contains the following:
foo Each multiplex has screens allocated
to each studio.
bar The screens need filling.
123 Studios have to create product to fill
their screen, and the amount of good product is limited.
It yields:
foo Each multiplex has screens allocated to each studio.
bar The screens need filling.
123 Studios have to create product to fill their screen, and the amount of good product is limited.