Question

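I have large text files in which long lines are sometimes split into multiple lines by writing a = followed by a newline character. (This is the Enron email data from Kaggle.) Since even words are broken this way, and I want to do some machine learning with the data, I would like to remove those breaks. As far as I can tell, the combination =\n is only used for these breaks, so if I remove them I get the same information without the breaks and nothing is lost.

I cannot use tr because it only replaces single characters, and I have two characters to replace.
The sed command I have so far does not work:

sed --in-place --quiet --regexp-extended 's/=\n//g' email_aa_edit

where email_aa_edit is my input file, a part of the Enron email data (I used split to split it). But this only produces an empty file, and I don't know why. AFAIK = is not a special character by itself, and the newline should be \n.

What is the correct way to remove those =\n occurrences?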

Answer 1

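You can't remove the newline character with a plain substitution, because sed works line by line and the trailing newline is not part of the pattern space. You can do it, however, if you append the next line to the pattern space first:

sed ':a;/=$/{N;s/=\n//;ta}' file

Details:

:a;           # defines a label "a"
/=$/ {        # if the line ends with =
    N;        # append the next line to the pattern space
    s/=\n//;  # delete the =\n
    ta        # jump to label "a" when something is replaced (that's always the case
              # except if the last line ends with =)
}

Note: if your file uses the Windows newline sequence, change \n to \r\n. In that case each line ends in \r, so the address /=$/ most likely needs to become /=\r$/ as well.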
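To try the command from this answer before running it on the real data, here is a minimal sketch on a throwaway file. It assumes GNU sed (the question already uses GNU long options); the sample text and the variable names are made up for illustration. It also drops --quiet when editing in place: that option suppresses sed's automatic printing, and since the script contains no p command, nothing would be written back, which would explain the empty file in the question.

```shell
# Hypothetical sample: two soft line breaks ("=" at end of line) and one normal line.
sample=$(mktemp)
printf 'Hello wor=\nld, this is a lo=\nng line.\nNo break here.\n' > "$sample"

# Join every line that ends in "=" with the line that follows it.
joined=$(sed ':a;/=$/{N;s/=\n//;ta}' "$sample")
printf '%s\n' "$joined"

# Same edit in place. No --quiet here, so sed still prints each pattern space
# and the result is written back to the file instead of leaving it empty.
sed --in-place ':a;/=$/{N;s/=\n//;ta}' "$sample"

# Assumed variant for CRLF input: every line ends in "\r", so both the
# address and the substitution have to account for it.
joined_crlf=$(printf 'ab=\r\ncd\r\n' | sed ':a;/=\r$/{N;s/=\r\n//;ta}')
```

Once the output looks right on a sample like this, the same command with --in-place can be pointed at email_aa_edit, ideally on a copy of the file first.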
