How to rewrite a file and move lines that do not match up onto the line above (regex)

Date: 2019-05-02 21:46:13

Tags: regex powershell awk sed

I have a text file whose lines have the following format:

"\\server\folder\file name dad dada dad","submitted"
"\\server\folder\file name dad dada xxx","submitted"
"\\server\folder\file name dad dada ttt","submitted"
"\\server\folder\file name dad dada rrr","submitted"
"\\server\folder\file name dad
dada ddd","submitted"
"\\server\folder\file name dad dada rrr","submitted"

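Each line should always start with "\\server\... and end with ,"submitted". However, as you can see, a line is sometimes split: it starts correctly, but part of it ends up on a new line.

I need to rewrite the file into a new file with the correct format. Basically, if a line does not start with "\\server.., it should be appended to the previous line. I can run the tool on Windows (PowerShell) or on Linux (awk, sed). Thanks in advance.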

3 Answers:

Answer 0: (score: 2)

A PowerShell solution using the switch statement:

& { 
  switch -wildcard -file in.txt { 
    '"\\server*"' { $_; continue } 
    '"\\server*'  { $prev = $_; continue } 
    default       { $prev + $_ }
  }
} | Set-Content out.txt
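  • The wildcard expression "\\server*" matches a self-contained line, which is inferred from the line ending in ". Such a line is output immediately ($_), and processing moves on to the next line (continue).

  • The wildcard expression "\\server* then matches, by process of elimination, an incomplete line. Its content is saved in the variable $prev, and processing moves on to the next line.

  • The default handler, default, is therefore invoked only for the line that follows (and completes) an incomplete line, and the string concatenation $prev + $_ stitches the two lines together.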
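
For readers who do not use PowerShell, the three branches above can be sketched in Python. This is only an illustration of the same logic, not part of the original answer; the function name stitch_lines and the prefix parameter are made up for the example. Like the PowerShell version, it assumes a record is broken across at most two lines and joins the pieces without inserting a space.

```python
def stitch_lines(lines, prefix=r'"\\server'):
    """Yield complete records, gluing a continuation line onto the saved fragment.

    Hypothetical helper mirroring the three switch branches; `lines` must
    already have their line terminators removed.
    """
    prev = ""
    for line in lines:
        if line.startswith(prefix) and line.endswith('"'):
            yield line          # self-contained line: emit it unchanged
        elif line.startswith(prefix):
            prev = line         # incomplete line: remember it for the next pass
        else:
            yield prev + line   # continuation: stitch it onto the saved part
            prev = ""


if __name__ == "__main__":
    sample = [
        r'"\\server\folder\file name dad dada rrr","submitted"',
        r'"\\server\folder\file name dad ',
        r'dada ddd","submitted"',
    ]
    for record in stitch_lines(sample):
        print(record)
```

Because the pieces are concatenated as-is, a fragment that lost its trailing space at the break would be joined without one; adding a separator there is a design choice the original answer does not make.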

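Note that, by default, Set-Content uses the character encoding implied by the system's active ANSI code page in Windows PowerShell, and BOM-less UTF-8 in PowerShell Core; use the -Encoding parameter to select a different encoding.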

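Answer 1: (score: 1)

You can use the following awk command, which saves the current line in p if the line does not end with ,"submitted", and prints the line, prefixed with p, if it does end with it: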

awk '{if(/,"submitted"$/){print p?p" "$0:$0;p=""}else{p=$0}}' file
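
The same buffering idea can be written out in Python to make the control flow easier to follow. This is a sketch under the same assumptions as the awk one-liner, not a replacement for it; the names join_until_suffix and pending are invented for the example. As in the awk version, only the most recent fragment is kept, and the pieces are joined with a single space.

```python
def join_until_suffix(lines, suffix=',"submitted"'):
    """Yield one record per line that ends with `suffix`.

    Hypothetical helper: a line without the suffix is held back in `pending`
    (the role of p in the awk command) and prefixed to the next line that
    does end with the suffix.
    """
    pending = ""
    for line in lines:
        if line.endswith(suffix):
            yield f"{pending} {line}" if pending else line
            pending = ""
        else:
            pending = line  # overwrites, like p=$0: only one fragment is kept


if __name__ == "__main__":
    sample = [
        r'"\\server\folder\file name dad',
        r'dada ddd","submitted"',
    ]
    for record in join_until_suffix(sample):
        print(record)
```

If the input has \r\n line endings, the lines would need their trailing \r stripped first, otherwise the suffix test never succeeds.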

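Answer 2: (score: 1)

Since you're on Windows, I bet the newlines in the middle of a record are just \n while the newlines at the end of each record are \r\n, as you would get when exporting a CSV from Excel where cells contain newlines, e.g.: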

$ cat -v file
"\\server\folder\file name dad dada dad","submitted"^M
"\\server\folder\file name dad dada xxx","submitted"^M
"\\server\folder\file name dad dada ttt","submitted"^M
"\\server\folder\file name dad dada rrr","submitted"^M
"\\server\folder\file name dad
dada ddd","submitted"^M
"\\server\folder\file name dad dada rrr","submitted"^M

In that case, all you need is the following (using GNU awk for multi-character RS and RT):

$ awk -v RS='\r\n' '{$1=$1}1' file
"\\server\folder\file name dad dada dad","submitted"
"\\server\folder\file name dad dada xxx","submitted"
"\\server\folder\file name dad dada ttt","submitted"
"\\server\folder\file name dad dada rrr","submitted"
"\\server\folder\file name dad dada ddd","submitted"
"\\server\folder\file name dad dada rrr","submitted"
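
As a rough Python counterpart of the command above, the sketch below treats \r\n as the only record separator and then collapses whitespace the way $1=$1 does. It assumes the whole file fits in memory and was read without newline translation (for example with open(path, newline="")); the function name records_from_crlf is hypothetical.

```python
def records_from_crlf(data):
    """Split `data` on \\r\\n only and normalise whitespace inside each record.

    Hypothetical helper: a bare \\n stays inside its record after the split and
    is then folded into a single space together with any other whitespace run.
    """
    records = data.split("\r\n")
    if records and records[-1] == "":
        records.pop()  # drop the empty piece after the final \r\n
    # " ".join(r.split()) collapses every whitespace run, including the bare \n
    return [" ".join(r.split()) for r in records]


if __name__ == "__main__":
    raw = (
        '"\\\\server\\folder\\file name dad dada rrr","submitted"\r\n'
        '"\\\\server\\folder\\file name dad\ndada ddd","submitted"\r\n'
    )
    for record in records_from_crlf(raw):
        print(record)
```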

Otherwise, you probably just need:

$ awk -v RS='"\r?\n' '{ORS=RT;$1=$1}1' file
"\\server\folder\file name dad dada dad","submitted"
"\\server\folder\file name dad dada xxx","submitted"
"\\server\folder\file name dad dada ttt","submitted"
"\\server\folder\file name dad dada rrr","submitted"
"\\server\folder\file name dad dada ddd","submitted"
"\\server\folder\file name dad dada rrr","submitted"
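
The second command ends a record only where a double quote is directly followed by a line break, and writes back whichever terminator it actually found (ORS=RT). A Python sketch of that idea, assuming the input was read without newline translation and using the made-up name rejoin_on_quote_newline, could look like this:

```python
import re


def rejoin_on_quote_newline(data):
    """Rebuild records that end at a double quote followed by \\n or \\r\\n.

    Hypothetical helper: each record is whitespace-normalised and then written
    back with the exact terminator that ended it, so CRLF and LF inputs both
    keep their original line endings.
    """
    # With one capturing group, re.split alternates record, terminator, record, ...
    parts = re.split(r'("\r?\n)', data)
    out = []
    for record, terminator in zip(parts[0::2], parts[1::2]):
        out.append(" ".join(record.split()) + terminator)
    # Any trailing text without a terminator is kept, without adding a newline
    if parts[-1].strip():
        out.append(" ".join(parts[-1].split()))
    return "".join(out)


if __name__ == "__main__":
    raw = '"\\\\server\\folder\\file name dad\ndada ddd","submitted"\n'
    print(rejoin_on_quote_newline(raw), end="")
```

Note that the closing quote is part of the terminator here, just as it is part of RS in the awk command, so it is removed from the record and restored on output.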