Question

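On Unix, without adding anything to the operating system (i.e. using only grep, awk, sed, cut, and so on), how can I split the following input into several files (e.g. "_temp1.txt", "_temp2.txt", and so on), starting a new file at each "codeView" line? Note that the line may well begin with several spaces.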

What if the input comes from an API rather than from an existing file?

. . .
"events" : [ {
"id" : "123456",
"important" : true,
"codeView" : {
  "lines" : [ {
    "fragments" : [ {
      "type" : "NORMAL_CODE",
      "value" : "str = wrapper.getParameter("
    }, {
      "type" : "NORMAL_CODE",
      "value" : ")"
    } ],
    "text" : "str = wrapper.getParameter(&quot;motif&quot;)"
  } ],
  "nested" : false
},
"probableStartLocationView" : {
  "lines" : [ {
    "fragments" : [ {
      "type" : "STACKTRACE_LINE",
      "value" : "&lt;init&gt;() @ JSONInputData.java:12"
    } ],
    "text" : "&lt;init&gt;() @ JSONInputData.java:92"
  } ],
  "nested" : false
},
"dataView" : {
  "lines" : [ {
    "fragments" : [ {
      "type" : "TAINT_VALUE",
      "value" : "CP"
    } ],
    "text" : "{{#taint}}CP{{/taint}}"
  } ],
  "nested" : false
},
"collapsedEvents" : [ ],
"dupes" : 0
}, {
"id" : "28861,28862",
"important" : false,
"type" : "P2O",
"description" : "String Operations Occurred",
"extraDetails" : null,
          "codeView" : {
  "lines" : [ {
    "fragments" : [ {
      "type" : "TEXT",
      "value" : "Over the following lines of code, blah blah."
    } ],
    "text" : "Over the following lines of code, blah blah."
  } ],
  "nested" : false
},
"probableStartLocationView" : {
  "lines" : [ {
    "fragments" : [ {
      "type" : "STACKTRACE_LINE",
      "value" : "remplaceString() @ O_UtilCaractere.java:234"
    } ],
    "text" : "remplaceString() @ O_UtilCaractere.java:234"
  }, {
    "fragments" : [ {
      "type" : "STACKTRACE_LINE",
      "value" : "replaceString() @ O_UtilCaractere.java:333"
    } ],
    "text" : "replaceString() @ O_UtilCaractere.java:333"
  }, {
    "fragments" : [ {
      "type" : "STACKTRACE_LINE",
      "value" : "creerIncidentPaie() @ Incidents.java:444"
    } ],
    "text" : "creerIncidentPaie() @ Incidents.java:219"
  }, {
    "fragments" : [ {
      "type" : "STACKTRACE_LINE",
      "value" : "repliquerAbsenceIncident() @ Incidents.java:876"
    } ],
    "text" : "repliquerAbsenceIncident() @ IncidentsPaieMgr.java:882"
  } ],
  "nested" : false
},
"dataView" : {
  "lines" : [ {
    "fragments" : [ {
      "type" : "TEXT",
      "value" : "insert into TGE_INCIDENT...4&amp;apos;, &amp;apos;YYYYMMDD&amp;apos;), &amp;apos;A&amp;apos;, &amp;apos;"
    }, {
      "type" : "TAINT_VALUE",
      "value" : "CP"
    }, {
      "type" : "TEXT",
      "value" : "&amp;apos;, &amp;apos;&amp;apos;, null, &amp;apos;T&amp;apos;, &amp;apos;ADPTVT&amp;apos;, to_date(&amp;apos;2013012214..."
    } ],
    "text" : "insert into TGE_INCIDENT...4&amp;apos;, &amp;apos;YYYYMMDD&amp;apos;), &amp;apos;A&amp;apos;, &amp;apos;{{#taint}}CP{{/taint}}&amp;apos;, &amp;apos;&amp;apos;, null, &amp;apos;T&amp;apos;, &amp;apos;ADPTVT&amp;apos;, to_date(&amp;apos;2017062214..."
  } ],
  "nested" : false
}
. . .

Answer 1

This will work robustly in any awk:

awk '/"codeView"/{close(out); out="_temp" ++c ".txt"} out!=""{print > out}' file
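As a quick check of how this behaves, the sketch below runs the same awk program on a tiny stand-in for the input. The sample lines and the `printf` producer are made up for illustration; substitute whatever command fetches your API response. Because no file name is given, awk reads standard input, and any lines before the first "codeView" are dropped since `out` is still empty at that point.

```shell
# Run this in an empty scratch directory; it creates _temp1.txt and _temp2.txt.
# printf stands in for the real API call (hypothetical sample data).
printf '%s\n' \
  '"id" : "1",' \
  '   "codeView" : {' \
  '  "nested" : false' \
  '"id" : "2",' \
  '"codeView" : {' \
  '  "nested" : true' |
awk '/"codeView"/{close(out); out="_temp" ++c ".txt"} out!=""{print > out}'

# List the pieces that were written.
ls _temp*.txt
```

Here _temp1.txt holds the first "codeView" line and everything up to the next one, and _temp2.txt holds the rest.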

Answer 2

Try:

csplit -f _temp -b %d.tmp file '/codeView/' '{*}'

Or, if the data comes from some other program:

my_api | csplit -f _temp -b %d.tmp - '/codeView/' '{*}'

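How it works

-f _temp -b %d.tmp

These two options set the names of the split files: -f gives the prefix and -b the suffix format, so the pieces are named _temp0.tmp, _temp1.tmp, and so on.

file

Replace this with the name of your input file. If the input comes from standard input, use -.

/codeView/

This is the regular expression to split on.

'{*}'

This tells csplit not to stop at the first match but to keep splitting at every later match.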
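To see the naming in practice, here is a small sketch on made-up sample data. The `printf` input is hypothetical, and the `-b` option and the `'{*}'` repeat count assume GNU csplit, so this may not work with other implementations. The `-s` flag is added only to suppress the byte counts csplit normally prints.

```shell
# Run this in an empty scratch directory; it creates _temp0.tmp .. _temp2.tmp.
# printf stands in for the real data source (hypothetical sample data).
printf '%s\n' \
  '"events" : [ {' \
  '"codeView" : {' \
  '  "nested" : false' \
  '          "codeView" : {' \
  '  "nested" : true' |
csplit -s -f _temp -b %d.tmp - '/codeView/' '{*}'

# List the pieces that were written.
ls _temp*.tmp
```

Everything before the first match ends up in _temp0.tmp; each later piece starts at a "codeView" line.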

Answer 3

awk to the rescue!

$ awk '/"codeView"/{c++} {print > ("_temp" (c+0) ".txt")}' file

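Everything up to the first matching line goes into temp file 0. If the key might also appear inside the content, you can change the pattern match to a literal match on the first field: $1=="\"codeView\""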

You can pipe the data into the awk script instead of reading it from a file.

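If this leaves too many files open at once, you may need to close each file when you are done with it, before awk fails with an error.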
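These remarks can be combined into one variant. The following is a sketch rather than part of the original answer: it matches the key literally against the first field, reads from a pipe, and closes each output file before moving on to the next, so only one file is open at a time. The `printf` sample is made up for illustration.

```shell
# Run this in an empty scratch directory; it creates _temp0.txt .. _temp2.txt.
# printf stands in for the real data source (hypothetical sample data).
printf '%s\n' \
  '"events" : [ {' \
  '"codeView" : {' \
  '  "text" : "mentions "codeView" in content"' \
  '          "codeView" : {' \
  '  "nested" : true' |
awk '
  # $1 is the first whitespace-separated field, so leading spaces are ignored
  # and a "codeView" that appears later in a line does not trigger a split.
  $1 == "\"codeView\"" { close(out); c++ }
  # Everything before the first match lands in _temp0.txt.
  { out = "_temp" (c+0) ".txt"; print > out }
'
```

The third sample line contains "codeView" inside its value, but its first field is "text", so it stays in _temp1.txt.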
