Converting plain text with a specific format to JSON in Vim

Time: 2017-03-07 12:45:20

Tags: json regex vim

All my university notes are in JSON format. When I get a set of practical questions from a PDF, it is formatted like this:

1. Download and compile the code. Run the example to get an understanding of how it works. (Note that both
threads write to the standard output, and so there is some mixing up of the two conceptual streams, but this
is an interface issue, not of concern in this course.)
2. Explore the classes SumTask and StringTask as well as the abstract class Task.
3. Modify StringTask.java so that it also writes out “Executing a StringTask task” when the execute() method is
called.
4. Create a new subclass of Task called ProdTask that prints out the product of a small array of int. (You will have
to add another option in TaskGenerationThread.java to allow the user to generate a ProdTask for the queue.)
Note: you might notice strange behaviour with a naïve implementation of this and an array of int that is larger
than 7 items with numbers varying between 0 (inclusive) and 20 (exclusive); see ProdTask.java in the answer
for a discussion.
5. Play with the behaviour of the processing thread so that it polls more frequently and a larger number of times,
but “pop()”s off only the first task in the queue and executes it.
6. Remove the “taskType” member variable definition from the abstract Task class. Then add statements such as
the following to the SumTask class definition:
private static final String taskType = "SumTask";
Investigate what “static” and “final” mean.
7. More challenging: write an interface and modify the SumTask, StringTask and ProdTask classes so that they
implement this interface. Here’s an example interface:

What I want to do is copy this into Vim and run a find and replace to convert it to:


    "1": {
        "Task": "Download and compile the code. Run the example to get an understanding of how it works. (Note that both threads write to the standard output, and so there is some mixing up of the two conceptual streams, but this is an interface issue, not of concern in this course.)",
        "Solution": ""
    },
    "2": {
        "Task": "Explore the classes SumTask and StringTask as well as the abstract class Task.",
        "Solution": ""
    },
    "3": {
        "Task": "Modify StringTask.java so that it also writes out “Executing a StringTask task” when the execute() method is called.",
        "Solution": ""
    },
    "4": {
        "Task": "Create a new subclass of Task called ProdTask that prints out the product of a small array of int. (You will have to add another option in TaskGenerationThread.java to allow the user to generate a ProdTask for the queue.) Note: you might notice strange behaviour with a naïve implementation of this and an array of int that is larger than 7 items with numbers varying between 0 (inclusive) and 20 (exclusive); see ProdTask.java in the answer for a discussion.",
        "Solution": ""
    },
    "5": {
        "Task": "Play with the behaviour of the processing thread so that it polls more frequently and a larger number of times, but “pop()”s off only the first task in the queue and executes it.",
        "Solution": ""  
    },
    "6": {
        "Task": "Remove the “taskType” member variable definition from the abstract Task class. Then add statements such as the following to the SumTask class definition: private static final String taskType = 'SumTask'; Investigate what “static” and “final” mean.",
        "Solution": ""
    },
    "7": {
        "Task": "More challenging: write an interface and modify the SumTask, StringTask and ProdTask classes so that they implement this interface. Here’s an example interface:",
        "Solution": "" 
    }

While trying to work this out during the practical (rather than actually doing the practical), this is the closest I got:

 
%s/\([1-9][1-9]*\)\. \(\_.\{-}\)--end--/"\1": {\r "Task": "\2",\r"Solution": "" \r},/g

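This has three problems:

  1. I have to add --end-- at the end of each question. I would like it to detect where a question ends by looking ahead for a line that starts with [1-9][1-9]*, but when I search for that, that part gets replaced as well.
  2. It keeps all the newlines inside a question (which is invalid in JSON). I would like it to remove them.
  3. The last entry should not be followed by a "," because that is also invalid JSON (I don't mind this one much, since it is easy to remove the last "," by hand).

Please keep in mind that I am very bad at regular expressions, and one of the reasons I'm doing this is to learn more about regex, so please explain any regex you post as a solution.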

3 Answers:

Answer 0: (score: 2)

In two steps:

%s/\n/\ /g

solves problem 2, and then:

%s/\([1-9][1-9]*\)\. \(\_.\{-}\([1-9][1-9]*\. \|\%$\)\@=\)/"\1": {\r "Task": "\2",\r"Solution": "" \r},\r/g

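solves problem 1. You can solve problem 3 with another round of substitution. Note also that my solution leaves an unwanted extra space at the end of each task entry. Try removing it yourself.

A brief explanation of what I added:

\|: or;

\%$: end of file;

\@=: zero-width lookahead; the preceding group must match at this point, but it is not included in the match.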
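For readers who want to try the lookahead idea outside Vim, here is a minimal Python sketch of the same two steps. It is an illustration, not part of the answer: the sample text, the names `flat`, `item` and `convert`, and the use of `[1-9][0-9]*` (so that numbers such as 10 or 20 also match) are assumptions. `(?=...)` plays the role of `\@=` and `\Z` the role of `\%$`.

```python
import re

# Abbreviated, made-up sample in the question's format.
text = """1. Download and compile
the code.
2. Explore the classes.
3. Modify StringTask.java so that it
also writes a message."""

# Step 1: collapse every newline into a space (the %s/\n/\ /g step).
flat = text.replace("\n", " ")

# Step 2: lazily capture each item's text up to, but not including,
# the next "N. " marker or the end of the input.
item = re.compile(r"([1-9][0-9]*)\. (.*?)(?=[1-9][0-9]*\. |\Z)")

def convert(flat_text):
    entries = []
    for number, body in item.findall(flat_text):
        # strip() removes the extra space left in front of the next marker.
        entries.append(
            '"%s": {\n "Task": "%s",\n"Solution": ""\n},' % (number, body.strip())
        )
    return "\n".join(entries)

print(convert(flat))
```

Because the lookahead consumes nothing, the next marker is still available as the start of the following match, which is what problem 1 asked for. Like the Vim version, this sketch would split an item early if its text happened to contain something like "7. " in the middle of a sentence.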

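Answer 1: (score: 1)

If each item is on a single line, I would convert the text with a macro, which is shorter and more straightforward than :s: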

I"<esc>f.s": {<enter>"Task": "<esc>A",<enter>"Solution": ""<enter>},<esc>+

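Record this macro into a register, e.g. q, and you can then replay it with something like 100@q to do the conversion.

Note:

  • The result will be left with a trailing comma at the end; just delete it.
  • You can also add indentation while recording the macro, so that your JSON comes out "pretty printed", or you can use other tools to make it look sexy.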

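Answer 2: (score: 1)

You could do this with one big regular expression, but that quickly becomes unmaintainable. I would split the task into 3 steps:

  1. Separate each numbered step into its own paragraph.
  2. Put each paragraph on its own line.
  3. Generate the JSON.

Putting it together: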

    %s/^[0-9]\+\./\r&/
    %s/\(\S\)\n\(\S\)/\1 \2/
    %s/^\([0-9]\+\)\. *\(.*\)/"\1": {\r    "Task": "\2",\r    "Solution": ""\r},/
    

This solution also leaves a comma after the last element. It can be removed with:

    $s/,//
    

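Explanation

  • %s/^[0-9]\+\./\r&/ matches a line that starts with a number followed by a dot, e.g. 1., 8., 13., 131., and replaces it with a newline (\r) followed by the match itself (&).
  • %s/\(\S\)\n\(\S\)/\1 \2/ replaces any newline that has a non-whitespace character (\S) on both sides with a single space.
  • %s/^\([0-9]\+\)\. *\(.*\) ... captures the number and the text in \1 and \2.
  • ... /"\1": {\r "Task": "\2",\r "Solution": ""\r},/ formats the text accordingly.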
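The three substitutions and the final cleanup can also be tried outside Vim. The following Python sketch mirrors them one by one with `re.sub`; the function name `three_steps` and the sample text are made up for illustration, and Python's `\n` in the replacement stands in for Vim's `\r`.

```python
import json
import re

# Abbreviated, made-up sample in the question's format.
raw = """1. Download and compile
the code.
2. Explore the classes.
3. Modify StringTask.java so that it
also writes a message."""

def three_steps(text):
    # Step 1: put a blank line before every "N." marker at the start of a line.
    text = re.sub(r"(?m)^[0-9]+\.", r"\n\g<0>", text)
    # Step 2: join wrapped lines; a newline with non-whitespace on both
    # sides becomes a single space.
    text = re.sub(r"(\S)\n(\S)", r"\1 \2", text)
    # Step 3: turn each "N. text" line into a JSON entry.
    text = re.sub(
        r"(?m)^([0-9]+)\. *(.*)",
        r'"\1": {\n    "Task": "\2",\n    "Solution": ""\n},',
        text,
    )
    # Cleanup (the $s/,// step): drop the comma after the last entry.
    return text.rstrip().rstrip(",")

result = three_steps(raw)
# Wrapping the entries in braces should give something a JSON parser accepts.
parsed = json.loads("{" + result + "}")
print(parsed["1"]["Task"])
```

As with the Vim commands, nothing here escapes double quotes inside a task's text, so an item such as number 6 in the question would still need manual attention.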

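Alternative using sed, awk and jq

You can perform the first and second steps above directly with sed and awk (the \+ and the \n in the replacement are GNU sed syntax):

    1. sed 's/^[0-9]\+\./\n&/' infile
    2. awk '$1=$1; { print "\n" }' RS= ORS=' '

Using jq for the third step ensures that the output is valid JSON:

    jq -R 'match("([0-9]+). *(.*)") | .captures | {(.[0].string): { "Task": (.[1].string), "Solution": "" } }'

Here it is as a single command line: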

        sed 's/^[0-9]\+\./\n&/' infile            |
        awk '$1=$1; { print "\n" }' RS= ORS=' '   |
        jq -R 'match("([0-9]+). *(.*)") | .captures | {(.[0].string): { "Task": (.[1].string), "Solution": "" } }'
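If jq is not available, a similar result can be had by letting a JSON library do the quoting. The sketch below is one possible way to do it in Python, not a translation of the pipeline above: `notes_to_json` and the sample text are hypothetical, and unlike the jq command it emits a single object rather than one object per item. It assumes a wrapped continuation line never starts with a number followed by a dot.

```python
import json
import re

def notes_to_json(text):
    """Convert 'N. text' items, possibly wrapped over several lines, to JSON."""
    entries = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        marker = re.match(r"([0-9]+)\.\s*(.*)", line)
        if marker:
            # A new numbered item starts here.
            current = marker.group(1)
            entries[current] = {"Task": marker.group(2), "Solution": ""}
        elif current is not None:
            # Continuation line: append it to the current item.
            joined = entries[current]["Task"] + " " + line
            entries[current]["Task"] = joined.strip()
    # json.dumps escapes embedded double quotes and never emits a trailing comma.
    return json.dumps(entries, indent=4, ensure_ascii=False)

sample = (
    "6. Add this to SumTask:\n"
    'private static final String taskType = "SumTask";\n'
    "7. Write an interface."
)
print(notes_to_json(sample))
```

Building a dictionary first and serialising it at the end sidesteps both the trailing-comma problem and the unescaped quotes in item 6, at the cost of leaving the editor.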