Extracting data between two strings

Time: 2014-03-27 06:22:46

Tags: python json

I have a file containing the following JSON, and I want to extract the data between "transcript" and "explain".

" 1010320":{
    "transcript": [
        "1012220",
        "to build. so three is not correct."
    ],
    "explain": "Describing&Interpreting"
},
" 1019660":{
    "transcript": [
        "1031920",
        "The moment disturbance comes, if this control strategy is to be implemented properly, the moment disturbance comes, it is picked up immediately, and corrective action done immediately."
    ],
    "explain": "Describing&Interpreting"
},
"1041600": {
    "transcript": [
        "1044860",
        "this is also not correct because it will take some time."
    ],
    "explain": "Describing&Interpreting"
},
" 1053100":{
    "transcript": [
        "1073800",
    ],
    "explain": "Describing&Interpreting"
},
"2082920": {
    "transcript": [
        "2089000",
        "45 minutes i.e., whereas this taken around 15seconds or something. Is that ok?"
    ],
    "explain": "Describing&Interpreting"
},

I want to separate the text strings from the numeric ones.

The output should be:

"to build. so three is not correct."

"The moment disturbance comes, if this control strategy is to be implemented properly, the moment disturbance comes, it is picked up immediately, and corrective action done immediately." 

"this is also not correct because it will take some time."

"45 minutes i.e., whereas this taken around 15seconds or something. Is that ok?"

Is this possible?

2 Answers:

Answer 0 (score: 0)

sed -n -e '/",[[:blank:]]*$/,/^[[:blank:]]*],/ {
   /^[[:blank:]]*".*"[[:blank:]]*$/ {
      G;p
      }
   }' YourFile

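Based on the structure of your sample: take the lines between a line ending with `",` and a line starting with `],`, and print only the lines that consist entirely of a quoted string. I also allowed for optional whitespace around them (`[:blank:]` matches whitespace characters such as space and tab).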
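Since the question is tagged python, the range logic of the sed script above can be sketched in Python roughly as follows. This is only an illustration under the same assumption about the layout (one JSON token per line); the function name `extract_quoted_lines` and the pattern names `START`, `END` and `QUOTED` are made up for this sketch.

```python
import re

# Hypothetical names; the patterns mirror the three sed addresses.
START = re.compile(r'",[ \t]*$')             # line ends with ",
END = re.compile(r'^[ \t]*\],')              # line starts with ],
QUOTED = re.compile(r'^[ \t]*".*"[ \t]*$')   # whole line is one quoted string


def extract_quoted_lines(lines):
    """Return the fully quoted lines found inside each START..END range."""
    in_range = False
    found = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not in_range:
            if not START.search(line):
                continue
            in_range = True        # the range opens on this line
        elif END.search(line):
            in_range = False       # the range closes on this line
            continue
        if QUOTED.match(line):
            found.append(line.strip())
    return found


if __name__ == "__main__":
    sample = [
        '" 1010320":{',
        '    "transcript": [',
        '        "1012220", ',
        '        "to build. so three is not correct."',
        '    ], ',
        '    "explain": "Describing&Interpreting" ',
        '}, ',
    ]
    for text in extract_quoted_lines(sample):
        print(text)
```

To run it on a real file, pass an open file object instead of the list, since iterating over a file yields its lines.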

Answer 1 (score: 0)

This might work for you (GNU sed):

sed -n '/^\s*"transcript": \[/,/^\s*\],/{/^\s*"[^"]*"\s*$/p}' file

This uses sed's grep-like mode (the `-n` option) and prints the lines within the transcript clause that begin and end with double quotes.
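As the question is also tagged json, a different technique from the sed one-liners is to parse the data instead of matching lines. The sketch below assumes the file is valid JSON, which the fragment in the question is not as shown: it would need an enclosing `{ ... }` and no trailing commas. The names `transcript_texts` and `SAMPLE` are hypothetical, and `SAMPLE` is a shortened, cleaned-up copy of the question's data.

```python
import json


def transcript_texts(document):
    """Yield every non-numeric string found in the "transcript" lists."""
    for entry in document.values():
        for item in entry.get("transcript", []):
            if not item.strip().isdigit():   # skip numeric strings such as "1012220"
                yield item


# Assumption: outer braces added and trailing commas removed so that it parses.
SAMPLE = """
{
  " 1010320": {
    "transcript": ["1012220", "to build. so three is not correct."],
    "explain": "Describing&Interpreting"
  },
  " 1053100": {
    "transcript": ["1073800"],
    "explain": "Describing&Interpreting"
  },
  "1041600": {
    "transcript": ["1044860", "this is also not correct because it will take some time."],
    "explain": "Describing&Interpreting"
  }
}
"""

for text in transcript_texts(json.loads(SAMPLE)):
    print(json.dumps(text))   # json.dumps puts the double quotes back
```

Unlike the line-based approaches, this does not depend on how the file is indented or wrapped, but it fails outright on input that is not well-formed JSON.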