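Extract lines from multiple files based on a condition in one of the lines

Asked: 2019-09-11 13:27:59

Tags: perl awk sed grep text-processing

I have a number of directories containing text files in the following format: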

class FeatureFE():
    meta_data = MetaData(
        name='COOL_FEATURE',
        sub_type='EXTRA_COOL_FEATURES',
        required_data=[accounts, logs],
        has_graph=True,
        x_axis_label='Time',
        y_axis_label='Foo',
        graph_caption='Description of my feature',
        priority='low',
    )

My task is to go through each .py file and, if has_graph=True, extract name, required_data and graph_caption. The end goal is a CSV structure like this:

name, required_data, graph_caption
'COOL_FEATURE', [accounts, logs],'Description of my feature',

This is definitely doable with awk / sed / grep, but I am struggling to get there. So far I have:

grep -E -B 4 -A 5 "has_graph=True" feature_17.py | tr -s ' ' | grep '^ name\|^ required_data\|^ graph_caption' | sed 's/.*=//'

which returns

'COOL_FEATURE',
[accounts, logs],
'Description of my feature',

for a single file, but shows nothing when run on *.py.

Any help is much appreciated!

3 Answers:

Answer 0 (score: 1)

Whenever your data contains name=value pairs, I find it best to first build an array of those mappings and then simply access the values by name. For example, using GNU awk for the third argument to match() and for ENDFILE:

$ cat tst.awk
BEGIN {
    OFS = ","
    numNames = split("name required_data graph_caption",names)
}

match($0,/^\s*(\w+)\s*=\s*(.*\S)\s*,\s*$/,a) {
    name = a[1]
    value = a[2]
    name2value[name] = value
}

ENDFILE {
    if ( name2value["has_graph"] == "True" ) {
        if ( !doneHdr++ ) {
            for (nameNr=1; nameNr<=numNames; nameNr++) {
                name = names[nameNr]
                printf "%s%s", name, (nameNr<numNames ? OFS : ORS)
            }
        }
        for (nameNr=1; nameNr<=numNames; nameNr++) {
            name = names[nameNr]
            value = name2value[name]
            gsub(/"/,"\"\"",value)
            printf "\"%s\"%s", value, (nameNr<numNames ? OFS : ORS)
        }
    }
    delete name2value
}

$ awk -f tst.awk file
name,required_data,graph_caption
"'COOL_FEATURE'","[accounts, logs]","'Description of my feature'"

I wrap each value in double quotes before printing so that the output is still valid CSV even if your values contain , (as [accounts, logs] does) and/or double quotes.
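For readers more comfortable in Python, the same name-to-value mapping idea can be sketched roughly as below. This is an illustration only, not part of the answer: the names parse_pairs, to_csv_row and SAMPLE are hypothetical, and the sketch assumes every name=value pair sits on its own line and ends with a comma, as in the question's sample.

```python
import csv
import io
import re

# One "name=value," pair per line, e.g. "    name='COOL_FEATURE',".
PAIR_RE = re.compile(r"^\s*(\w+)\s*=\s*(.*\S)\s*,\s*$")

WANTED = ["name", "required_data", "graph_caption"]


def parse_pairs(text):
    """Build a dict mapping each name to the raw text of its value."""
    pairs = {}
    for line in text.splitlines():
        m = PAIR_RE.match(line)
        if m:
            pairs[m.group(1)] = m.group(2)
    return pairs


def to_csv_row(pairs):
    """Return one CSV line for the wanted names, or None unless has_graph is True."""
    if pairs.get("has_graph") != "True":
        return None
    buf = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes and doubles embedded ones.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([pairs.get(n, "") for n in WANTED])
    return buf.getvalue().rstrip("\n")


# The sample class from the question.
SAMPLE = """class FeatureFE():
    meta_data = MetaData(
        name='COOL_FEATURE',
        sub_type='EXTRA_COOL_FEATURES',
        required_data=[accounts, logs],
        has_graph=True,
        x_axis_label='Time',
        y_axis_label='Foo',
        graph_caption='Description of my feature',
        priority='low',
    )
"""

row = to_csv_row(parse_pairs(SAMPLE))
```

Because the values are looked up by name, the order of the keys inside the file does not matter, and adding another column only means extending WANTED.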

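To use the above with find, I would run:

find . -name '*.py' -exec awk -f tst.awk {} +

but first remove this part of the script:

if ( !doneHdr++ ) {
    for (nameNr=1; nameNr<=numNames; nameNr++) {
        name = names[nameNr]
        printf "%s%s", name, (nameNr<numNames ? OFS : ORS)
    }
}

That way the header line is not printed once for every batch of files that find passes to awk. Instead, add the header line manually afterwards, or print it before running the script. There are other ways to handle this, but that is the simplest.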
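If a single process walks the directory tree itself, the header can be written exactly once without worrying about batches. The following is a rough Python sketch of that alternative, not something taken from the answer: collect_rows and write_csv are hypothetical names, and it again assumes one name=value pair per line with a trailing comma.

```python
import csv
import re
import sys
from pathlib import Path

# Per-line "name=value," pairs; [ \t] keeps each match on a single line.
PAIR_RE = re.compile(r"^[ \t]*(\w+)[ \t]*=[ \t]*(.*\S)[ \t]*,[ \t]*$", re.MULTILINE)

WANTED = ["name", "required_data", "graph_caption"]


def collect_rows(root):
    """Yield one list of values per .py file under root whose has_graph is True."""
    for path in sorted(Path(root).rglob("*.py")):
        # findall returns (name, value) tuples, which dict() turns into a mapping.
        pairs = dict(PAIR_RE.findall(path.read_text()))
        if pairs.get("has_graph") == "True":
            yield [pairs.get(n, "") for n in WANTED]


def write_csv(root, out=sys.stdout):
    """Write the header once, then one quoted row per matching file."""
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(WANTED)
    writer.writerows(collect_rows(root))
```

Calling write_csv(".") from the top-level directory would print the CSV to standard output. Note that this sketch quotes the header fields as well, which differs cosmetically from the awk output.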

Answer 1 (score: 1)

Please try the following (assuming each of your Python files contains only one such class). Written and tested with GNU awk.

Answer 2 (score: 1)

Perl solution:

perl -0777 -nE 'for my $key (qw( name required_data graph_caption )) {
                  ($h{$key}) = /\b$key=(.*),/;
                }
                say join ",", @h{qw{ name required_data graph_caption }};
               ' -- *.py
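  • -n reads the input record by record and runs the code for each record
  • -0777 reads each file whole instead of line by line
  • the %h hash is populated with the values captured by the regex matches; \b stands for "word boundary"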
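To see why the word boundary matters, here is a small Python sketch of the same regex idea applied to text read in as one string. It is an illustration under assumptions: first_value is a hypothetical helper, and display_name is an invented key that does not appear in the question's files.

```python
import re

# Hypothetical file contents; 'display_name' is invented purely to show
# what the \b word boundary protects against.
text = """    meta_data = MetaData(
        display_name='Other',
        name='COOL_FEATURE',
        has_graph=True,
    )
"""


def first_value(key, text, boundary=True):
    """Return what follows key= up to the last comma on that line, or None."""
    prefix = r"\b" if boundary else ""
    m = re.search(prefix + re.escape(key) + r"=(.*),", text)
    return m.group(1) if m else None


# With \b, "name=" inside "display_name=" is skipped, because "_" is a word
# character and so there is no boundary between "_" and "n".
with_boundary = first_value("name", text)

# Without \b, the search stops at the first "name=" it finds, which sits
# inside "display_name=".
without_boundary = first_value("name", text, boundary=False)
```

Since . does not match a newline by default, the greedy .* stays on one line and backs off to the last comma of that line, which is what keeps a value such as [accounts, logs] intact.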