How to delete lines matching a specific pattern from a TXT or CSV file

Posted: 2016-08-08 01:42:35

Tags: linux bash csv awk sed

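I have a txt file in the format shown below.

The goal is to delete lines that start with "Subtotal Group 1", "Subtotal Group 2" or "Grand total" (these strings always appear at the beginning of the line), but only when the rest of the line has empty fields (or is just padded with spaces).

Is it possible to do this with awk or sed in a single pass? At the moment I'm doing it in 3 separate steps (one per string). A more generic syntax would be great. Thanks, everyone.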

My txt file looks like this:

Some Generic Headers at the beginning of the file
=======================================================================
Group 1
=======================================================================
6.00   500 First Line Text                                      1685.52
1.00   502 Second Line Text                                      280.98
       530 Other Line text                                       157.32
_________________________________________________________________________
Subtotal Group 1
Subtotal Group 1
Subtotal Group 1
Subtotal Group 1                                                2123.82
Subtotal Group 1
Subtotal Group 1

========================================================================
GROUP 2
========================================================================

7.00   701 First Line Text                                        53.63
       711 Second Line text                                       97.85
7.00   740 Third Line text                                       157.32
       741 Any Line text                                         157.32
       742 Any Line text                                          18.04
       801 Last Line text                                        128.63
_______________________________________________________________________
Subtotal Group 2
Subtotal Group 2
Subtotal Group 2
Subtotal Group 2
Subtotal Group 2                                                 612.79
Subtotal Group 2
_______________________________________________________________________
Grand total
Grand total
Grand total
Grand total
Grand total
Grand total
Grand total                                                      1511.03

The output I want to achieve is:

Some Generic Headers at the beginning of the file
=======================================================================
Group 1
=======================================================================
6.00   500 First Line Text                                      1685.52
1.00   502 Second Line Text                                      280.98
       530 Other Line text                                       157.32
_______________________________________________________________________
Subtotal Group 1                                                2123.82

=======================================================================
GROUP 2
=======================================================================

7.00   701 First Line Text                                        53.63
       711 Second Line text                                       97.85
7.00   740 Third Line text                                       157.32
       741 Any Line text                                         157.32
       742 Any Line text                                          18.04
       801 Last Line text                                        128.63
_______________________________________________________________________
Subtotal Group 2                                                 612.79
_______________________________________________________________________
Grand total                                                     1511.03

5 answers:

Answer 0 (score: 1)

You can do it like this:

grep -v -P "^(Subtotal Group \d+|Grand total)[,\s]*$" inputfile > outputfile

Edited based on the comments. Second edit: adapted to the new specification.
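`-P` (Perl-compatible regexps, which is what provides `\d` and `\s` here) is a GNU grep option and not part of POSIX grep. A sketch of the same filter using only a POSIX extended regexp, assuming a small sample in the question's layout written to a hypothetical `sample.txt`:

```shell
# Build a small sample in the question's layout (sample.txt is just for illustration).
printf '%s\n' \
  'Group 1' \
  '6.00   500 First Line Text      1685.52' \
  'Subtotal Group 1' \
  'Subtotal Group 1        2123.82' \
  'Subtotal Group 1   ' \
  'Grand total' \
  'Grand total             1511.03' > sample.txt

# Same filter as the -P version: [[:digit:]] replaces \d and
# [,[:space:]] replaces [,\s], so only POSIX ERE features are used.
grep -Ev '^(Subtotal Group [[:digit:]]+|Grand total)[,[:space:]]*$' sample.txt
```

The bare labels (including the one with trailing spaces) are dropped, while the labelled lines that carry an amount are kept.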

Answer 1 (score: 1)

If your good lines always end with a digit, and your Any Text lines don't, you can use:

sed -n '/^.*[0-9]$/p' file

-n suppresses automatic printing of the pattern space, so only the lines ending in [0-9] are output. Given your sample file, the output is:

Subtotal                                         2123.82
Total                                             625.80
Any Word                                         9999.99
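On the txt layout in the current question, `[0-9]$` also matches a bare `Subtotal Group 1` line (it ends in the group number), and `-n ... p` drops the header and separator lines too. A sketch that applies the "ends with a number" idea only to the labelled lines, assuming an amount always contains a decimal point (as in `2123.82`) and using a hypothetical `sample.txt`:

```shell
printf '%s\n' \
  'Some Generic Headers at the beginning of the file' \
  '==========' \
  'Group 1' \
  '6.00   500 First Line Text      1685.52' \
  '__________' \
  'Subtotal Group 1' \
  'Subtotal Group 1        2123.82' \
  'Grand total' \
  'Grand total             1511.03' > sample.txt

# Lines that start with a label are deleted unless they end with an
# amount such as 2123.82 (trailing blanks allowed). Every other line is
# printed unchanged, so headers and separators survive.
sed -E '/^(Subtotal Group [0-9]+|Grand total)/{/[0-9]+\.[0-9]+[[:blank:]]*$/!d;}' sample.txt
```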

Answer 2 (score: 1)

This is exactly the job grep was invented for:


$ grep -Ev '^(Subtotal Group [0-9]+|Grand total)[[:blank:]]*$' file
Some Generic Headers at the beginning of the file
=======================================================================
Group 1
=======================================================================
6.00   500 First Line Text                                      1685.52
1.00   502 Second Line Text                                      280.98
       530 Other Line text                                       157.32
_________________________________________________________________________
Subtotal Group 1                                                2123.82

========================================================================
GROUP 2
========================================================================

7.00   701 First Line Text                                        53.63
       711 Second Line text                                       97.85
7.00   740 Third Line text                                       157.32
       741 Any Line text                                         157.32
       742 Any Line text                                          18.04
       801 Last Line text                                        128.63
_______________________________________________________________________
Subtotal Group 2                                                 612.79
_______________________________________________________________________
Grand total                                                      1511.03

If you prefer, you can use the same regexp in awk or sed.
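A sketch of what the awk and sed forms of that grep command could look like, run against a small hypothetical `sample.txt`. awk prints the lines that do not match; sed deletes the lines that do:

```shell
printf '%s\n' \
  'Group 1' \
  '6.00   500 First Line Text      1685.52' \
  'Subtotal Group 1' \
  'Subtotal Group 1        2123.82' \
  'Grand total   ' \
  'Grand total             1511.03' > sample.txt

# awk: print every line that does NOT match the regexp.
# [ \t] is spelled out for older awks without POSIX character classes.
awk '!/^(Subtotal Group [0-9]+|Grand total)[ \t]*$/' sample.txt

# sed: delete (d) every line that matches the same regexp.
sed -E '/^(Subtotal Group [0-9]+|Grand total)[[:blank:]]*$/d' sample.txt
```

Both commands print the same four lines; which one to use is mostly a matter of taste.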

Answer 3 (score: 0)

The question isn't very clear about whether the goal is to keep the total/subtotal lines or to delete them.

It's also unclear whether the "#*" comments are actually part of the input file or are merely descriptive.

Fortunately, these are just details. This is fairly simple with perl:

$ perl -n -e 'print if /^(Subtotal|Grand Total),(,| |#.*)*/' inputfile
Subtotal,,,                     #This is unuseful --> To be removed
Subtotal,,,                     #This is unuseful --> To be removed
Subtotal,,,125.40               #This is a good line
Subtotal,,,                     #This is unuseful --> To be removed
Grand Total,,,                  #This is unuseful --> To be removed
Grand Total,,,125.40            #This is a good line

This assumes you want to keep the total and subtotal lines and delete all other lines.

To do the opposite (delete the total/subtotal lines and keep everything else), replace the if keyword with unless.

If the comments aren't actually in the input file, just adjust the pattern slightly:

perl -n -e 'print if /^(Subtotal|Grand Total),(,| )*/' inputfile

This also ignores any extra spaces. If you want spaces to be significant, it becomes:

perl -n -e 'print if /^(Subtotal|Grand Total),(,)*/' inputfile

As I said, even though your question isn't 100% clear, the unclear parts are just details. perl will handle all the possibilities easily.

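As shown in the examples, perl prints the edited inputfile on standard output. To replace inputfile with the edited content, just add the -i option to the command (before the -e option).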

Answer 4 (score: 0)

An attempt at a solution:

awk -F, '{for(i=2;i<=NF;i++){if($i~/[0-9.-]+/){print $0;next}}}' falzone
Subtotal,,,125.40               
Grand Total,,,125.40            
Any other text,,,9999.99

Or, for the non-CSV version:

grep '[0-9.-]' falzone2
Subtotal                                         2123.82
Total                                             625.80
Any Word                                         9999.99
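Since the question asks for a more generic syntax, here is a sketch (not taken from any of the answers) that passes the labels in as a parameter, so adding another label does not need another pass. `labels` is an illustrative awk variable holding an ERE alternation; allowing commas as well as blanks after the label covers both the CSV form (`Subtotal,,,`) and the fixed-width form. `sample.txt` is hypothetical:

```shell
printf '%s\n' \
  'Group 1' \
  'Subtotal Group 1' \
  'Subtotal Group 1        2123.82' \
  'Subtotal,,,' \
  'Subtotal,,,125.40' \
  'Grand total' \
  'Grand total             1511.03' > sample.txt

# A line is removed when it is one of the labels followed only by
# blanks or commas; every other line is printed unchanged.
awk -v labels='Subtotal Group [0-9]+|Subtotal|Grand total' '
  BEGIN { re = "^(" labels ")[ \t,]*$" }
  $0 !~ re
' sample.txt
```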