Delete multiple lines where a string occurs, and concatenate files

Posted: 2017-07-24 10:17:10

Tags: bash perl

I am new to Bash/Perl and am trying to delete multiple lines in a text file where a string occurs. So far, to delete one line I have:

perl -ne '/somestring/ or print' /usr/file.txt > /usr/file1.tmp

For a second string I use:

perl -ne '/anotherstring/ or print' /usr/file.txt > /usr/file2.tmp

How do I concatenate file and file2.tmp?

Or how can I modify the command to delete the lines where either somestring or anotherstring occurs?

2 Answers:

Answer 0 (score: 0)

  

How do I concatenate file and file2.tmp?

That can be done with

cat file file2.tmp >> file3.tmp

Or, if by file you mean file1.tmp:

cat file1.tmp file2.tmp >> file3.tmp
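Note that `>>` appends to file3.tmp, while `>` creates or truncates it first. The sketch below shows the difference; the file names and contents are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample files, created only for this demonstration.
printf 'line from file1\n' > file1.tmp
printf 'line from file2\n' > file2.tmp
rm -f file3.tmp

# '>' creates or truncates file3.tmp, then writes both files in order.
cat file1.tmp file2.tmp > file3.tmp

# '>>' appends instead, so running it again adds the same two lines a second time.
cat file1.tmp file2.tmp >> file3.tmp

wc -l < file3.tmp   # file3.tmp now holds 4 lines
```

If file3.tmp might already exist from an earlier run, `>` is usually the safer choice.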

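However, that is not what you describe in the rest of your question (i.e. deleting every line in which either of the two patterns occurs). That can be done by chaining your commands: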

perl -ne '/somestring/ or print' /usr/file.txt > /usr/file1.tmp
perl -ne '/anotherstring/ or print' /usr/file1.tmp > /usr/file2.tmp

You can use a pipe to get rid of the intermediate file file1.tmp:

perl -ne '/somestring/ or print' /usr/file.txt | perl -ne '/anotherstring/ or print' > /usr/file2.tmp

This can also be done with grep (assuming your strings don't use any Perl-specific regular expression features):

grep -v somestring /usr/file.txt | grep -v anotherstring > /usr/file2.tmp
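If the strings might contain regex metacharacters such as `.` or `*`, a sketch using `grep -F` (fixed strings) sidesteps that assumption; `-v`, `-F` and `-e` are standard grep options, and both strings can be given in a single invocation. The sample input below is invented.

```shell
#!/usr/bin/env bash
# Invented sample input: as a regex, "a.b" would also match "axb", but not with -F.
printf 'keep me\nhas a.b in it\naxb is fine\nhas anotherstring\n' > sample.txt

# -F: treat each pattern as a literal string; -v: print only non-matching lines;
# -e: supply each pattern separately (a line matching either one is dropped).
grep -vF -e 'a.b' -e 'anotherstring' sample.txt > filtered.txt

cat filtered.txt
```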

Finally, you can combine the filtering into a single command/regex:

perl -ne '/somestring|anotherstring/ or print' /usr/file.txt > /usr/file2.tmp

Or with grep:

grep -v 'somestring\|anotherstring' /usr/file.txt > /usr/file2.tmp
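The `\|` alternation in a basic regular expression is a GNU extension and may not work with every grep. A sketch of the same filter using extended regular expressions (`-E`), where a bare `|` means "or" as specified by POSIX, on invented input:

```shell
#!/usr/bin/env bash
# Invented input file standing in for /usr/file.txt.
printf 'first line\nsomestring here\nanotherstring here\nlast line\n' > file.txt

# -E selects extended regular expressions; -v keeps only the lines matching neither string.
grep -vE 'somestring|anotherstring' file.txt > file2.tmp

cat file2.tmp
```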

Answer 1 (score: -1)

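I was interested in your problem, so I wrote a highly dynamic Perl program that prints the lines of any user-defined file that match, or do not match, the words you give it, and then writes those requested lines to a new, user-defined outfile.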

We will parse this file, iris_dataset.csv:

"Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
5.1,3.5,1.4,0.2,"setosa"
4.9,3,1.4,0.2,"setosa"
4.8,3,1.4,0.3,"setosa"
5.1,3.8,1.6,0.2,"setosa"
4.6,3.2,1.4,0.2,"setosa"
7,3.2,4.7,1.4,"versicolor"
6.4,3.2,4.5,1.5,"versicolor"
6.9,3.1,4.9,1.5,"versicolor"
6.6,3,4.4,1.4,"versicolor"
5.5,2.4,3.7,1,"versicolor"
6.3,3.3,6,2.5,"virginica"
5.8,2.7,5.1,1.9,"virginica"
7.1,3,5.9,2.1,"virginica"
6.3,2.9,5.6,1.8,"virginica"
5.9,3,5.1,1.8,"virginica"

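It is a comma-separated values (CSV) file, in which the columns are separated by commas. If you open it in a spreadsheet, you can see the items in each column more clearly. What we are looking for is the Species column of the file, so the possible matching items are "setosa", "versicolor" and "virginica".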

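My program first asks for the file to read from. In this case it is iris_dataset.csv, though it could be any file. It then asks for the name of the file to write to. I called it new_iris.csv, but you can call it anything.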

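Then we tell the program how many items we are looking for. If there are 3 items, I can enter setosa, versicolor and virginica in any order. If there are two, I enter only two items, and if there is one, then in this example file I enter just setosa, versicolor or virginica.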

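Next, we are asked whether we want to keep the lines that match our items, or remove the lines of the file that match them. If we choose KEEP, the lines that match those items are printed to the screen and to our outfile. If we choose REMOVE, the lines that do not match the items are printed to the screen and to the file. If we choose neither KEEP nor REMOVE, we get an error message, and our new, empty file is deleted because it contains nothing.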
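For comparison, the KEEP/REMOVE choice corresponds to running grep without and with `-v`. A non-interactive sketch, using a few rows copied from the data above: as in the program, REMOVE keeps the header line (it contains no species name), while KEEP drops it.

```shell
#!/usr/bin/env bash
# A few rows copied from iris_dataset.csv above.
cat > iris_sample.csv <<'EOF'
"Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
5.1,3.5,1.4,0.2,"setosa"
7,3.2,4.7,1.4,"versicolor"
6.3,3.3,6,2.5,"virginica"
EOF

pattern='setosa|versicolor'   # same shape as the program's joined $matches

# KEEP: only the lines that match.
grep -E "$pattern" iris_sample.csv > keep.csv

# REMOVE: only the lines that do not match.
grep -vE "$pattern" iris_sample.csv > remove.csv
```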

#!/usr/bin/env perl
# Program: perl_matching.pl
use strict; # Means that we have to explicitly declare our variables (with "my" or "our") so their scope is defined.
use warnings; # We want to know if and where problems show up in our program.
use feature 'say'; # Like print, but with an automatic trailing newline.
use feature 'switch'; # Perl given/when switch statement.
no warnings 'experimental'; # given/when is experimental, so silence the warnings it would otherwise emit.

########### This block of code is roughly equivalent to a Unix ls command ##############
opendir(DIR, ".") or die "Cannot open the current directory because $!"; # Opens the current working directory.
my @files = readdir(DIR); # Reads all files in the current working directory into an array @files. 
closedir(DIR); # Now that we have the array of files, we can close our current working directory.
say "Here are the list of files in your current working directory";
foreach(@files){print "$_\t";} # $_ is the default variable for each item in an array.
########### This block is not required for the rest of the program to run ####################

say "\nGive me your filename to read from, extensions and all ..."; # It is easiest if the file is in your current working directory.
chomp(my $file_read = <STDIN>); # This makes the filename dynamic from user input. 
say "Give me your filename to write to, extensions and all ...";
chomp(my $file_write = <STDIN>); # results will be printed to this file, and standard output. # chomp removes newlines from standard input.

# ' < ' to read from, and '>', to write to ... 
# Opening your file to read from: 
open(my $filehandle_read, '<', $file_read) or die "Problem reading file $file_read because $!";
# Open your file to write to.
open(my $filehandle_write, '>', $file_write) or die "Problem writing file $file_write because $!";

say "How many matches are you going to give me?";
chomp(my $match_num = <STDIN>); # chomp removes the trailing newline from the number typed in.
say "Okay give me the matches now, pressing Enter key between each match.";

my $i = 1; # This is our incrementer between matches. 
my $matches; # This is each match presented line by line. 
my @match_list; # This is our array (list) of $matches
while($i <= $match_num)
{
    $matches = <STDIN>; # One match at a time from standard input. 
    push @match_list, $matches; # Pushes all individual $matches into a list @match_list
    $i = $i + 1; # Increase the counter by one so this loop doesn't run forever.
}
chomp(@match_list);

undef($matches); # Clear the variable so it can be reused for the combined pattern.

$matches = join('|', map { quotemeta($_) } @match_list); # "|" means "or" in a regular expression; quotemeta escapes any regex metacharacters in the typed matches.
say "This is what your redefined matches variable looks like: $matches"; 

say "Now you get a choice for your matches"; 
say "KEEP or REMOVE?"; # if you type Keep (case insensitive) you print only the matches to the new file. If you type Remove (case insensitive) you print only the lines to the newfile which do not contain the matches.  
chomp(my $choice = <STDIN>);

my @lines_all = <$filehandle_read>; # The filehandle contains everything in the file, so we can pull all lines of the file to read into an array, where each item in the array is each line of the file opened for reading. 
close $filehandle_read; # we can now close the filehandle for the file for reading since we just pulled all the information from it. 
# We grep for the matching " =~ " lines of our file to read. 
my @lines_matching = grep{$_ =~ m/$matches/} @lines_all;
# We grep for the non-matching " !~ " lines of our file to read.
# Note: $_ is a default variable for every item in the array.   
my @lines_not_matching = grep{$_ !~ m/$matches/} @lines_all;


# This is a Perl-style switch statement.
# Note: a given::when::when::default switch statement
# is basically equivalent to an
# if::elsif::else statement.

# In this switch statement only one choice is performed,
# which one depends on if you said "Keep" or "Remove" in your choice. 
given($choice)
{
    when($choice =~ m/Keep/i) # "i" is for case-insensitive, so Keep, KEEP, kEeP, etc are valid. 
    {
    say @lines_matching; # Print the matching lines to the screen. 
    print $filehandle_write @lines_matching; # Print the matching lines to the file. 
    close $filehandle_write; # Close the file now that we are done with it. 
    }
    when($choice =~ m/Remove/i) 
    {
    say @lines_not_matching; # Print the lines that do not match to the screen.
    print $filehandle_write @lines_not_matching; # Print the lines that do not match to the file.
    close $filehandle_write; # Close the file now that we are done with it.
    }
    default 
    {
    say "You must have selected a choice other than Keep or Remove. Don't do that!";
    close $filehandle_write; # Close the file now that we are done with it. 
    unlink($file_write) or warn "Could not unlink file $file_write"; # If you selected neither keep nor remove, we delete the new file to write to as it contains nothing.  
    }
}

Here is the script running:

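I asked to remove the lines containing versicolor and setosa, so only the lines containing virginica are printed to the screen and to the outfile I named new_iris.csv. I asked for two matches. Note: my program accepts Keep or Remove typed in any mix of upper and lower case.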

>perl perl_matching.pl
Here are the list of files in your current working directory
.       ..      iris_dataset.csv        perl_matching.pl
Give me your filename to read from, extensions and all ...
iris_dataset.csv
Give me your filename to write to, extensions and all ...
new_iris.csv
How many matches are you going to give me?
2
Okay give me the matches now, pressing Enter key between each match.
setosa
versicolor
This is what your redefined matches variable looks like: setosa|versicolor
Now you get a choice for your matches
KEEP or REMOVE?
Remove
"Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
6.3,3.3,6,2.5,"virginica"
5.8,2.7,5.1,1.9,"virginica"
7.1,3,5.9,2.1,"virginica"
6.3,2.9,5.6,1.8,"virginica"
5.9,3,5.1,1.8,"virginica"

So only the lines that contain neither setosa nor versicolor are printed to our file, new_iris.csv:

"Sepal.Length","Sepal.Width","Petal.Length","Petal.Width","Species"
6.3,3.3,6,2.5,"virginica"
5.8,2.7,5.1,1.9,"virginica"
7.1,3,5.9,2.1,"virginica"
6.3,2.9,5.6,1.8,"virginica"
5.9,3,5.1,1.8,"virginica"

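I really like using standard input in Perl. You could use my script to print only the lines of the file that contain setosa (you would just ask for 1 match and choose Keep).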