我最近偶然发现了这个问题而且我没有设法解决它,因为我不熟悉这种脚本方法。 我需要一个执行以下操作的脚本:
基于列表(list.txt)将该文本中的每一行搜索到多个文件中,如果找到则删除该行(来自其他文件)。 我试图将list.txt保存为数组并使用for进行检查,但我不知道如何搜索字符串并删除该行。 你能帮我解决这个问题吗?
到目前为止,这是我从多个来源提出的:
搜索多个文本文件的REPL.bat:
@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment
::************ Documentation ***********
:::
:::REPL Search Replace [Options [SourceVar]]
:::REPL /?
:::REPL /V
:::
::: Performs a global search and replace operation on each line of input from
::: stdin and prints the result to stdout.
:::
::: Each parameter may be optionally enclosed by double quotes. The double
::: quotes are not considered part of the argument. The quotes are required
::: if the parameter contains a batch token delimiter like space, tab, comma,
::: semicolon. The quotes should also be used if the argument contains a
::: batch special character like &, |, etc. so that the special character
::: does not need to be escaped with ^.
:::
::: If called with a single argument of /?, then prints help documentation
::: to stdout.
:::
::: If called with a single argument of /V, case insensitive, then prints
::: the version of REPL.BAT. (Currently 3.1)
:::
::: Search - By default, this is a case sensitive JScript (ECMA) regular
::: expression expressed as a string.
:::
::: JScript regex syntax documentation is available at
::: http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx
:::
::: Replace - By default, this is the string to be used as a replacement for
::: each found search expression. Full support is provided for
::: substituion patterns available to the JScript replace method.
:::
::: For example, $& represents the portion of the source that matched
::: the entire search pattern, $1 represents the first captured
::: submatch, $2 the second captured submatch, etc. A $ literal
::: can be escaped as $$.
:::
::: An empty replacement string must be represented as "".
:::
::: Replace substitution pattern syntax is fully documented at
::: http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx
:::
::: Options - An optional string of characters used to alter the behavior
::: of REPL. The option characters are case insensitive, and may
::: appear in any order.
:::
::: I - Makes the search case-insensitive.
:::
::: L - The Search is treated as a string literal instead of a
::: regular expression. Also, all $ found in Replace are
::: treated as $ literals.
:::
::: B - The Search must match the beginning of a line.
::: Mostly used with literal searches.
:::
::: E - The Search must match the end of a line.
::: Mostly used with literal searches.
:::
::: V - Search and Replace represent the name of environment
::: variables that contain the respective values. An undefined
::: variable is treated as an empty string.
:::
::: A - Only print altered lines. Unaltered lines are discarded.
::: This option is incompatible with the M option.
:::
::: M - Multi-line mode. The entire contents of stdin is read and
::: processed in one pass instead of line by line, thus enabling
::: search for \n. This option is incompatible with the A option.
:::
::: X - Enables extended substitution pattern syntax with support
::: for the following escape sequences within the Replace string:
:::
::: \\ - Backslash
::: \b - Backspace
::: \f - Formfeed
::: \n - Newline
::: \q - Quote
::: \r - Carriage Return
::: \t - Horizontal Tab
::: \v - Vertical Tab
::: \xnn - Extended ASCII byte code expressed as 2 hex digits
::: \unnnn - Unicode character expressed as 4 hex digits
:::
::: Also enables the \q escape sequence for the Search string.
::: The other escape sequences are already standard for a regular
::: expression Search string.
:::
::: Also modifies the behavior of \xnn in the Search string to work
::: properly with extended ASCII byte codes.
:::
::: Extended escape sequences are supported even when the L option
::: is used. Both Search and Replace support all of the extended
::: escape sequences if both the X and L opions are combined.
:::
::: S - The source is read from an environment variable instead of
::: from stdin. The name of the source environment variable is
::: specified in the next argument after the option string. Without
::: the M option, ^ anchors the beginning of the string, and $ the
::: end of the string. With the M option, ^ anchors the beginning
::: of a line, and $ the end of a line.
:::
::************ Batch portion ***********
@echo off
if .%2 equ . (
if "%~1" equ "/?" (
<"%~f0" cscript //E:JScript //nologo "%~f0" "^:::" "" a
exit /b 0
) else if /i "%~1" equ "/V" (
echo REPL.BAT version 3.1
exit /b
) else (
call :err "Insufficient arguments"
exit /b 1
)
)
echo(%~3|findstr /i "[^SMILEBVXA]" >nul && (
call :err "Invalid option(s)"
exit /b 1
)
echo(%~3|findstr /i "M"|findstr /i "A" >nul && (
call :err "Incompatible options"
exit /b 1
)
cscript //E:JScript //nologo "%~f0" %*
exit /b 0
:err
>&2 echo ERROR: %~1. Use REPL /? to get help.
exit /b
************* JScript portion **********/
var env=WScript.CreateObject("WScript.Shell").Environment("Process");
var args=WScript.Arguments;
var search=args.Item(0);
var replace=args.Item(1);
var options="g";
if (args.length>2) options+=args.Item(2).toLowerCase();
var multi=(options.indexOf("m")>=0);
var alterations=(options.indexOf("a")>=0);
if (alterations) options=options.replace(/a/g,"");
var srcVar=(options.indexOf("s")>=0);
if (srcVar) options=options.replace(/s/g,"");
if (options.indexOf("v")>=0) {
options=options.replace(/v/g,"");
search=env(search);
replace=env(replace);
}
if (options.indexOf("x")>=0) {
options=options.replace(/x/g,"");
replace=replace.replace(/\\\\/g,"\\B");
replace=replace.replace(/\\q/g,"\"");
replace=replace.replace(/\\x80/g,"\\u20AC");
replace=replace.replace(/\\x82/g,"\\u201A");
replace=replace.replace(/\\x83/g,"\\u0192");
replace=replace.replace(/\\x84/g,"\\u201E");
replace=replace.replace(/\\x85/g,"\\u2026");
replace=replace.replace(/\\x86/g,"\\u2020");
replace=replace.replace(/\\x87/g,"\\u2021");
replace=replace.replace(/\\x88/g,"\\u02C6");
replace=replace.replace(/\\x89/g,"\\u2030");
replace=replace.replace(/\\x8[aA]/g,"\\u0160");
replace=replace.replace(/\\x8[bB]/g,"\\u2039");
replace=replace.replace(/\\x8[cC]/g,"\\u0152");
replace=replace.replace(/\\x8[eE]/g,"\\u017D");
replace=replace.replace(/\\x91/g,"\\u2018");
replace=replace.replace(/\\x92/g,"\\u2019");
replace=replace.replace(/\\x93/g,"\\u201C");
replace=replace.replace(/\\x94/g,"\\u201D");
replace=replace.replace(/\\x95/g,"\\u2022");
replace=replace.replace(/\\x96/g,"\\u2013");
replace=replace.replace(/\\x97/g,"\\u2014");
replace=replace.replace(/\\x98/g,"\\u02DC");
replace=replace.replace(/\\x99/g,"\\u2122");
replace=replace.replace(/\\x9[aA]/g,"\\u0161");
replace=replace.replace(/\\x9[bB]/g,"\\u203A");
replace=replace.replace(/\\x9[cC]/g,"\\u0153");
replace=replace.replace(/\\x9[dD]/g,"\\u009D");
replace=replace.replace(/\\x9[eE]/g,"\\u017E");
replace=replace.replace(/\\x9[fF]/g,"\\u0178");
replace=replace.replace(/\\b/g,"\b");
replace=replace.replace(/\\f/g,"\f");
replace=replace.replace(/\\n/g,"\n");
replace=replace.replace(/\\r/g,"\r");
replace=replace.replace(/\\t/g,"\t");
replace=replace.replace(/\\v/g,"\v");
replace=replace.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
function($0,$1,$2){
return String.fromCharCode(parseInt("0x"+$0.substring(2)));
}
);
replace=replace.replace(/\\B/g,"\\");
search=search.replace(/\\\\/g,"\\B");
search=search.replace(/\\q/g,"\"");
search=search.replace(/\\x80/g,"\\u20AC");
search=search.replace(/\\x82/g,"\\u201A");
search=search.replace(/\\x83/g,"\\u0192");
search=search.replace(/\\x84/g,"\\u201E");
search=search.replace(/\\x85/g,"\\u2026");
search=search.replace(/\\x86/g,"\\u2020");
search=search.replace(/\\x87/g,"\\u2021");
search=search.replace(/\\x88/g,"\\u02C6");
search=search.replace(/\\x89/g,"\\u2030");
search=search.replace(/\\x8[aA]/g,"\\u0160");
search=search.replace(/\\x8[bB]/g,"\\u2039");
search=search.replace(/\\x8[cC]/g,"\\u0152");
search=search.replace(/\\x8[eE]/g,"\\u017D");
search=search.replace(/\\x91/g,"\\u2018");
search=search.replace(/\\x92/g,"\\u2019");
search=search.replace(/\\x93/g,"\\u201C");
search=search.replace(/\\x94/g,"\\u201D");
search=search.replace(/\\x95/g,"\\u2022");
search=search.replace(/\\x96/g,"\\u2013");
search=search.replace(/\\x97/g,"\\u2014");
search=search.replace(/\\x98/g,"\\u02DC");
search=search.replace(/\\x99/g,"\\u2122");
search=search.replace(/\\x9[aA]/g,"\\u0161");
search=search.replace(/\\x9[bB]/g,"\\u203A");
search=search.replace(/\\x9[cC]/g,"\\u0153");
search=search.replace(/\\x9[dD]/g,"\\u009D");
search=search.replace(/\\x9[eE]/g,"\\u017E");
search=search.replace(/\\x9[fF]/g,"\\u0178");
if (options.indexOf("l")>=0) {
search=search.replace(/\\b/g,"\b");
search=search.replace(/\\f/g,"\f");
search=search.replace(/\\n/g,"\n");
search=search.replace(/\\r/g,"\r");
search=search.replace(/\\t/g,"\t");
search=search.replace(/\\v/g,"\v");
search=search.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
function($0,$1,$2){
return String.fromCharCode(parseInt("0x"+$0.substring(2)));
}
);
search=search.replace(/\\B/g,"\\");
} else search=search.replace(/\\B/g,"\\\\");
}
if (options.indexOf("l")>=0) {
options=options.replace(/l/g,"");
search=search.replace(/([.^$*+?()[{\\|])/g,"\\$1");
replace=replace.replace(/\$/g,"$$$$");
}
if (options.indexOf("b")>=0) {
options=options.replace(/b/g,"");
search="^"+search
}
if (options.indexOf("e")>=0) {
options=options.replace(/e/g,"");
search=search+"$"
}
var search=new RegExp(search,options);
var str1, str2;
if (srcVar) {
str1=env(args.Item(3));
str2=str1.replace(search,replace);
if (!alterations || str1!=str2) WScript.Stdout.WriteLine(str2);
} else {
while (!WScript.StdIn.AtEndOfStream) {
if (multi) {
WScript.Stdout.Write(WScript.StdIn.ReadAll().replace(search,replace));
} else {
str1=WScript.StdIn.ReadLine();
str2=str1.replace(search,replace);
if (!alterations || str1!=str2) WScript.Stdout.WriteLine(str2);
}
}
}
我创建了这个用于遍历所有文件(带有2个参数的用户def函数)
:myBatchFunc
for %%F in (*.txt) do (
type "%%F"|repl %~1 %~2 >"%%F.new"
move /y "%%F.new" "%%F"
)
这将是我打电话和运行所有内容的主要批次。
@echo off
set "file=C:\Users\ecatser\Desktop\RPS_cells\EXCEPTII.log"
set /A i=0
for /F "usebackq delims=" %%a in ("%file%") do (
set /A i+=1
call set array[%%i%%]=%%a
call set n=%%i%%
)
for /L %%i in (1,1,%n%) do call myBatchFunc %%array[%%i]%% x
PAUSE
我确实意识到这是一个非常简单的任务代码,有人能为我提供批处理/ perl / python的更好答案吗? 提前谢谢。
P.S(加上我现在使用的脚本用'x'替换字符串。所以它不会删除该行。
编辑: 情况如下: 我有一个目录,包含list.log(基本上是一个例外列表),以及一堆其他的.txt文件。
list.log示例:
53737
52505 // this value matches the cell in .txt
13211
21412
21313
23123
.txt文件示例
LOTS_OF_USELESS_TEXT,Cell=cell52505 // the cell with the same value
LOTS_OF_USELESS_TEXT,Cell=cell20774
LOTS_OF_USELESS_TEXT,Cell=cell22312
LOTS_OF_USELESS_TEXT,Cell=cell20233
LOTS_OF_USELESS_TEXT,Cell=cell12322
输出.txt文件:
LOTS_OF_USELESS_TEXT,Cell=cell20774 // 52505 was removed
LOTS_OF_USELESS_TEXT,Cell=cell22312
LOTS_OF_USELESS_TEXT,Cell=cell20233
LOTS_OF_USELESS_TEXT,Cell=cell12322
所以,我希望脚本逐行读取list.log获取每个值/字符串,并在该目录的每个.txt文件中查找它,并发现IF从文件中删除该行并覆盖IF未找到go从list.log到下一个值/行。 基本上.txt文件是单元格列表,list.log是一个例外列表,我想从.txt文件中删除例外。
我希望这次能解释清楚。
答案 0 :(得分:2)
怎么样:
perl -ani.back -e 'print unless /The text to be search/' list_of_files_to_process
这将删除包含The text to be search
的行,并使用扩展名.back
保存原始文件。
修改强>
perl -ani.back -e 'BEGIN{open $fh,"f.log";@l=<$fh>;chomp@l;$r=join("|",@l)}print unless /\b$r\b/' *.txt
答案 1 :(得分:1)
使用python以下应该可行。它使用正则表达式。读取模式列表,并使用“OR”将模式连接到一个大的正则表达式。然后每行读取每行文件,如果模式不匹配,则将行写入新文件,否则不写入。该脚本期望第一个命令行参数是模式文件,所有后续参数都是要处理的文件的名称。
import re
import sys
# patternfile contains a list of patterns, one per row
# this lines are striped (linebreaks removed) and joined using "OR" regex
with open(sys.argv[1]) as patternfile:
pattern=re.compile('|'.join(map(strip,patternfile.readlines())))
# loop over all files given
for f in sys.argv[2:]:
with open(f,'r') as infile:
fout = infile.name + '.new'
# open outfile with new name
with open(infile.name, 'w') as outfile:
# loop over lines
for line in f:
# check if pattern matches
if re.search(pattern,line)==None: #pattern does not match
outfile.write(line)
必要时,必须调整脚本以删除原始文件。