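Search for 10 consecutive single digits

Time: 2017-05-23 12:13:23

Tags: windows parsing batch-file search digits

I have a lady who sends me phone numbers. They arrive in a messy format. Every time. So I want to copy her entire message from Skype and have a batch file parse the saved .txt file, searching only for runs of 10 consecutive digits.

For example, she sends me: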

Hello more numbers for settings please,
WYK-0123456789 
CAMP-0123456789 
0123456789
Include 0123456789
This is an urgent number: 0123456789 
TIDO: 0123456789
Send to> 0123456789

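It is very messy, and the only constant is the 10 digits. So I would like the .bat file to somehow scan this monstrosity and leave me with something like the following:

E.g. what I want: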

0123456789 
0123456789 
0123456789
0123456789
0123456789 
0123456789
0123456789

I tried the following:

@echo off
setlocal enableDelayedExpansion
(
  for /f %%A in (
    'findstr "^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]" yourFile.txt'
  ) do (
    set "ln=%%A"
    echo !ln:~0,10!
  )
)>newFile.txt

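Unfortunately, that only works when a line starts with the 10 digits; it does not help when the 10 digits are in the middle or at the end of the line.

4 answers:

Answer 0: (score: 2)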

Unfortunately, it is very difficult to solve this problem in a general way. The batch file below correctly extracts the numbers from your example file, but if your real data includes numbers in a different format, the program will fail... Of course, in that case you just need to add the new format to the program! :)

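For example, this program fails if a 10-character "word" that is not a phone number begins with a digit...

Answer 1: (score: 2)

Provided that the 10-digit number is the first numeric portion in every line of the file (let's call it numbers.txt), ahead of any other numbers, you can use the following: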

@echo off
setlocal EnableExtensions EnableDelayedExpansion

rem // Define constants here:
set "_FILE=.\numbers.txt"
set /A "_DIG=10"

rem // The first delimiter is TAB, the last one is SPACE:
for /F "usebackq tokens=1 delims=   ^!#$%%&'()*+,-./:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^^_`abcdefghijklmnopqrstuvwxyz{|}~ " %%L in ("!_FILE!") do (
    set "NUM=%%L#"
    if "!NUM:~%_DIG%!"=="#" echo(%%L
)

endlocal
exit /B

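This uses for /F with its delims option string, which includes most ASCII characters other than digits. You can extend the delims option string to cover extended characters as well (those with a code greater than 0x7F); make sure SPACE is the last character specified.

This approach can extract the 10-digit number from a line like this: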

garbage text>0123456789_more text0123-end

But it fails when the first numeric portion is not the 10-digit number, as in a line like this:

garbage text: 0123 tel. 0123456789; end

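Here is a comprehensive solution based on the approach above. The character list for the delims option of for /F is built automatically here. This may take a few seconds, but it is done only once at the very beginning, so with large files you will probably not notice the overhead: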

@echo off
setlocal EnableExtensions DisableDelayedExpansion

rem // Define constants here:
set "_FILE=.\numbers.txt"
set /A "_DIG=10"

rem // Define global variables here:
set "$CHARS="

rem // Capture current code page and set Windows default one:
for /F "tokens=2 delims=:" %%P in ('chcp') do set /A "CP=%%P"
> nul chcp 437

rem /* Generate list of escaped characters other than numerals (escaped means every character
rem    is preceded by `^`); there are some characters excluded:
rem    - NUL (this cannot be stored in an environment variable and should not occur anyway),
rem    - CR + LF, (they build up line-breaks, so they cannot occur within a line obviously),
rem    - SPACE, (because this must be placed as the last character of the `delims`option),
rem    - `"`, (because this impairs the quotation within the following code portion),
rem    - `!` + `^` (they may lead to unexpected results when delayed expansion is enabled): */
setlocal EnableDelayedExpansion
for /L %%I in (0x01,1,0xFF) do (
    rem // Exclude codes of aforementioned characters:
    if %%I GEQ 0x30 if %%I LSS 0x3A (set "SKIP=#") else (set "SKIP=")
    if not defined SKIP if %%I NEQ 0x00 if %%I NEQ 0x0A if %%I NEQ 0x0D (
        if %%I NEQ 0x20 if %%I NEQ 0x21 if %%I NEQ 0x22 if %%I NEQ 0x5E (
            rem // Convert code to character and append to list separated by `^`:
            cmd /C exit %%I
            for /F delims^=^ eol^= %%J in ('
                forfiles /P "%~dp0." /M "%~nx0" /C "cmd /C echo 0x220x!=ExitCode:~-2!0x22"
            ') do (
                set "$CHARS=!$CHARS!^^%%~J"
            )
        )
    )
)
endlocal & set "$CHARS=%$CHARS%"

rem /* Apply escaped list of characters as delimiters and apply some of the characters
rem    excluded before, namely SPACE, `"`, `!` and `^`;
rem    read file using `type` in order to convert from Unicode, if applicable: */
for /F tokens^=1*^ eol^=^ ^ delims^=^!^"^^%$CHARS%^  %%K in ('type "%_FILE%"') do (
    set "NUM=%%K#" & set "REST=%%L"
    rem // Test whether extracted numeric string holds the given number of digits:
    setlocal EnableDelayedExpansion
    if "!NUM:~%_DIG%!"=="#" echo(%%K
    endlocal
    rem /* Current line holds more than a single numeric portion, so process them in a
    rem    sub-routine; this is not called if the line contains a single number only: */
    if defined REST call :SUB REST
)

rem // Restore previous code page:
> nul chcp %CP%

endlocal
exit /B


:SUB  ref_string
    setlocal DisableDelayedExpansion
    setlocal EnableDelayedExpansion
    set "STR=!%~1!"
    rem // Parse line string using the same approach as in the main routine:
    :LOOP
    if defined STR (
        for /F tokens^=1*^ eol^=^ ^ delims^=^^^!^"^^^^%$CHARS%^  %%E in ("!STR!") do (
            endlocal
            set "NUM=%%E#" & set "STR=%%F"
            setlocal EnableDelayedExpansion
            rem // Test whether extracted numeric string holds the given number of digits:
            if "!NUM:~%_DIG%!"=="#" echo(%%E
        )
        rem // Loop back if there are still more numeric parts encountered:
        goto :LOOP
    )
    endlocal
    endlocal
    exit /B

This approach detects 10-digit numbers anywhere in the file, even when a single line contains more than one number.

Answer 2: (score: 2)
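For cross-checking the batch output, the tokenizing idea can be sketched in Python. This is not part of the answer's batch code; the function name `ten_digit_tokens` and the sample strings are illustrative assumptions. Every non-digit character acts as a delimiter, and only tokens of exactly 10 digits are kept, which mirrors the length test against `#` in the batch file:

```python
import re

def ten_digit_tokens(text, digits=10):
    """Split on runs of non-digit characters and keep tokens of exactly `digits` digits."""
    # Anything that is not an ASCII digit is treated as a delimiter.
    tokens = re.split(r"[^0-9]+", text)
    return [t for t in tokens if len(t) == digits]

# Hypothetical sample line taken from the failing case of the simpler script.
sample = "garbage text: 0123 tel. 0123456789; end"
print(ten_digit_tokens(sample))
```

Because every numeric token is inspected rather than only the first one, the short `0123` token is simply discarded and the 10-digit token after it is still found.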

@ECHO OFF
SETLOCAL
SET "sourcedir=U:\sourcedir"
SET "destdir=U:\destdir"
SET "filename1=%sourcedir%\q44134518.txt"
SET "outfile=%destdir%\outfile.txt"
ECHO %time%
(
FOR /f "usebackqdelims=" %%a IN ("%filename1%") DO SET "line=%%a"&CALL :process
)>"%outfile%"
ECHO %time%

GOTO :EOF

:lopchar
SET "line=%line:~1%"
:process
IF "%line:~9,1%"=="" GOTO :eof
SET "candidate=%line:~0,10%"
SET /a count=0
:testlp
SET "char=%candidate:~0,1%"
IF "%char%" gtr "9" GOTO lopchar
IF "%char%" lss "0" GOTO lopchar
SET /a count+=1
IF %count% lss 10 SET "candidate=%candidate:~1%"&GOTO testlp
ECHO %line:~0,10%
GOTO :eof

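You would need to change the settings of sourcedir and destdir to suit your circumstances. I used a file named q44134518.txt containing your data plus some extra test information for my testing.

Produces the file defined as %outfile%.

Read each line of data into %%a and from there into line.

Process each line starting at :process. See whether the line is 10 or more characters long; if not, terminate the subroutine.

Since the line is 10 or more characters long, select the first 10 into candidate and clear count to 0.

Assign the first character to char and test whether it is greater than '9' or less than '0'. If either is true, lop off the first character of line and try again (until we have either a number or line has 9 or fewer characters).

Count each successive digit. If we have not yet counted 10, lop the first character from candidate and check again.

When we reach 10 successive digits, echo the first 10 characters of line, all of which are digits and are the data required.

Answer 3: (score: 1)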
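The steps above amount to sliding a 10-character window along the line. A minimal Python sketch of that logic, handy for checking what the batch file should print, could look like this (the function name `first_ten_digits` is an assumption of mine, not something from the batch file):

```python
def first_ten_digits(line, width=10):
    """Return the first window of `width` consecutive ASCII digits in `line`, or None."""
    # Stop as soon as fewer than `width` characters remain, like the length check at :process.
    while len(line) >= width:
        candidate = line[:width]
        if all(ch in "0123456789" for ch in candidate):
            return candidate
        # A non-digit was found in the window: lop off the first character and retry.
        line = line[1:]
    return None

print(first_ten_digits("This is an urgent number: 0123456789 "))
```

As written, the sketch stops at the first match in a line, and it would also accept the first 10 digits of a longer run of digits.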

Just another option

@echo off
    setlocal enableextensions disabledelayedexpansion

    rem Configure
    set "file=input.txt"

    rem Initialization
    set "counter=0" & set "number="

    rem Convert file to a character per line and add ending line
    (for /f "delims=" %%a in ('
        ^( cmd /q /u /c type "%file%" ^& echo( ^)^| find /v ""
    ') do (
        rem See if current character is a number
        (for /f "delims=0123456789" %%b in ("%%a") do (
            rem Not a number, see if we have retrieved 10 consecutive numbers 
            set /a "1/((counter+1)%%11)" || (
                rem We probably have 10 numbers, check and output data
                setlocal enabledelayedexpansion
                if !counter!==10 echo !number!
                endlocal
            )
            rem As current character is not a number, initialize
            set "counter=0" & set "number="
        )) || ( 
            rem Digit read, increase counter and concatenate
            set /a "counter+=1"
            setlocal enabledelayedexpansion
            for %%b in ("!number!") do endlocal & set "number=%%~b%%a"
        )
    )) 2>nul 

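The basic idea is to start a cmd instance with Unicode output, type the file from inside this instance, and filter the two-byte output with find, which expands each input line into one character per line of output.

Once each character is on a separate line and this output is processed inside a for /f command, we only need to concatenate successive digits until a non-digit character is found. At that point we check whether a group of 10 digits has been collected and output the data if so.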
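The character-by-character scan can be sketched in Python as a reference for the expected output. This is only an illustration of the idea, not a translation of the batch file; the name `ten_digit_runs` is hypothetical, and in this sketch a line break simply counts as a non-digit:

```python
def ten_digit_runs(text, width=10):
    """Scan character by character and collect digit runs of exactly `width` digits."""
    found = []
    number = ""
    # The trailing newline plays the role of the extra ending line the batch file appends,
    # so that a run at the very end of the input is still flushed.
    for ch in text + "\n":
        if ch in "0123456789":
            number += ch          # digit: extend the current run
        else:
            if len(number) == width:
                found.append(number)
            number = ""           # non-digit: start over
    return found

print(ten_digit_runs("CAMP-0123456789 \nTIDO: 0123456789"))
```

Unlike a fixed 10-character window, this scan rejects runs that are longer than 10 digits, because the length is checked only when the run ends.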