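How to split a fixed-width text file by column number

Time: 2017-06-12 17:12:14

Tags: windows powershell batch-file

I have a fixed-width text file. I need to extract the content of a given column range (columns 100-120) into a variable and then check the length of that variable.

The variable may exceed 20 characters, in which case I need to delete that particular line.

For example: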

0         1         2         3         4         5         6         7         8         9         0
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
short_name          des_shrt                              px
BOS1111             ALTIC 6.62 2_23                       106.37500000
BOS2222             AMA                                   47.26000000
BOS3333             AMB                                   12.898000
BOS4444             AMEX Express                          10.09780000
BOS5555             BBC                                   111.2233
BOS6666             CNN                                   123.123445
BOS7777             STACK OVERFLOW                        344.9090
BOS8888             STACT 12.0 2/1988                     10.99999999
BOS9999             ABC                                   20

Output:

px  
106.375  
47.26  
12.898  
10.0978  
111.2233  
123.123445  
344.909  
10.99999999       -> exceeds 10 digits and should raise an error  
20  

1 answer:

Answer 0 (score: 0)

Here is a pure batch-file solution; see the explanatory remarks in the code:

@echo off
setlocal EnableDelayedExpansion

rem // Define constants here:
set "_FULL_LINES_OUT=#"

rem // Initialise variables:
set "HEAD=#"
rem // Read text file line by line:
for /F "usebackq delims=" %%L in ("%~1") do (
    rem // Store current line into environment variable:
    set "LINE=%%L"
    rem // Extract 12 characters at character position 58:
    set "LINE=!LINE:~58,12!"
    rem // Remove trailing spaces, if any:
    for /F %%K in ("!LINE!") do set "LINE=%%K"
    rem // Check whether line is the first one (header):
    if defined HEAD (
        rem // Return header line:
        if defined _FULL_LINES_OUT (echo %%L) else (echo !LINE!)
        set "HEAD="
    ) else (
        rem // Split numbers into integer and fractional parts
        rem // (reset first so an empty field cannot reuse stale values):
        set "INT=" & set "FRACT="
        for /F "tokens=1* delims=." %%I in ("!LINE!") do (
            set "INT=%%I" & set "FRACT=%%J"
        )
        rem // Remove trailing zeros from fractional part:
        set "FLAG=#"
        for /L %%J in (1,1,12) do (
            if defined FLAG (
                if "!FRACT:~-1!"=="0" (
                    set "FRACT=!FRACT:~,-1!"
                ) else (
                    set "FLAG="
                )
            )
        )
        rem // Reassemble truncated decimal number:
        if defined FRACT (
            set "LINE=!INT!.!FRACT!"
        ) else (
            set "LINE=!INT!"
        )
        rem // Check whether string length of number exceeds 10:
        if not "!LINE:~10!"=="" (
            rem // Number is longer than 10 characters:
            >&2 (if defined _FULL_LINES_OUT (echo %%L) else (echo !LINE!))
        ) else (
            rem // Number is not too long, so return original line:
            if defined _FULL_LINES_OUT (echo %%L) else (echo !LINE!)
        )
    )
)

endlocal
exit /B
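
For readers who want to cross-check the script's logic outside cmd, the three steps it applies to each data line (cut the field, trim trailing zeros, test the length) can be sketched in Python. This is only a minimal sketch and not part of the original answer: the names normalize_px, is_too_long, PX_START, PX_WIDTH and MAX_LEN are made up for illustration, and the values 58, 12 and 10 are simply the ones hard-coded in the batch script above. Edge cases such as a value starting with a dot may behave differently from the batch version.

```python
PX_START = 58   # zero-based offset of the px column (assumed from the script)
PX_WIDTH = 12   # number of characters the script extracts
MAX_LEN = 10    # longest value accepted after trimming


def normalize_px(line):
    """Return the px value of a fixed-width line with trailing zeros removed."""
    # Keep only the first whitespace-delimited token of the field, which is
    # what the script's inner `for /F` loop does to drop trailing spaces.
    tokens = line[PX_START:PX_START + PX_WIDTH].split()
    field = tokens[0] if tokens else ""
    # Split at the first dot and trim zeros from the fractional part only,
    # so that an integer such as "20" is left untouched.
    integer, _, fraction = field.partition(".")
    fraction = fraction.rstrip("0")
    return f"{integer}.{fraction}" if fraction else integer


def is_too_long(value):
    """True when the trimmed value has more than MAX_LEN characters."""
    return len(value) > MAX_LEN


# Sample rows rebuilt with explicit padding so the px column starts at 58.
sample = [
    f"{'BOS1111':<20}{'ALTIC 6.62 2_23':<38}106.37500000",
    f"{'BOS8888':<20}{'STACT 12.0 2/1988':<38}10.99999999",
    f"{'BOS9999':<20}{'ABC':<38}20",
]
for row in sample:
    value = normalize_px(row)
    print(value, "REJECT" if is_too_long(value) else "ok")
# prints:
# 106.375 ok
# 10.99999999 REJECT
# 20 ok
```

Trimming only the fractional part matters here: stripping zeros from the whole field would turn 20 into 2.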

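Assuming you name the script check-px-numbers.bat and your data file is D:\Data\data.txt, run the script as follows (2> nul suppresses the rejected lines, which the script writes to STDERR):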

check-px-numbers.bat "D:\Data\data.txt" 2> nul

To write the output to another file, D:\Data\filtered.txt, call the script like this:

check-px-numbers.bat "D:\Data\data.txt" > "D:\Data\filtered.txt"

With your sample data, this produces the following output file:

short_name          des_shrt                              px
BOS1111             ALTIC 6.62 2_23                       106.37500000
BOS2222             AMA                                   47.26000000
BOS3333             AMB                                   12.898000
BOS4444             AMEX Express                          10.09780000
BOS5555             BBC                                   111.2233
BOS6666             CNN                                   123.123445
BOS7777             STACK OVERFLOW                        344.9090
BOS9999             ABC                                   20

The following rejected line appears in the console window as error output:

BOS8888             STACT 12.0 2/1988                     10.99999999

If you want the output data to look like the following instead, change the line set "_FULL_LINES_OUT=#" to set "_FULL_LINES_OUT=" (or remove it):

px
106.375
47.26
12.898
10.0978
111.2233
123.123445
344.909
20

If you want to overwrite the original file, you need to do it in two steps:

check-px-numbers.bat "D:\Data\data.txt" > "D:\Data\filtered.txt"
> nul move /Y "D:\Data\filtered.txt" "D:\Data\data.txt"
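
The two steps are needed because redirecting the output straight into the input file would truncate it while it is still being read. The same "write a temporary file, then move it over the original" pattern can be sketched in Python. This is an illustrative sketch only: rewrite_in_place and its keep parameter are hypothetical names, and UTF-8 is an assumed encoding that may not match your data file.

```python
import os
import tempfile


def rewrite_in_place(path, keep):
    """Replace `path` with only the lines for which keep(line) is true."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final rename
    # stays on the same volume.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" keeps the original line endings unchanged.
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as dst, \
                open(path, "r", encoding="utf-8", newline="") as src:
            for line in src:
                if keep(line):
                    dst.write(line)
        # Step two: overwrite the original, comparable to `move /Y`.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched and clean up the partial copy.
        os.remove(tmp_path)
        raise
```

A call such as rewrite_in_place(r"D:\Data\data.txt", some_check) would then keep only the lines that some_check accepts, where some_check is whatever predicate you supply.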

To write the erroneous lines to a file, use:

check-px-numbers.bat "D:\Data\data.txt" 2> "D:\Data\errors.txt"

You can combine both to write the filtered lines and the erroneous lines to separate files:

check-px-numbers.bat "D:\Data\data.txt" > "D:\Data\filtered.txt" 2> "D:\Data\errors.txt"
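
The redirections above work because the script sends accepted lines to STDOUT and rejected lines to STDERR. A rough Python sketch of that routing, for illustration only: route_lines and raw_px_fits are hypothetical names, and raw_px_fits is a simplified stand-in check that measures the raw field at offset 58 without trimming trailing zeros, so it is not equivalent to the batch script for values such as 106.37500000.

```python
import io
import sys


def route_lines(lines, check, out=sys.stdout, err=sys.stderr):
    """Write the header and accepted lines to `out`, rejected lines to `err`."""
    for index, line in enumerate(lines):
        text = line.rstrip("\r\n")
        if index == 0 or check(text):
            print(text, file=out)   # the header line is always passed through
        else:
            print(text, file=err)


def raw_px_fits(line):
    # Stand-in check: length of the raw px field, no zero trimming.
    return len(line[58:70].strip()) <= 10


# Rows rebuilt with explicit padding so the px column starts at offset 58.
rows = [
    f"{'short_name':<20}{'des_shrt':<38}px",
    f"{'BOS5555':<20}{'BBC':<38}111.2233",
    f"{'BOS8888':<20}{'STACT 12.0 2/1988':<38}10.99999999",
    f"{'BOS9999':<20}{'ABC':<38}20",
]
accepted, rejected = io.StringIO(), io.StringIO()
route_lines(rows, raw_px_fits, out=accepted, err=rejected)
# `accepted` now holds the header, BOS5555 and BOS9999;
# `rejected` holds only the BOS8888 line.
```

Passing real file objects instead of the StringIO buffers gives the same effect as the > and 2> redirections.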