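Get specific column values from a pipe-delimited .txt file and load them into a new text file using a batch script

Time: 2017-07-31 08:32:56

Tags: batch-file

I have a text file with N rows and columns, and I need to extract specific columns together with their values and load them into a new text file using a batch script. For example: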

input.txt

col1|col2|col3.....col71|col72
ew|ds|343.....csdk|gfdf
xc|gh|657.....sdfs|utyy
qw|zx|345.....ffds|xzcz

output.txt

col71|col3
csdk|343
sdfs|657
ffds|345

3 answers:

Answer 0 (score: 0)

Linux

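You can use awk -F "|" '{ print $71 "|" $3 }' input.txt > output.txt (awk fields are numbered from 1, so column 71 is $71 and column 3 is $3).

Normally one could run cut -d"|" -f3,71 input.txt > output.txt; the only problem is that cut (as far as I know) does not support reordering columns.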

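PowerShell

In PowerShell on Windows (it also works on Linux), you can use the following snippet; array indices start at 0 here, so column 71 is $array[70]: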

Get-Content 'input.txt' | ForEach-Object {
  $array = $_.split("|")
  $array[70] + '|' + $array[2]
} | Out-File 'output.txt'
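
The difference between 1-based field numbers (awk, cut) and 0-based array indices (PowerShell) is easy to get wrong, so here is a minimal Python sketch of the same extraction. The function name pick_columns and the 72-column sample row are illustrative assumptions, not part of the answer.

```python
def pick_columns(line, columns, sep="|"):
    """Return the requested 1-based columns of `line`, joined by `sep`."""
    fields = line.rstrip("\r\n").split(sep)
    # Column n (1-based, as in awk/cut) lives at index n - 1 (0-based).
    return sep.join(fields[n - 1] for n in columns)

# Hypothetical 72-column row: c1|c2|...|c72
row = "|".join(f"c{i}" for i in range(1, 73))
print(pick_columns(row, [71, 3]))  # prints c71|c3
```

Unlike cut, the order of the column list decides the order of the output, which is what the question asks for.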

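Answer 1 (score: 0)

To split text into tokens by one or more delimiters, use a for /F loop. However, it can handle at most 31 tokens, so you cannot simply state tokens=71; you can, however, nest multiple loops: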

@echo off
setlocal EnableExtensions DisableDelayedExpansion
> "output.txt" (
    rem // Split off the first 31 tokens, pass the rest to the next loop:
    for /F "usebackq delims=| eol=| tokens=3,31*" %%A in ("input.txt") do (
        rem // Split off the next 31 tokens, pass the rest to the next loop:
        for /F "delims=| eol=| tokens=31*" %%D in ("%%C") do (
            rem /* Extract the proper token from the remaining ones (remember
            rem    that 31 + 31 = 62 tokens have been split off before): */
            for /F "delims=| eol=| tokens=9" %%F in ("%%E") do (
                echo(%%F^|%%A
            )
        )
    )
)
endlocal
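
The token numbers in the nested loops above follow from simple arithmetic: every loop with tokens=31* consumes 31 tokens and hands the remainder to the next loop. A small Python sketch of that calculation, assuming a fixed chunk of 31 tokens per loop (the name locate_token is made up for illustration):

```python
def locate_token(column, chunk=31):
    """Return (loops_to_skip, token_in_last_loop) for a 1-based column number."""
    loops_to_skip, offset = divmod(column - 1, chunk)
    return loops_to_skip, offset + 1

print(locate_token(71))  # (2, 9): two "tokens=31*" loops, then "tokens=9"
print(locate_token(3))   # (0, 3): reachable directly in the first loop
```

For column 71 this gives 71 - 2 * 31 = 9, which is the tokens=9 used in the innermost loop.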

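The above approach fails if there may be empty columns, because for /F treats consecutive delimiters as a single one. To work around this, you can do the following: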

@echo off
setlocal EnableExtensions DisableDelayedExpansion
> "output.txt" (
    rem // Read complete lines:
    for /F usebackq^ delims^=^ eol^= %%L in ("input.txt") do (
        rem // Store current line string in interim variable:
        set "LINE=%%L"
        setlocal EnableDelayedExpansion
        rem /* Split off the first 31 tokens, pass the rest to the next loop;
        rem    to avoid consecutive delimiters `|`, replace every single one by
        rem    :`"|"`, so `||` becomes `"|""|"`; then enclose the entire result
        rem    within `""`, thus achieving individual tokens enclosed within `""`: */
        for /F "delims=| tokens=3,31*" %%A in (^""!LINE:|="^|"!"^") do (
            endlocal
            rem // Split off the next 31 tokens, pass the rest to the next loop:
            for /F "delims=| tokens=31*" %%D in ("%%C") do (
                rem /* Extract the proper token from the remaining ones (remember
                rem    that 31 + 31 = 62 tokens have been split off before): */
                for /F "delims=| tokens=9" %%F in ("%%E") do (
                    rem // Remove the previously added surrounding `""` by `~`:
                    echo(%%~F^|%%~A
                )
            )
            setlocal EnableDelayedExpansion
        )
        endlocal
    )
)
endlocal

This approach still fails if field values that are already quoted contain | themselves.
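
To see why the quoting trick in the second script preserves empty columns, here is a rough Python emulation. It is only a sketch: for_f_split, quote_fields and strip_quotes are hypothetical helpers that approximate how for /F collapses delimiters and how the ~ modifier removes surrounding quotes, and the sample row is made up.

```python
import re

def for_f_split(line, delim="|"):
    """Split like `for /F`: runs of delimiters count as one, so empty fields vanish."""
    return [tok for tok in re.split(f"{re.escape(delim)}+", line) if tok != ""]

def quote_fields(line, delim="|"):
    """Mimic "!LINE:|="|"!": replace each delimiter by "|" and wrap the line in quotes."""
    return '"' + line.replace(delim, f'"{delim}"') + '"'

def strip_quotes(token):
    """Mimic the `~` modifier, which removes surrounding quotation marks."""
    return token[1:-1] if len(token) >= 2 and token[0] == token[-1] == '"' else token

line = "ew||343|csdk"          # hypothetical row whose second column is empty

print(for_f_split(line))       # ['ew', '343', 'csdk']: the empty column is lost
fixed = [strip_quotes(t) for t in for_f_split(quote_fields(line))]
print(fixed)                   # ['ew', '', '343', 'csdk']: positions preserved
```

Because every field becomes a quoted token such as "", no two delimiters are ever adjacent, so the token positions match the column positions again.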

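Answer 2 (score: 0)

The following batch file is a general-purpose program that uses a series of nested FOR /F commands to give access to up to 177 tokens, yet in a very simple way: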

@echo off
setlocal EnableDelayedExpansion

rem Method to use up to 177 tokens in a FOR /F command in a simple way
rem Antonio Perez Ayala


rem Create an example file with lines with 180 tokens each
(for %%a in (A B C) do (
   set "line="
   for /L %%i in (1,1,180) do set "line=!line! %%a%%i"
   echo !line!
)) > test.txt
set "line="


rem Load the string of tokens characters from FOR-FcharsCP850.txt file
chcp 850 > NUL
if exist FOR-FcharsCP850.txt goto readChars
echo Creating FOR-F characters file, please wait...
set "options=/d compress=off /d reserveperdatablocksize=26"
type nul > t.tmp

> FOR-FcharsCP850.txt (
set /P "=0" < NUL

rem Create 87 characters in 38..124 range for 3 FOR's with "tokens=1-28*"
set "i=0"
for /L %%i in (38,1,124) do (
   set /A i+=1, mod=i%%29
   if !mod! neq 0 (
      call :genchr %%i
      type %%i.chr
      del %%i.chr
   )
)

rem Create 95 characters for 3 FOR's with "tokens=1-31*"
rem This is the tokens sequence used when code page = 850
set "i=0"
for %%i in (173 189 156 207 190 221 245 249 184 166 174 170 240 169 238 248
            241 253 252 239 230 244 250 247 251 167 175 172 171 243 168 183
            181 182 199 142 143 146 128 212 144 210 211 222 214 215 216 209
            165 227 224 226 229 153 158 157 235 233 234 154 237 232 225 133
            160 131 198 132 134 145 135 138 130 136 137 141 161 140 139 208
            164 149 162 147 228 148 246 155 151 163 150 129 236 231 152    ) do (
   set /A i+=1, mod=i%%32
   if !mod! neq 0 (
      call :genchr %%i
      type %%i.chr
      del %%i.chr
   )
))
del t.tmp temp.tmp
set "options="
:readChars
set /P "char=" < FOR-FcharsCP850.txt
set "lastToken=177"


cls
echo Enter tokens definition string in the same way of FOR /F "tokens=x,y,m-n" one
echo/
echo You may define a tokens range in descending order: "tokens=10-6" = 10 9 8 7 6
echo or add an increment different than 1: "tokens=10-35+5" = 10 15 20 25 30 35
echo Combine them: "tokens=10,28-32,170-161-3" = 10 28 29 30 31 32 170 167 164 161
echo/
echo The maximum token number is 177

:nextSet
echo/
set /P "tokens=tokens="
if errorlevel 1 goto :EOF

rem Expand the given tokens string into a series of individual FOR tokens values 
set "tokensValues="
for %%t in (%tokens%) do (
   for /F "tokens=1-3 delims=-+" %%i in ("%%t") do (
      if "%%j" equ "" (
         if %%i leq %lastToken% set "tokensValues=!tokensValues! %%!char:~%%i,1!"
      ) else (
         if "%%k" equ "" (set "k=1") else set "k=%%k"
         if %%i gtr %%j set "k=-!k!"
         for /L %%n in (%%i,!k!,%%j) do if %%n leq %lastToken% set "tokensValues=!tokensValues! %%!char:~%%n,1!"
      )
   )
)

rem First three FOR's use as tokens the ASCII chars in 38..124 (&..|) range: 28*3 = 84 tokens + 3 tokens for next FOR
rem Next three FOR's use as tokens Extended chars: 31*3 = 93 tokens + 2 tokens for next FOR
rem based on the tokens sequence used when code page = 850
rem Total: 177 tokens

for /F "eol= tokens=1-28*" %%^& in (test.txt) do ^
for /F "eol= tokens=1-28*" %%C in ("%%B") do ^
for /F "eol= tokens=1-28*" %%` in ("%%_") do ^
for /F "eol= tokens=1-31*" %%­ in ("%%|") do ^
for /F "eol= tokens=1-31*" %%µ in ("%%·") do ^
for /F "eol= tokens=1-31"  %%  in ("%%…") do (
   call :getTokens result=
   rem Process here the "result" string:
   echo !result!
)
goto nextSet


:getTokens result=
for %%# in (-) do set "%1=%tokensValues%"
exit /B


REM This code creates one single byte. Parameter: int
REM Teamwork of carlos, penpen, aGerman, dbenham
REM Tested under Win2000, XP, Win7, Win8
:genchr
if %~1 neq 26 (
   makecab %options% /d reserveperfoldersize=%~1 t.tmp %~1.chr > nul
   type %~1.chr | ( (for /l %%N in (1,1,38) do pause)>nul & findstr "^" > temp.tmp )
   >nul copy /y temp.tmp /a %~1.chr /b
) else (
   copy /y nul + nul /a 26.chr /a >nul
)
goto :eof

IMPORTANT: The series of six nested FOR /F commands uses the following ASCII characters in the replaceable parameters and in the characters between quotes (the character codes are shown at the right of each line):

for /F "eol= tokens=1-28*" %%^& in (test.txt) do ^    %%^ 38
for /F "eol= tokens=1-28*" %%C in ("%%B") do ^        %% 67 in ("66")
for /F "eol= tokens=1-28*" %%` in ("%%_") do ^        %% 96 in ("95")
for /F "eol= tokens=1-31*" %%­ in ("%%|") do ^        %% 173 in ("124")
for /F "eol= tokens=1-31*" %%µ in ("%%·") do ^        %% 181 in ("183")
for /F "eol= tokens=1-31"  %%  in ("%%…") do (        %% 160 in ("133")

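However, some web browsers seem unable to copy and paste some of the extended characters correctly. If the program does not run properly, check that these characters were copied correctly and fix them if necessary. You can try copying the lines above and test whether they were copied correctly...

Output example: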

Enter tokens definition string in the same way of FOR /F "tokens=x,y,m-n" one

You may define a tokens range in descending order: "tokens=10-6" = 10 9 8 7 6
or add an increment different than 1: "tokens=10-35+5" = 10 15 20 25 30 35
Combine them: "tokens=10,28-32,170-161-3" = 10 28 29 30 31 32 170 167 164 161

The maximum token number is 177

tokens=10-6
 A10 A9 A8 A7 A6
 B10 B9 B8 B7 B6
 C10 C9 C8 C7 C6

tokens=10-35+5
 A10 A15 A20 A25 A30 A35
 B10 B15 B20 B25 B30 B35
 C10 C15 C20 C25 C30 C35

tokens=10,28-32,170-161-3
 A10 A28 A29 A30 A31 A32 A170 A167 A164 A161
 B10 B28 B29 B30 B31 B32 B170 B167 B164 B161
 C10 C28 C29 C30 C31 C32 C170 C167 C164 C161

tokens=71,3
 A71 A3
 B71 B3
 C71 C3
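
The way the program expands a tokens definition into individual token numbers can be sketched in Python as follows. This is an illustration of the logic under simplifying assumptions, not a port of the batch code: expand_tokens is a made-up name, items are assumed to be separated by commas only, and a step of 0 is not handled.

```python
import re

def expand_tokens(spec, last_token=177):
    """Expand a spec such as "10,28-32,170-161-3" into a list of token numbers."""
    result = []
    for item in spec.split(","):
        # Each item is "i", "i-j", or "i-j" followed by "+k" or "-k" (the step).
        numbers = [int(n) for n in re.split(r"[-+]", item.strip())]
        start = numbers[0]
        stop = numbers[1] if len(numbers) > 1 else start
        step = numbers[2] if len(numbers) > 2 else 1
        if start > stop:
            step = -step                      # descending range
        end = stop + (1 if step > 0 else -1)  # make `stop` inclusive
        result.extend(n for n in range(start, end, step) if n <= last_token)
    return result

print(expand_tokens("10,28-32,170-161-3"))
# [10, 28, 29, 30, 31, 32, 170, 167, 164, 161]
```

Token numbers above the maximum of 177 are silently dropped, as in the batch program.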

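If your application needs fewer than 177 tokens, you can modify this program and remove the parts of the code for the tokens you do not need; that is, with 2 FORs you can access up to 56 tokens, with 3 up to 84, with 4 up to 115, and with 5 up to 146.

You can review a detailed explanation of this method here; you can also download (a previous version of) this program in a .zip file from this post, which is a simple way to avoid the problem with the extended characters in the six FOR /F commands...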
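
These limits are running totals of the tokens each nested FOR contributes: 28 for each of the first three loops and 31 for each of the last three. A short Python check of the figures (max_tokens is an illustrative name):

```python
def max_tokens(nested_fors):
    """Tokens reachable with the first `nested_fors` loops of the program (1 to 6)."""
    per_loop = [28, 28, 28, 31, 31, 31]  # "tokens=1-28*" three times, then "tokens=1-31*"
    return sum(per_loop[:nested_fors])

for n in range(2, 7):
    print(n, max_tokens(n))  # 2 56, 3 84, 4 115, 5 146, 6 177
```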