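Batch file editing .csv files - delete columns; get part of a string; remove duplicates

Time: 2014-09-08 02:33:44

Tags: batch-file csv cmd substring

I'm on Windows and have some CSV files in which only part of the data in the third column interests me. Here are a few sample rows of my raw data: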

Column.1     Column.2     Column.3         Column.4     Column.5     Column.6  
blah         blah         A/B/C/D/x/x/x    blah         blah         blah
blah         blah         A/B/C/D/x/x/x    blah         blah         blah   
blah         blah         E/F/G/H/x/x/x    blah         blah         blah   

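What I want to do with it is:
1. Delete the other columns and keep only Column.3
2. Extract the string from Column.3 up to the 4th forward slash and discard the rest of the string
3. Remove duplicate entries

So the output would look like this: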

A/B/C/D  
E/F/G/H

Hopefully this is a better way of explaining what I'm after.

Cheers, Alan

2 answers:

Answer 0: (score: 1)

Update

Try reading HELP FOR in CMD.
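As a quick illustration of the tokens= and delims= options that HELP FOR describes, here is a minimal sketch. The sample string is copied from the question's Column.3; it is only an illustration and not part of this answer's script.

```batch
@echo off
REM Split a literal string on "/" into four tokens plus the remainder.
REM The trailing * in tokens= puts everything after token 4 into %%e.
for /F "tokens=1-4,* delims=/" %%a in ("A/B/C/D/x/x/x") do (
    echo first four: %%a/%%b/%%c/%%d
    echo remainder: %%e
)
```

Saved as a .bat file and run, this should print `first four: A/B/C/D` followed by `remainder: x/x/x`.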

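By enabling setlocal enabledelayedexpansion, we can create an array-like structure:

The script iterates over the lines of "filename.csv", setting each line to a temporary variable named LINE.

Then each of the tokens "1,2,3,4,5" in !LINE!, separated by the delimiter "\" (delims=\), is stored in row, and we can echo it back after the closing parenthesis of the second (inner) loop, as shown.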

@echo off
setlocal enableextensions enabledelayedexpansion
REM COUNT tracks the current row number
SET /A COUNT=0
REM Outer loop: read each whole line of the file into LINE, wrapped in quotes
for /F "tokens=*" %%A in (filename.csv) do (
    set LINE="%%A"
    set /A COUNT+=1
    REM Inner loop: split the quoted line on backslashes, the 6th variable gets the remainder
    for /F "tokens=1,2,3,4,5,* delims=\" %%a in (!LINE!) do (
        set row[0]=%%a
        set row[1]=%%b
        set row[2]=%%c
        set row[3]=%%d
        set row[4]=%%e
        set row[5]=%%f
    )
    echo This is row: !COUNT!
    echo This is column A: !row[0]!
    echo This is column B: !row[1]!
    echo This is column C: !row[2]!
    echo This is column D: !row[3]!
    echo This is column E: !row[4]!
    echo This is column F: !row[5]!
    echo.
)
REM Substring manipulation: row[5] still holds the value from the last row
echo !row[5]:~1,2!
echo !row[5]:~0,2!
echo !row[5]:~3,5!
echo !row[5]:~-3!
endlocal

filename.csv:

A1\anotherB\C\and a d\blah0\blah1\blah1
A2\stuff2\C\D\blah2\blah3\blah1
A3\B\the last C\D\blah4\pizza5\blah1
A4\B\C\D\blah6\blah7\blah1

Output:

C:\Users\UserBob\Desktop\RANDOM\32>3.bat
This is row: 1
This is column A: A1
This is column B: anotherB
This is column C: C
This is column D: and a d
This is column E: blah0
This is column F: blah1\blah1

This is row: 2
This is column A: A2
This is column B: stuff2
This is column C: C
This is column D: D
This is column E: blah2
This is column F: blah3\blah1

This is row: 3
This is column A: A3
This is column B: B
This is column C: the last C
This is column D: D
This is column E: blah4
This is column F: pizza5\blah1

This is row: 4
This is column A: A4
This is column B: B
This is column C: C
This is column D: D
This is column E: blah6
This is column F: blah7\blah1

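Output, continued. This is the substring output of the four echo !row[5]:~...! lines, starting with echo !row[5]:~1,2! (row[5] still holds the last row's value, blah7\blah1):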

la
bl
h7\bl
ah1

So, for the part you're interested in, you would use !row[3]:~num,num!
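To make the !var:~start,length! syntax concrete, here is a minimal sketch. The variable name field and its value are hypothetical, with the value copied from the question's sample data.

```batch
@echo off
setlocal enabledelayedexpansion
REM Hypothetical sample value, copied from the question's Column.3.
set "field=A/B/C/D/x/x/x"
REM :~0,7 starts at offset 0 and takes 7 characters.
echo !field:~0,7!
REM A negative start counts back from the end of the string.
echo !field:~-5!
endlocal
```

This should print `A/B/C/D` and then `x/x/x`. Note that fixed offsets only give the right result when every path segment has the same width as in the sample; for segments of varying length, splitting on the slash with FOR /F is the safer route.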

Answer 1: (score: 0)

@ECHO OFF
SETLOCAL
:: remove variables starting $
FOR /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a="
FOR /f "tokens=1-4delims=/" %%a IN (q25716731.txt) DO SET "$%%a_%%b_%%c_%%d=%%a/%%b/%%c/%%d"
(
 FOR /F "tokens=2delims==" %%a In ('set $ 2^>Nul') DO ECHO(%%a
)>newfile.txt

GOTO :EOF

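I used a file named q25716731.txt containing some data for my testing. The file name is not important. Produces newfile.txt.

Note that you explicitly stated "backslash" and then provided forward slashes in your data sample. The routine is written for regular (forward) slashes; the change needed for backslashes should be obvious.


Revision following clarification of the data and output requirements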

@ECHO OFF
SETLOCAL
:: remove variables starting $
FOR /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a="
FOR /f "skip=1tokens=3delims= " %%s IN (q25716731.txt) DO (
 FOR /f "tokens=1-4delims=/" %%a IN ("%%s") DO SET "$%%a_%%b_%%c_%%d=%%a/%%b/%%c/%%d"
)
(
 FOR /F "tokens=2delims==" %%a In ('set $ 2^>Nul') DO ECHO(%%a
)>newfile.txt

GOTO :EOF

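I used a file named q25716731.txt containing my test data. Produces newfile.txt.

"skip=1" skips the column header line.

It is not clear whether the actual data is true CSV or really a fixed-column format. It is assumed that blah contains no spaces.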
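If the files turn out to be genuinely comma-separated, a variant along the following lines might work. This is only a sketch under stated assumptions: fields are separated by commas, no field is empty or quoted, and input.csv is a placeholder file name. As in the routines above, duplicates are removed by using each path as the name of a $ variable, so a repeated path simply overwrites the same variable.

```batch
@ECHO OFF
SETLOCAL
REM Assumption: comma-separated input with no empty or quoted fields.
REM input.csv and newfile.txt are placeholder file names.
REM Clear any existing variables starting with $
FOR /F "delims==" %%a IN ('set $ 2^>Nul') DO SET "%%a="
REM Take the third comma-separated field of every line after the header
FOR /F "skip=1 tokens=3 delims=," %%s IN (input.csv) DO (
 FOR /F "tokens=1-4 delims=/" %%a IN ("%%s") DO SET "$%%a_%%b_%%c_%%d=%%a/%%b/%%c/%%d"
)
REM Each distinct path is now stored exactly once; write the values out
(
 FOR /F "tokens=2 delims==" %%a IN ('set $ 2^>Nul') DO ECHO(%%a
)>newfile.txt
GOTO :EOF
```

One caveat: FOR /F treats consecutive delimiters as a single delimiter, so an empty field before the third column would shift the tokens and the wrong column would be picked up.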