Question

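I have some CSV files that I want to parse with grep (or some other terminal tool) in order to extract some information. They have the following form: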

* Comment 1
* Comment line 2 explaining what the following numbers mean
1000000 ; 3208105 ; 0.18 ; 0.45 ; 0.00015 ; 0.1485 ; 0.03 ; 1 ; 1 ; 5 ; 477003 ; 

* Comment 3
* Comment 4 explaining the meaning of the following lines

* Comment 5
0; 706520; p; 30.4983
1; 20859; p; 57.8
2; 192814; p; 111.842
3; 344542; p; 130.543
4; 54605; p; 131.598
5; 64746; d; 140.898
6; 442082; p; 214.11
7; 546701; p; 249.167
8; 298394; p; 305.034
9; 81188; p; 305.034
.......

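Each file contains at most one line whose third field is d instead of p. So there is either exactly one line containing d, or none.

I have many files like this. From each file I want to extract the line containing the letter d (if it exists) and append to it the last parameter of the first non-comment line, which in this example is 477003.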

So far I have managed to extract the pieces I need separately.

With this, I can extract every line containing d from all of my files:

grep -h -E ' d;' *.csv > output.csv

With this, I can extract exactly the number 477003 from a file like the example above:

grep -v -e "^*" -e " p; " -e " d; " example_file.csv | cut -d \; -f 11

But I don't know how to put the two together.

The final output I want from the example at the beginning is a line like this:

5; 64746; d; 140.898; 477003

I would like to get such a line for every CSV file in the current directory.

Is there a way to do this?

Answer 1

This sounds like a job for sed:

parse.sed (GNU sed)

/^ +$/d                          # Ignore empty lines
/^[ 0-9;.]+$/h                   # Save first "number-only" line to hold space
/ d; / {                         # Run block on lines containing ' d; '
  G                              # Copy saved line to pattern space
  s/\n.*; ([0-9]+) *; *$/; \1/   # Append the last number on the second line
  p                              # to the first line and print the result
}

parse.sed (portable sed)

# Ignore empty lines
/^ +$/d                          

# Save first "number-only" line to hold space
/^[ 0-9;.]+$/h                   

# Run block on lines containing ' d; '
/ d; / {                         

  # Copy saved line to pattern space
  G                              

  # Append the last number on the second line
  # to the first line and print the result
  s/\n.*; ([0-9]+) *; *$/; \1/   
  p                              
}

Run it like this:

sed -Enf parse.sed infile.csv

Output:

5; 64746; d; 140.898; 477003

Note that this assumes each file has only one line consisting solely of characters from the group [ 0-9;.].
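One way to check that assumption before running the script is to count the matching lines in each file. The following is only a sketch: count_numeric_lines is a hypothetical helper name, and the loop simply reports files where the count is not 1.

```shell
#!/bin/sh
# count_numeric_lines is a hypothetical helper, not part of the answer.
# It prints how many lines of file $1 consist only of the characters
# in [ 0-9;.], ignoring lines that contain nothing but spaces
# (parse.sed deletes those before the h command sees them).
count_numeric_lines() {
    grep -E '^[ 0-9;.]+$' "$1" | grep -c -v '^ *$'
}

# Report every CSV file in the current directory whose count is not 1.
for f in *.csv; do
    [ -e "$f" ] || continue              # glob matched nothing
    n=$(count_numeric_lines "$f")
    [ "$n" -eq 1 ] || echo "$f: $n matching lines"
done
```

A file reported here would need a closer look, because the h command keeps whichever matching line it saw last before the d line.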

To run this on all csv files in the current directory, do the following:

sed -Enf parse.sed *.csv
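To try the script without touching real data, the example can be reproduced in a scratch directory. This is only a sketch: the shortened sample file and the scratch directory are assumptions made for illustration, and parse.sed is the script from above with the comments removed.

```shell
#!/bin/sh
# Work in a throwaway directory (an assumption of this sketch).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened copy of the sample file from the question.
cat > example_file.csv <<'EOF'
* Comment 1
* Comment line 2 explaining what the following numbers mean
1000000 ; 3208105 ; 0.18 ; 0.45 ; 0.00015 ; 0.1485 ; 0.03 ; 1 ; 1 ; 5 ; 477003 ;

* Comment 5
4; 54605; p; 131.598
5; 64746; d; 140.898
6; 442082; p; 214.11
EOF

# The sed script from the answer, without comments.
cat > parse.sed <<'EOF'
/^ +$/d
/^[ 0-9;.]+$/h
/ d; / {
  G
  s/\n.*; ([0-9]+) *; *$/; \1/
  p
}
EOF

# -E: extended regular expressions, -n: print only on p, -f: read script file.
result=$(sed -Enf parse.sed example_file.csv)
echo "$result"
```

For this input the script prints the d line followed by the last number of the saved line, that is 5; 64746; d; 140.898; 477003.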

Answer 2

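I used a for loop to go through all the .csv files and assigned the values returned by the two greps to variables, which are concatenated at the end of each iteration: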

for f in *.csv ; do value=`grep -v -e "^*" -e " p; " -e " d; " -e '^\s*$' "$f" | cut -d \; -f 11` ; line=`grep -h -E ' d;' "$f"` ; echo "$line;$value" ; done

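Edit: I also added -e '^\s*$' to the first grep so that the value is taken from the first non-comment line. Before that, it also matched empty lines.

This echoes only lines like 5; 64746; d; 140.898; 477003, which is what you want. If you want to redirect the output to a file (all lines found will end up in a single output file), you can add the redirection to the final echo in that long command, like this: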

for f in *.csv ; do value=`grep -v -e "^*" -e " p; " -e " d; " -e '^\s*$' "$f" | cut -d \; -f 11` ; line=`grep -h -E ' d;' "$f"` ; echo "$line;$value" >> output.csv ; done

For readability, here is the same code on multiple lines:

for f in *.csv
do 
    value=`grep -v -e "^*" -e " p; " -e " d; " -e '^\s*$' "$f" | cut -d \; -f 11`
    line=`grep -h -E ' d;' "$f"`
    echo "$line;$value"
done
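One caveat with this loop: for a file that has no d line, $line is empty and the loop still prints ;value. A minimal sketch of a variant that skips such files follows. The function name extract_d_lines is hypothetical, $(...) replaces the backticks, [[:space:]] stands in for the GNU-specific \s, and head -n 1 is an extra safeguard that is not in the original loop.

```shell
#!/bin/sh
# Hypothetical helper (not from the answer): print "<d line>;<value>" for
# every *.csv in the current directory that actually contains a d line.
extract_d_lines() {
    for f in *.csv; do
        [ -e "$f" ] || continue          # glob matched nothing
        line=$(grep -h -E ' d;' "$f")
        [ -n "$line" ] || continue       # no d line in this file: skip it
        # Same filter as in the loop above: drop comment lines, p/d rows and
        # blank lines, keep only the first remaining line (head -n 1 is an
        # addition of this sketch), then take its 11th ;-separated field.
        value=$(grep -v -e "^*" -e " p; " -e " d; " -e '^[[:space:]]*$' "$f" |
            head -n 1 | cut -d \; -f 11)
        echo "$line;$value"
    done
}

# Usage: one redirection after the loop collects all lines in one file.
# extract_d_lines > output.txt
```

The usage line writes to output.txt rather than output.csv because a file named output.csv in the same directory would itself match *.csv and be read by the loop on a later run.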
