Question

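I often grep CSV files that have column names on the first line. I would therefore like grep's output to always include the first line (to get the column names) as well as any lines that match the grep pattern. What is the best way to do this?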

Answer 1

sed:

sed '1p;/pattern/!d' input.txt

awk:

awk 'NR==1 || /pattern/' input.txt

grep1, as a shell function:

grep1() { awk -v pattern="${1:?pattern is empty}" 'NR==1 || $0~pattern' "${2:?filename is empty}"; }
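
To see how the sed and awk forms differ, here is a small sketch on made-up data (the temporary file and its contents are assumptions for illustration). When the header line itself matches the pattern, the sed form prints it twice, while the awk form tests each line only once.

```shell
# Hypothetical sample data; the header line itself contains "Incoming".
tmpdir=$(mktemp -d)
printf 'id,Incoming\n1,yes\n2,no\n3,Incoming\n' > "$tmpdir/input.txt"

# sed: '1p' prints line 1, and line 1 also survives '/Incoming/!d' because it matches
sed_out=$(sed '1p;/Incoming/!d' "$tmpdir/input.txt")

# awk: each line is tested once, so the header appears a single time
awk_out=$(awk 'NR==1 || /Incoming/' "$tmpdir/input.txt")

printf 'sed:\n%s\n\nawk:\n%s\n' "$sed_out" "$awk_out"
rm -rf "$tmpdir"
```

If the header never contains the pattern, both commands give the same output.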

Answer 2

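grep doesn't really have a notion of line numbers, but awk does, so here is an example that outputs the lines containing "Incoming", plus the first line, whatever it is: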

awk 'NR == 1 || /Incoming/' foo.csv

You could turn this into a script (a bit of overkill, but still). I created a file, grep+1, and put this in it:

#!/bin/sh
pattern="$1" ; shift
exec awk 'NR == 1 || /'"$pattern"'/' "$@"

Now you can run:

./grep+1 Incoming

Edit: removed the "{print;}", which is awk's default action.
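
Because the script splices the pattern between two "/" characters, a pattern that itself contains "/" would end the awk regex literal early. One possible workaround, sketched here as a shell function, hands the pattern to awk through the environment instead. The names grep_plus1 and PATTERN are made up for this sketch, and the sample data is hypothetical.

```shell
# Sketch of a grep+1-style function; "grep_plus1" and PATTERN are made-up names.
grep_plus1() {
    pattern=$1
    shift
    # ENVIRON is standard awk; the value is used as a dynamic regex,
    # so characters such as "/" need no special treatment.
    PATTERN=$pattern awk 'NR == 1 || $0 ~ ENVIRON["PATTERN"]' "$@"
}

# Hypothetical input: a header plus two rows, one of which contains "/mnt"
printf 'Filesystem,Mounted\n/dev/sda1,/\n/dev/sdb1,/mnt/data\n' | grep_plus1 /mnt
```

As with the script, any remaining arguments are passed to awk as file names, and with no file names it reads standard input.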

Answer 3

You can use sed instead of grep to do this:

sed -n -e '1p' -e '/pattern/p' < "$FILE"

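Note, however, that this prints the first line twice if it happens to contain the pattern.

-n tells sed not to print each line by default.
-e '1p' prints the first line.
-e '/pattern/p' prints every line that matches the pattern.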
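
If the duplicated header is a problem, one possible variant (a sketch, not part of the answer above) lets line 1 skip the pattern test entirely. The "b" command with no label branches to the end of the script, so line 1 is printed once by sed's default output, and every other line is deleted unless it matches. The sample data is made up.

```shell
# Hypothetical data: the header itself contains the pattern "Incoming"
printf 'id,Incoming\n1,yes\n3,Incoming\n' |
    sed -e '1b' -e '/Incoming/!d'
```

This form does not use -n, since it relies on sed printing whatever is left in the pattern space.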

Answer 4

This is a very general solution, useful for example if you want to sort a file while keeping the first line in place. Basically, "pass the first line through as-is, then do whatever I want (sort / awk / grep / whatever) on the rest of the data."

Try this in a script, perhaps calling it keepfirstline (don't forget chmod +x keepfirstline and to put it in your PATH):

#!/bin/bash
IFS='' read -r JUST1LIINE
printf "%s\n" "$JUST1LIINE"
exec "$@"

It can be used as follows:

cat your.data.csv | keepfirstline grep SearchTerm > results.with.header.csv

Or, if you want to filter with awk:

cat your.data.csv | keepfirstline awk '$1 < 3' > results.with.header.csv

I often like to sort a file while keeping the header on the first line:

cat your.data.csv | keepfirstline sort

keepfirstline executes the command it is given (exec "$@"), but only after reading and printing the first line.
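
For quick experiments, the same idea can be sketched as a shell function instead of a script on the PATH. This is only an illustration under that assumption: the function runs the command directly rather than via exec, and the sample data is made up.

```shell
# Function form of the keepfirstline idea (a sketch; the data below is hypothetical).
keepfirstline() {
    IFS= read -r header || return   # read the first line without mangling it
    printf '%s\n' "$header"         # pass it through untouched
    "$@"                            # run the given command on the remaining input
}

# Sort the rows while leaving the header where it is
printf 'id,name\n3,c\n1,a\n2,b\n' | keepfirstline sort
```

The header stays on top and only the remaining rows are sorted, because the command started by the function reads whatever input is left after the first line.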

Answer 5

Another option:

$ cat data.csv | (read line; echo "$line"; grep SEARCH_TERM)

Example:

$ printf 'title\nvalue1\nvalue2\nvalue3\n' | (read line; echo "$line"; grep value2)

Output:

title
value2

Answer 6

You can add an alternative to the pattern that matches one of the column names. If a column is named COL, then this will work:

$ grep -E 'COL|pattern' file.csv

Answer 7

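So, I posted a completely different, short answer a while ago.

However, for those pining for a command that looks like grep in that it takes all the same options (although this script requires the long options whenever an option argument is involved), and that can handle odd characters in file names and so on, have fun pulling this one apart.

Basically, it is a grep that always emits the first line. If you think a file with no matching lines should skip emitting the first (header) line, well, that is left as an exercise for the reader. I saved this as grep+1.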

#!/bin/bash
# grep+1 [<option>...] [<regex>] [<file>...]
# Emits the first line of each input and ignores it otherwise.
# For grep options that have optargs, only the --forms will work here.

declare -a files options
regex_seen=false
regex=

double_dash_seen=false
for arg in "$@" ; do
    is_file_or_rx=true
    case "$arg" in
        -*) is_file_or_rx=$double_dash_seen ;;
    esac
    if $is_file_or_rx ; then
        if ! $regex_seen ; then
            regex="$arg"
            regex_seen=true
        else
            files[${#files[*]}]="$arg"     # append the value
        fi
    else
        options[${#options[*]}]="$arg"     # append the value       
    fi
done

# We could either open files all at once in the shell and pass the handles into
# one grep call, but that would limit how many we can process to the fd limit.
# So instead, here's the simpler approach with a series of grep calls

if $regex_seen ; then
    if [ ${#files[@]} -gt 0 ] ; then
        for file in "${files[@]}" ; do
            head -n 1 "$file"
            tail -n +2 "$file" | grep --label="$file" "${options[@]}" "$regex" 
        done
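    else
        # No files given: emit the first line of stdin, then grep the rest
        IFS= read -r header && printf '%s\n' "$header"
        grep "${options[@]}" "$regex"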
    fi
else
    grep "${options[@]}"   # probably --help
fi

#--eof
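
The exercise mentioned above (skipping the header when nothing matches) could be approached as in the following sketch. It swaps grep for awk, so grep's options are not supported, and the pattern is treated as an awk extended regular expression. The name grep_header_if_match and the sample data are made up.

```shell
# "grep_header_if_match" is a made-up name; the pattern is an awk (ERE) regex.
grep_header_if_match() {
    pattern=$1
    shift
    PATTERN=$pattern awk '
        FNR == 1 { header = $0; shown = 0; next }    # remember the header of each file
        $0 ~ ENVIRON["PATTERN"] {
            if (!shown) { print header; shown = 1 }  # emit the header before the first match
            print
        }
    ' "$@"
}

printf 'name,qty\napple,3\npear,5\n' | grep_header_if_match pear    # header plus the pear row
printf 'name,qty\napple,3\npear,5\n' | grep_header_if_match banana  # prints nothing
```

The header is held back until the first matching line is seen, so an input without matches produces no output at all.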

Answer 8

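All the answers are correct. Here is another idea for the case where you grep the output of a command (rather than a file) and want to include its first line ;-)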

df -h | grep -E '(^Filesystem|/mnt)'  # <<< returns usage of devices, with mountpoint '/mnt/...'
ps aux | grep -E '(^USER|grep)'       # <<< returns all grep-process

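grep's -E option enables extended regular expressions. Our grep string uses |, which can be read as "or", so in the df example we look for lines that:

start with Filesystem (the leading '^' in the first subexpression means "the line starts with")
or contain /mnt

Another way would be to pipe the output into a temporary file and grep its contents as shown in the other posts. This can be helpful if you do not know the content of the first line.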

head -1 <file> && grep ff <file>
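
The temporary-file approach could be wrapped up roughly as follows. This is a sketch under assumptions: header_grep is a made-up helper name, mktemp is assumed to be available, and printf stands in for a real command such as df -h.

```shell
# "header_grep" is a made-up helper: header_grep PATTERN COMMAND [ARGS...]
header_grep() {
    pattern=$1
    shift
    tmp=$(mktemp) || return 1                # mktemp is assumed to be available
    "$@" > "$tmp"                            # capture the command output
    head -n 1 "$tmp"                         # always print the first line
    tail -n +2 "$tmp" | grep -e "$pattern"   # grep only the remaining lines
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Stand-in for a real command, e.g.: header_grep /mnt df -h
header_grep /mnt printf 'Filesystem Size\n/dev/sda1 50G\n/mnt/data 2T\n'
```

Grepping only from the second line onward keeps the header from being printed twice when it happens to match the pattern.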

Answer 9

Just run

head -1 <filename>

and then run grep.
