Question

I need to get records from a text file in Unix. The delimiter is multiple blanks. For example:

2U2133   1239  
1290fsdsf   3234

From this, I need to extract

1239  
3234

The delimiter for all records is always 3 blanks.

I need to do this in a Unix script (.scr) and either write the output to another file or use it as input to a do-while loop. I tried the following:

while read readline  
do  
        read_int=`echo "$readline"`  
        cnt_exc=`grep "$read_int" ${Directory path}/file1.txt| wc -l`  
if [ $cnt_exc -gt 0 ]  
then  
  int_1=0  
else  
  int_2=0  
fi  
done < awk -F'  ' '{ print $2 }' ${Directoty path}/test_file.txt

test_file.txt is the input file and file1.txt is the lookup file. But the approach above does not work and gives me syntax errors near awk -F

I also tried writing the output to a file. This is the command line I used:

more test_file.txt | awk -F'   ' '{ print $2 }' > output.txt

This works on the command line and writes the records to output.txt. But the same command does not work in the Unix script (it is a .scr file).

Please let me know where I am going wrong and how I can resolve this.

Thanks,
Visakh

Answer 1

cat <file_name> | tr -s ' ' | cut -d ' ' -f 2

Answer 2

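It depends on the version or implementation of cut on your machine. Some versions support an option, usually -i, that means "ignore blank fields" or, equivalently, allows multiple separators between fields. If that is supported, use: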

cut -i -d' ' -f 2 data.file

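If not (and it is not universal, and maybe not even widespread, since neither GNU nor MacOS X cut has the option), then using awk is better and more portable.

You need to pipe the output of awk into your loop, though: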

awk -F' ' '{print $2}' ${Directory_path}/test_file.txt |
while read readline  
do  
    read_int=`echo "$readline"`  
    cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`  
    if [ $cnt_exc -gt 0 ]  
    then int_1=0  
    else int_2=0
    fi  
done

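The only residual issue is whether the while loop runs in a sub-shell, in which case it does not modify the variables of your main shell script, only its own copy of them.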
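To make the sub-shell concern concrete, here is a minimal sketch that counts lines in two ways. The variable names pipe_count and heredoc_count are made up for this illustration. In bash (with default settings) and dash the loop after the pipe runs in a sub-shell, so the first counter keeps its initial value. Some other shells, such as ksh and zsh, run the last stage of a pipeline in the current shell and behave differently.

```shell
# Count lines in a loop that is fed by a pipe.
pipe_count=0
printf '1239\n3234\n' | while read -r value
do
    pipe_count=$((pipe_count + 1))
done
# In bash and dash this still shows the initial value, because the
# increments happened in the sub-shell that ran the loop.
echo "after the pipe: $pipe_count"

# Feed the same lines from a here-document instead: no pipe is involved,
# so the loop runs in the current shell and the counter survives.
heredoc_count=0
while read -r value
do
    heredoc_count=$((heredoc_count + 1))
done <<EOF
1239
3234
EOF
echo "after the here-document: $heredoc_count"
```

If the loop only prints or writes files, the sub-shell does not matter. It matters only when variables set inside the loop, such as int_1 and int_2, are needed after it ends.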

With bash, you can use process substitution:

while read readline  
do  
    read_int=`echo "$readline"`  
    cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`  
    if [ $cnt_exc -gt 0 ]  
    then int_1=0  
    else int_2=0
    fi  
done < <(awk -F' ' '{print $2}' ${Directory_path}/test_file.txt)

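This leaves the while loop in the current shell, but arranges for the output of the command to appear as if it were coming from a file.

The blank in ${Directory path} is not normally legal, unless it is another Bash feature I have missed; you also had a typo (Directoty) in one place.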
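As a hedged illustration of the naming point, the sketch below uses an underscore in the variable name and quotes the expansion, so that even a directory whose name contains a blank is passed as a single argument. The directory "my data" and the variable second_column are assumptions made up for this example, and mktemp -d is assumed to be available (it is on most Linux and macOS systems).

```shell
# Throwaway directory whose name contains a blank, created only for this demo.
base_dir=$(mktemp -d)
Directory_path="$base_dir/my data"
mkdir "$Directory_path"
printf '2U2133   1239\n1290fsdsf   3234\n' > "$Directory_path/test_file.txt"

# The variable NAME has no blank; the variable VALUE may contain one,
# as long as the expansion is wrapped in double quotes.
second_column=$(awk -F'   ' '{ print $2 }' "$Directory_path/test_file.txt")
echo "$second_column"

rm -r "$base_dir"
```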

Answer 3

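Apart from the other ways of doing the same thing, the error in your program is this: you cannot redirect from (<) the output of another program. Turn your script around and use a pipe like this: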

awk -F'   ' '{ print $2 }' "${Directory_path}/test_file.txt" | while read readline

etc.

Also, the use of "readline" as a variable name may or may not get you into problems.
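Since < expects a file rather than a command, another option is to let awk write an intermediate file first and redirect the loop from that file. This also matches the question's wish to write the output to another file. The sketch below is one possible arrangement rather than the only one; the temporary directory and the lines_read counter are assumptions introduced for the example.

```shell
# Throwaway working directory and sample input, created only for this demo.
work_dir=$(mktemp -d)
printf '2U2133   1239\n1290fsdsf   3234\n' > "$work_dir/test_file.txt"

# Step 1: awk writes the second column to an ordinary file.
awk -F'   ' '{ print $2 }' "$work_dir/test_file.txt" > "$work_dir/output.txt"

# Step 2: "<" now has a real file to read, and the loop runs in the current shell.
lines_read=0
while read -r value
do
    lines_read=$((lines_read + 1))
    echo "read: $value"
done < "$work_dir/output.txt"

echo "lines read: $lines_read"
rm -r "$work_dir"
```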

Answer 4

In this particular case, you can use the following line

sed 's/   /\t/g' <file_name> | cut -f 2

to get the second column.
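One caveat: \t in the replacement part of s/// is understood by GNU sed but not by every sed implementation. A more portable sketch of the same idea puts a literal tab into a shell variable first. The names tab and second_column_via_tab are made up for this example.

```shell
# Capture a literal tab character; command substitution only strips newlines.
tab=$(printf '\t')

# Replace each run of exactly three blanks with a tab, then let cut
# (whose default delimiter is the tab) pick the second field.
second_column_via_tab() {
    sed "s/   /$tab/g" | cut -f 2
}

printf '2U2133   1239\n1290fsdsf   3234\n' | second_column_via_tab
```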

Answer 5

In bash you can start from something like this:

for n in `cut -d " " -f 4 "${Directory_path}/test_file.txt"`
do
    grep -c "$n" "${Directory_path}"/file*.txt
done

Answer 6

It does not work in the script because of the typo in "Directoty path" (the last line of your script).

Answer 7

cut is not flexible enough. I usually use Perl for this, with a one-liner along these lines:

perl -F'\x20{3}' -lane 'print $F[1]' <file_name>

After -F you can put any Perl regular expression; here \x20{3} stands for the triple space, because -F does not accept literal whitespace in the pattern. You access the fields as $F[n], where n is the field number (counting starts at zero). This way there is no need for sed or tr.

Answer 8

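This should have been a comment, but since I cannot comment yet, I am adding it here. This comes from an excellent answer:

tr -s ' ' <text.txt | cut -d ' ' -f4

tr -s '<character>' squeezes multiple repeated instances of <character> into one. For the sample data in the question, the wanted value is then in field 2, so -f2 would be used instead of -f4.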
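A short sketch of the squeeze-then-cut idea, applied to the sample records from the question. The helper name second_field_squeezed is hypothetical.

```shell
second_field_squeezed() {
    # tr -s ' ' turns every run of blanks into a single blank,
    # so cut can treat one blank as the field delimiter.
    tr -s ' ' | cut -d ' ' -f 2
}

printf '2U2133   1239\n1290fsdsf   3234\n' | second_field_squeezed
```

Without the tr step, each of the three blanks would count as its own delimiter for cut, and the same value would sit in field 4.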

Unix - need to cut a file which has multiple blanks as delimiter - awk or cut?

8 answers: