Question

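By stream splitting I mean:

the stream content is filtered on the fly by a first function
one part of the stream is processed by a second function
the rest of the stream is processed by a third function
the stream is never stored (everything happens on the fly)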

Sometimes an example is better than a long explanation. This command line uses tee and process substitution to split a stream:

$> cut -f2 file | tee >( grep "AB" | sort | ... ) | grep -v "AB" | tr A B | ...

In this example the stream is split in two: the lines containing "AB" and the rest:

cut -f2 file ---->- line contains "AB" ->- sort ->- ...
             \--->- does not contain "AB" ->- tr A B ->- ...

But I do not like this stream-splitting technique, because the stream is first duplicated (tee) and then filtered twice (grep and grep -v).

Therefore I wonder whether something like stream splitting is available in other languages such as perl, python, ruby, c++...

I provide a more complex example below.

Complex `bash` stream splitting

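counter.sh splits the stream into three parts (begin, middle and end). For each part, the stream is split again to count the occurrences of the symbols <, | and >: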

#!/bin/bash    
{
  {  tee >( sed -n '1,/^--$/!p' >&3 ) |
            sed -n '1,/^--$/p'        |
     tee >( echo "del at begin:  $(grep -c '<')"    >&4 ) |
     tee >( echo "add at begin:  $(grep -c '>')"    >&4 ) |
          { echo "chg at begin:  $(grep -c '|')"; } >&4
  }  3>&1 1>&2  |
  {  tee >( sed -n '/^--$/,/^--$/!p' >&3 ) |
            sed -n '/^--$/,/^--$/p'        |
     tee >( echo "del at end:    $(grep -c '<')"    >&4 ) |
     tee >( echo "add at end:    $(grep -c '>')"    >&4 ) |
          { echo "chg at end:    $(grep -c '|')"; } >&4
  }  3>&1 1>&2 |
     tee >( echo "del in middle: $(grep -c '<')"    >&4 ) |
     tee >( echo "add in middle: $(grep -c '>')"    >&4 ) |
            echo "chg in middle: $(grep -c '|')"; 
} 4>&1

This script counts the number of added/changed/deleted lines in the begin/middle/end parts. The input of the script is a stream:

$> cat file-A
1
22
3
4
5
6
77
8

$> cat file-B
22
3
4
42
6
77
8
99

$> diff --side-by-side file-A file-B | egrep -1 '<|\||>' | ./counter.sh
del at begin:  1
add at begin:  0
chg at begin:  0
del at end:    0
add at end:    1
chg at end:    0
del in middle: 0
add in middle: 0
chg in middle: 1

How can such a counter.sh be implemented efficiently in other programming languages, without storing the data in a temporary buffer?

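Answer

As Lennart Regebro pointed out, I was overthinking this problem. Of course all of these languages can split the input stream, as in ysth's answer. In pseudocode: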

while input-stream
{
    case (begin section)
    {
        case (symbol <) dB++ 
        case (symbol |) cB++ 
        case (symbol >) aB++
    }
    case (middle section)
    {
        case (symbol <) dM++ 
        case (symbol |) cM++ 
        case (symbol >) aM++
    } 
    case (ending section)
    {
        case (symbol <) dE++ 
        case (symbol |) cE++ 
        case (symbol >) aE++
    }
}

PrintResult (aB, cB, dB, aM, cM, dM, aE, cE, dE)
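
The pseudocode above can be sketched in Python roughly as follows. This is only a minimal sketch under some assumptions: sections are separated by lines consisting of "--" (as produced by egrep -1), the first separator ends the begin section, every section after the second separator counts as the end, and a line is counted once per symbol it contains (like grep -c). The names count_changes, report, SECTIONS and SYMBOLS are made up for illustration, and the report is not column-aligned like the output of counter.sh.

```python
from collections import Counter

SECTIONS = ("begin", "middle", "end")
# Symbol meanings follow the labels used in counter.sh.
SYMBOLS = {"<": "del", ">": "add", "|": "chg"}


def count_changes(lines):
    """Count del/add/chg lines per section in one pass over the stream."""
    counts = Counter()
    section = 0
    for line in lines:
        if line.rstrip("\n") == "--":
            # Separator line: move to the next section, staying on "end"
            # once it has been reached (assumption: at most three groups matter).
            section = min(section + 1, len(SECTIONS) - 1)
            continue
        for symbol, kind in SYMBOLS.items():
            if symbol in line:
                counts[kind, SECTIONS[section]] += 1
    return counts


def report(counts):
    """Format the counters in the same order as counter.sh prints them."""
    result = []
    for section, prep in (("begin", "at"), ("end", "at"), ("middle", "in")):
        for kind in ("del", "add", "chg"):
            result.append(f"{kind} {prep} {section}: {counts[kind, section]}")
    return result
```

To use it as a filter, one could feed it standard input, for example print("\n".join(report(count_changes(sys.stdin)))) after import sys. Only the nine counters are kept in memory; each line is inspected once and then dropped.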

Conclusion: stream splitting is done better with python / perl / awk / C++ than with the tee-based approach above.

Answer 1

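tee is just a C program that uses basic system calls; you can implement it in any language that provides access to the system libraries.

Googling for

tee in my favourite language

should find you all the answers you need.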
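
As an illustration of how little tee actually does, here is a minimal Python sketch of the same idea: read a chunk, write it to every output, repeat. The function name tee, the chunk_size parameter and the in-memory demo streams are assumptions made for this example; it is not how the real tee program is written.

```python
import io


def tee(source, *sinks, chunk_size=65536):
    """Copy a binary stream to every sink, one chunk at a time."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            # An empty read means end of stream.
            break
        for sink in sinks:
            sink.write(chunk)


# Demo on in-memory streams so the sketch runs without any files.
source = io.BytesIO(b"line 1\nline 2\n")
first, second = io.BytesIO(), io.BytesIO()
tee(source, first, second)
```

With real streams, the same function could be called with sys.stdin.buffer as the source and sys.stdout.buffer plus an opened file as the sinks. Note that this only duplicates the stream; it does not avoid the double filtering the question complains about.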

Answer 2

Any of the languages you mention is well suited to this.
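
For instance, the simple "AB" example from the question could look something like this in Python. It is a sketch only: split_stream and its parameters are hypothetical names, and the two lists exist just to make the demo observable. Each line is tested once and handed to exactly one consumer, so nothing is duplicated or filtered twice.

```python
def split_stream(lines, predicate, on_match, on_rest):
    """Route every line to exactly one of two consumers in a single pass."""
    for line in lines:
        if predicate(line):
            on_match(line)
        else:
            on_rest(line)


# Demo: lines containing "AB" go one way, the rest gets A replaced by B
# (the equivalent of `tr A B` in the question's pipeline).
matched, rest = [], []
a_to_b = str.maketrans("A", "B")
split_stream(
    ["xABy\n", "AAA\n", "cAB\n", "BCA\n"],
    lambda line: "AB" in line,
    matched.append,
    lambda line: rest.append(line.translate(a_to_b)),
)
```

In a real filter the consumers would write to their own outputs instead of appending to lists. A consumer such as sort necessarily has to hold its share of the lines until the input ends, in any language.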

In Perl, I wouldn't use the diff command; I'd just use Algorithm::Diff on the original files.
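
The same idea can be sketched in Python with difflib from the standard library, swapped in here for Perl's Algorithm::Diff. This is an approximation under stated assumptions: count_line_changes is a hypothetical helper, a replaced block is counted as changed lines plus surplus deletions or additions (roughly how a side-by-side diff would display it), and the begin/middle/end sectioning of counter.sh is not reproduced.

```python
from difflib import SequenceMatcher


def count_line_changes(old_lines, new_lines):
    """Count deleted, added and changed lines between two line lists."""
    counts = {"del": 0, "add": 0, "chg": 0}
    matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        removed, added = i2 - i1, j2 - j1
        if tag == "delete":
            counts["del"] += removed
        elif tag == "insert":
            counts["add"] += added
        elif tag == "replace":
            # Pair lines up as "changed"; whatever is left over on one
            # side counts as deleted or added.
            paired = min(removed, added)
            counts["chg"] += paired
            counts["del"] += removed - paired
            counts["add"] += added - paired
    return counts


# The contents of file-A and file-B from the question.
file_a = ["1", "22", "3", "4", "5", "6", "77", "8"]
file_b = ["22", "3", "4", "42", "6", "77", "8", "99"]
totals = count_line_changes(file_a, file_b)
```

Unlike the streaming versions, this approach loads both files into memory, which is the price of computing the diff in-process.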
