Question

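I am using the UTL_FILE utility in Oracle to export data to CSV files. I am using a script for this.

So I end up with a set of text files.

Case: 1

A sample of the output in test1.csv is: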

"sno","name"
"1","hari is in singapore
ramesh is in USA"
"2","pong is in chaina
chang is in malaysia
vilet is in uk"

Now I use this Linux command to count the number of records in test1.csv:

egrep -c "^\"[0-9]" test1.csv

Here I get the record count as

2      (ACCORDING TO LINUX)

But if I count the records in the database with select * from test;

 COUNT(*)
----------                 (ACCORDING TO DATA BASE)
    2

Case: 2

A sample of the output in test2.csv is:

"sno","name","p" "","","" "","","ramesh is in USA" "","",""

Now I use this Linux command to count the number of records in test2.csv:

egrep -c "^\"[0-9]" test2.csv

Here I get the record count as

0      (ACCORDING TO LINUX)

But if I count the records in the database with select * from test;

 COUNT(*)
----------                 (ACCORDING TO DATA BASE)
    2

Can anybody help me count the exact number of rows in both case 1 and case 2 using a single command?

Thanks in advance.

Answer 1

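The number of columns differs between the two cases. To make it generic, I wrote a Perl script that prints the row count. It builds a regular expression from the header line and uses it to count the rows. I assume the first line always determines the number of columns.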

#!/usr/bin/perl
use strict;
use warnings;

# Usage: ./row_count.pl filename
open(my $fh, '<', $ARGV[0]) or die "Failed to open file: $!";

# Read the header line and count its columns
my $head = <$fh>;
my @col = split(/,/, $head);  # columns array
my $col_cnt = scalar(@col);   # column count

# Read the remaining rows into a single string
my $rows = '';
while (<$fh>) {
    $rows .= $_;
}
close($fh);

# Build a regex based on the number of columns.
# E.g. for 3 columns the regex is
# ".*?",".*?",".*?"
# where each ".*?" matches anything between a pair of double quotes
my $regex = join(",", ('".*?"') x $col_cnt);

# /s lets . match newlines, so multi-line fields are treated as one row
# /g finds every match in the string
my @row_cnt = $rows =~ m/($regex)/sg;
print "Row count:" . scalar(@row_cnt) . "\n";

Just save it as row_count.pl and run it as ./row_count.pl filename
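As a cross-check that does not rely on the header's column count, records can also be counted by tracking double-quote parity: a physical line ends a record only when the number of double quotes seen so far is even. This is a different technique from the Perl script above, sketched under the assumption that every field is wrapped in double quotes and that any embedded quotes are doubled (""). The function name count_csv_records and the temporary sample file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the data records (header excluded) in a CSV file
# whose quoted fields may span several physical lines.
count_csv_records() {
    awk '
        {
            # Skip blank lines that fall between records (no quoted field open)
            if (quotes % 2 == 0 && $0 == "") next
            # gsub returns how many double quotes this line contains
            quotes += gsub(/"/, "&")
            # An even running total means no quoted field is still open,
            # so this physical line completes a record
            if (quotes % 2 == 0) records++
        }
        END { print (records > 0 ? records - 1 : 0) }   # drop the header
    ' "$1"
}

# Recreate the first sample from the question in a temporary file
sample=$(mktemp)
printf '%s\n' \
    '"sno","name"' \
    '"1","hari is in singapore' \
    'ramesh is in USA"' \
    '"2","pong is in chaina' \
    'chang is in malaysia' \
    'vilet is in uk"' \
    > "$sample"

count_csv_records "$sample"   # prints 2 for this sample
rm -f "$sample"
```

Because it only looks at quote parity, this sketch would miscount a file that contains unquoted fields with stray double quotes.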

Answer 2

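egrep -c test1.csv has no search pattern, so egrep treats test1.csv as the regular expression it should search for. I don't know how you managed to get it to return 2 for your first example.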

Assuming your examples are actually accurate, a usable egrep command that yields the number of records in the file is egrep -c '"[[:digit:]]*"' test1.csv.

timp@helez:~/tmp$ cat test.txt
"sno","name"
"1","hari is in singapore
ramesh is in USA"
"2","pong is in chaina
chang is in malaysia
vilet is in uk"

timp@helez:~/tmp$ egrep -c '"[[:digit:]]*"' test.txt
2

timp@helez:~/tmp$ cat test2.txt
"sno","name"
"1","hari is in singapore"
"2","ramesh is in USA"

timp@helez:~/tmp$ egrep -c '"[[:digit:]]*"' test2.txt
2

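Alternatively, you might be better off adding an extra value to your SELECT statement. Something like SELECT 'recmatch.,.,',sno,name FROM TABLE; instead of SELECT sno,name FROM TABLE;, followed by grep recmatch.,., on the output. But that is a hack.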
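The marker idea can be tried without a database. The sketch below fakes an export in which every record starts with the literal marker from the SELECT above. How UTL_FILE would actually quote that column, and whether a header line is present, are assumptions here. grep -F is used so that the dots in the marker are matched literally and not as regex wildcards.

```shell
#!/bin/sh
# Marker literal taken from the SELECT in the answer
marker='recmatch.,.,'

# Hypothetical export: one marker per record, fields may span several lines
export_file=$(mktemp)
printf '%s\n' \
    '"marker","sno","name"' \
    '"recmatch.,.,","1","hari is in singapore' \
    'ramesh is in USA"' \
    '"recmatch.,.,","2","pong is in chaina' \
    'chang is in malaysia' \
    'vilet is in uk"' \
    > "$export_file"

# -c counts matching lines; each record contributes exactly one marker line
record_count=$(grep -c -F "$marker" "$export_file")
echo "$record_count"
rm -f "$export_file"
```

The count is only reliable if the marker string can never occur inside the exported data itself.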

Answer 3

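In your second example, your lines do not start with " followed by a digit. That is why the count is 0. You could try egrep -c "^\"([0-9]|\")" to also catch empty first-column values. ~~But in fact, because of the header line, it may be simpler to count all lines and subtract 1.~~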

~~e.g. count=$(($(wc -l < test.csv) - 1))~~
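A quick way to check the pattern above is to run it against a small file that has both an empty and a numeric first column. The file below is made up for illustration and is not the asker's test2.csv; grep -E is used as the equivalent of egrep.

```shell
#!/bin/sh
# Hypothetical sample: header, a record with an empty first column,
# and a record whose second field spans two physical lines
check_file=$(mktemp)
printf '%s\n' \
    '"sno","name","p"' \
    '"","","ramesh is in USA"' \
    '"1","hari is in singapore' \
    'ramesh is in USA","x"' \
    > "$check_file"

# Count lines that start with a quote followed by a digit or another quote
first_col_count=$(grep -E -c '^"([0-9]|")' "$check_file")
echo "$first_col_count"
rm -f "$check_file"
```

The header is skipped because its first field starts with a letter. A continuation line that happens to begin with a quote followed by a digit or another quote would still be miscounted.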

Row count in a text file with multi-line and single-line records
