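Counting rows in a text file that contains both multi-line and single-line records

Time: 2014-06-18 08:02:29

Tags: sql linux

I am using the UTL_FILE utility in Oracle to write data into CSV files. I use a script for this.

So I get a set of text files.

Case: 1

The sample output in test1.csv is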

"sno","name"
"1","hari is in singapore
ramesh is in USA"
"2","pong is in chaina
chang is in malaysia
vilet is in uk"

Now I use the following Linux command to count the number of records in test1.csv:

egrep -c "^\"[0-9]" test1.csv

Here I get the record count as

2      (ACCORDING TO LINUX)

But if I count the number of records in the database using select count(*) from test;

 COUNT(*)
----------                 (ACCORDING TO DATA BASE)
    2

Case: 2

The sample output in test2.csv is

"sno","name","p"
"","",""
"","","ramesh is in USA"
"","",""

Now I use the following Linux command to count the number of records in test2.csv:

egrep -c "^\"[0-9]" test2.csv

Here I get the record count as

0      (ACCORDING TO LINUX)

But if I count the number of records in the database using select count(*) from test;

 COUNT(*)
----------                 (ACCORDING TO DATA BASE)
    2

Can anybody help me count the exact number of rows in both case 1 and case 2 using a single command?

Thanks in advance.

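3 answers:

Answer 0: (score: 1)

The columns differ between the two cases. To make it generic, I wrote a Perl script that prints the row count. It builds a regex from the header line and uses it to count the rows. I am assuming that the first line always indicates the number of columns.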

#!/usr/bin/perl
use strict;
use warnings;

# Open the CSV file named on the command line
open(my $fh, '<', $ARGV[0]) or die "Failed to open file: $!";

# Get the columns from the header and use them to construct the regex
defined(my $head = <$fh>) or die "File is empty";
chomp $head;
my @col = split(",", $head); # Columns array
my $col_cnt = scalar(@col);  # Column count

# Read the rest of the rows into a single string
my $rows = '';
while (<$fh>) {
    $rows .= $_;
}
close($fh);

# Create the regex based on the number of columns
# E.g. for 3 columns, the regex should be
# ".*?",".*?",".*?"
# this represents anything between " and "
my $regex = join(",", ('".*?"') x $col_cnt);

# /s to treat the data as a single line (so . also matches newlines)
# /g for global matching
my @row_cnt = $rows =~ m/($regex)/sg;
print "Row count:" . scalar(@row_cnt) . "\n";

Just save it as row_count.pl and run it as ./row_count.pl filename
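As an alternative to a hand-built regex, a CSV parser that understands quoted fields should give the same count for both cases. The sketch below uses Python's standard csv module; the function name count_records and the assumption of exactly one header row are illustrative and not part of the script above.

```python
import csv
import io


def count_records(csv_text: str, has_header: bool = True) -> int:
    """Count logical CSV records; a quoted field may span several lines."""
    # newline="" leaves line breaks inside quoted fields for csv to handle.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    # csv.reader yields an empty list for a blank physical line; skip those.
    total = sum(1 for row in reader if row)
    # Assumption: the first record is a header and is not counted.
    return total - 1 if has_header and total else total


case1 = (
    '"sno","name"\n'
    '"1","hari is in singapore\nramesh is in USA"\n'
    '"2","pong is in chaina\nchang is in malaysia\nvilet is in uk"\n'
)
print(count_records(case1))  # prints 2
```

For a real file, the same reader can be fed from open(path, newline="") instead of an in-memory string.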

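Answer 1: (score: 0)

egrep -c test1.csv has no search term to match, so it will try to use test1.csv as the regular expression to search for. I don't know how you managed to get it to return 2 for your first example.

Assuming your example is actually accurate, a usable egrep command that does produce the number of records in the file is egrep -c '"[[:digit:]]*"' test1.csv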

timp@helez:~/tmp$ cat test.txt
"sno","name"
"1","hari is in singapore
ramesh is in USA"
"2","pong is in chaina
chang is in malaysia
vilet is in uk"

timp@helez:~/tmp$ egrep -c '"[[:digit:]]*"' test.txt
2

timp@helez:~/tmp$ cat test2.txt
"sno","name"
"1","hari is in singapore"
"2","ramesh is in USA"

timp@helez:~/tmp$ egrep -c '"[[:digit:]]*"' test2.txt
2

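Alternatively, you might be better off adding an extra value to the SELECT statement. Something like SELECT 'recmatch.,.,',sno,name FROM TABLE; instead of SELECT sno,name FROM TABLE;, and then grep -c 'recmatch.,.,' on the file, but that is a hack.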
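The marker idea can be sketched outside grep as well. The snippet below is only an illustration: it assumes the export writes the marker as the first, double-quoted column of every data row (so each record starts with "recmatch.,.,"), and the function name count_marked_records is hypothetical.

```python
MARKER = '"recmatch.,.,"'  # assumed form of the marker column in the export


def count_marked_records(lines, marker=MARKER):
    """Count lines that begin with the marker written by the SELECT."""
    # startswith() compares literally, so the dots are not wildcards here.
    return sum(1 for line in lines if line.startswith(marker))


sample = [
    '"recmatch.,.,","1","hari is in singapore',
    'ramesh is in USA"',
    '"recmatch.,.,","2","pong is in chaina',
    'chang is in malaysia',
    'vilet is in uk"',
]
print(count_marked_records(sample))  # prints 2
```

Continuation lines of a multi-line field do not start with the marker, so they are not counted.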

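Answer 2: (score: 0)

In your second example, the lines do not start with " followed by a digit. That is why the count is 0. You can try egrep -c "^\"([0-9]|\")" to also catch empty first-column values. In practice, though, because of the header line, it may be simpler to count all the lines and subtract 1.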

<del>e.g. count=$(($(wc -l < test.csv) - 1))</del>
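Counting physical lines only matches the record count when no field contains a line break, which is not true for case 1. A minimal Python sketch of a line-based count that tolerates multi-line fields follows; it assumes every field is wrapped in double quotes and that any embedded quote is doubled, and the function name count_logical_records is illustrative.

```python
def count_logical_records(lines, has_header=True):
    """Count records by tracking whether we are inside a quoted field."""
    records = 0
    in_quotes = False
    for line in lines:
        if not in_quotes and not line.strip():
            continue  # ignore blank lines between records
        # An odd number of quotes on a line flips the inside/outside state.
        if line.count('"') % 2 == 1:
            in_quotes = not in_quotes
        # A record is complete whenever a line ends outside any quoted field.
        if not in_quotes:
            records += 1
    # Assumption: the first record is a header and is not counted.
    return records - 1 if has_header and records else records


sample = [
    '"sno","name"',
    '"1","hari is in singapore',
    'ramesh is in USA"',
    '"2","pong is in chaina',
    'chang is in malaysia',
    'vilet is in uk"',
]
print(count_logical_records(sample))  # prints 2
print(len(sample) - 1)                # prints 5, the "all lines minus 1" count
```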