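How can I determine whether the fields in a CSV file are tab-delimited or comma-delimited? I need to validate this in PHP. Can anyone help? Thanks in advance.
Answer 0 (score: 24)
It's late to be answering this question, but I hope it helps someone.
Here is a simple function that returns the delimiter of a file.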
function getFileDelimiter($file, $checkLines = 2){
    $file = new SplFileObject($file);
    // Double quotes, so that "\t" is a real tab character
    $delimiters = array(
        ',',
        "\t",
        ';',
        '|',
        ':'
    );
    $results = array();
    $i = 0;
    while($file->valid() && $i < $checkLines){
        $line = $file->fgets();
        foreach ($delimiters as $delimiter){
            $regExp = '/'.preg_quote($delimiter, '/').'/';
            $fields = preg_split($regExp, $line);
            // Count the lines on which this delimiter actually splits the text
            if(count($fields) > 1){
                if(!empty($results[$delimiter])){
                    $results[$delimiter]++;
                } else {
                    $results[$delimiter] = 1;
                }
            }
        }
        $i++;
    }
    // None of the candidates appeared in the checked lines
    if(empty($results)){
        return null;
    }
    $results = array_keys($results, max($results));
    return $results[0];
}
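Use this function as shown below:
$delimiter = getFileDelimiter('abc.csv'); //Check 2 lines to determine the delimiter
$delimiter = getFileDelimiter('abc.csv', 5); //Check 5 lines to determine the delimiter
P.S. I used preg_split() instead of explode() because explode('\t', $value) does not give the right result (in single quotes, '\t' is a literal backslash followed by a t, not a tab character).
Update: thanks to @RichardEB for pointing out a bug in the code. I have now fixed it.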
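The quoting issue mentioned in the P.S. can be seen in isolation. The following is a minimal sketch, with a made-up sample row, showing that explode() does split on tabs as long as the delimiter is written in double quotes:

```php
<?php
// In single quotes '\t' is two characters (backslash + t);
// in double quotes "\t" is a single tab character.
$row = "id\tname\temail"; // made-up sample row

$singleQuoted = explode('\t', $row); // no literal backslash-t in $row, so nothing is split
$doubleQuoted = explode("\t", $row); // splits on the real tab character

echo count($singleQuoted), "\n"; // 1
echo count($doubleQuoted), "\n"; // 3
```

So either preg_split() or explode("\t", ...) works; the difference comes from PHP's string quoting rather than from explode() itself.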
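Answer 1 (score: 11)
This is what I do.
It will not work 100% of the time, but it is a decent starting point. At the very least, it reduces the number of possible delimiters (making it easier for the user to pick the right one).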
/* Rearrange this array to change the search priority of delimiters */
$delimiters = array(
    'tab'       => "\t",
    'comma'     => ",",
    'semicolon' => ";"
);
$handle = file( $file ); # Grabs the CSV file, loads it into an array of lines
$line = array(); # Stores the count of delimiters in each row
$valid_delimiter = array(); # Stores valid delimiters
# Count the number of delimiters in each row (rows 1 to 5; row 0 is skipped)
$last = min( 5, count( $handle ) - 1 ); # Do not read past the end of short files
for ( $i = 1; $i <= $last; $i++ ){
    foreach ( $delimiters as $key => $value ){
        $line[$key][$i] = count( explode( $value, $handle[$i] ) ) - 1;
    }
}
# Compare the count of delimiters in each line
foreach ( $line as $delimiter => $count ){
    # Check that the first two values exist and are not 0
    if ( isset( $count[1], $count[2] ) and $count[1] > 0 and $count[2] > 0 ){
        # The delimiter is valid only if every row has the same count
        $match = ( count( array_unique( $count ) ) === 1 );
    } else {
        $match = false;
    }
    if ( $match == true ) $valid_delimiter[] = $delimiter;
}//foreach
# Set the default delimiter to comma
$delimiter = !empty( $valid_delimiter ) ? $valid_delimiter[0] : "comma";
/* !!!! This is good enough for my needs since I have the priority set to "tab"
   !!!! but you will want to have the user select from the delimiters in $valid_delimiter
   !!!! if multiple delimiter counts match
*/
# The delimiter for the CSV
echo $delimiters[$delimiter];
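Answer 2 (score: 8)
There is no 100% reliable way to determine this. What you can do is
Answer 3 (score: 4)
I would just count the occurrences of the different delimiters in the CSV file; the one that occurs most often is probably the correct delimiter: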
//The delimiters array to look through
$delimiters = array(
    'semicolon' => ";",
    'tab'       => "\t",
    'comma'     => ",",
);
//Load the csv file into a string
$csv = file_get_contents($file);
$res = array();
foreach ($delimiters as $key => $delim) {
    $res[$key] = substr_count($csv, $delim);
}
//reverse sort the values, so the first element is the most frequently occurring delimiter
arsort($res);
reset($res);
$first_key = key($res);
//this snippet is meant to be the body of a function
return $delimiters[$first_key];
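Answer 4 (score: 3)
In my case, users supply CSV files, which are then imported into an SQL database. They may save an Excel spreadsheet as either a comma-delimited or a tab-delimited file. The program that converts the spreadsheet to SQL needs to identify automatically whether the fields are tab-separated or comma-separated.
Many Excel CSV exports have the field headings as the first line. The heading text is unlikely to contain commas except as delimiters. For my case, I counted the commas and tabs in the first line and used whichever count was larger to decide whether the file is comma-delimited or tab-delimited.
Answer 5 (score: 1)
I used @Jay Bhatt's solution to find out the delimiter of a CSV file, but it did not work for me, so I applied a few fixes and added comments to make the process easier to understand.
Here is my version of @Jay Bhatt's function: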
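The header-row approach described above could be sketched roughly as follows. The function name guessDelimiterFromHeader is hypothetical, and returning a comma on a tie is an assumption made for this sketch, not something the answer specifies:

```php
<?php
// Hypothetical helper: guesses comma vs. tab from the first line only.
// Returns null when the file is unreadable, empty, or contains neither character.
function guessDelimiterFromHeader(string $path): ?string
{
    if (!is_file($path) || !is_readable($path)) {
        return null;
    }
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return null;
    }
    $header = fgets($handle); // only the first line is read
    fclose($handle);
    if ($header === false) {
        return null; // empty file
    }
    $commas = substr_count($header, ',');
    $tabs   = substr_count($header, "\t");
    if ($commas === 0 && $tabs === 0) {
        return null; // single column, or some other delimiter
    }
    // Assumption: a tie is resolved in favour of the comma
    return $tabs > $commas ? "\t" : ',';
}
```

Because it looks only at the header, this stays cheap even for very large files, but it inherits the answer's assumption that heading text does not itself contain commas or tabs.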
function decide_csv_delimiter($file, $checkLines = 10) {
    // use PHP's built-in SplFileObject class to read the csv or txt file line by line
    $file = new SplFileObject($file);
    // array of predefined delimiters. Add any more delimiters if you wish.
    // "\t" is in double quotes so that it is a real tab character,
    // not a simple string composed of the '\' and 't' characters
    $delimiters = array(',', "\t", ';', '|', ':');
    // store the total occurrences of each delimiter in an associative array
    $number_of_delimiter_occurences = array_fill_keys($delimiters, 0);
    $i = 0; // using 'i' for counting the number of rows actually parsed
    while ($file->valid() && $i < $checkLines) {
        $line = $file->fgets();
        foreach ($delimiters as $delimiter) {
            $regExp = '/'.preg_quote($delimiter, '/').'/';
            $fields = preg_split($regExp, $line);
            // the array keys are the delimiters and the values are the occurrence counts;
            // accumulate over all lines instead of keeping only the last line's count
            $number_of_delimiter_occurences[$delimiter] += count($fields) - 1;
        }
        $i++;
    }
    // get the key of the largest value in the array (comparing only the array values);
    // in our case, the array keys are the delimiters
    $results = array_keys($number_of_delimiter_occurences, max($number_of_delimiter_occurences));
    // a tab is already a real tab character here, so it can be used directly when parsing csv files
    return $results[0];
}
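I personally use this function to help parse files automatically with PHPExcel, and it works well and fast.
I recommend parsing at least 10 lines to make the result more accurate. I personally use 100 lines, and it is still fast, with no delay or lag. The more lines you parse, the more accurate the result.
Note: this is just a modified version of @Jay Bhatt's solution. All credit goes to @Jay Bhatt.
Answer 6 (score: 0)
Other than the trivial answer that CSV files are always comma-separated (it's in the name), I don't think you can come up with any hard rules. Both TSV and CSV files are specified loosely enough that you can come up with a file that would be acceptable as either.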
A\tB,C
1,2\t3
(assuming \t == TAB)
How would you decide whether this is TSV or CSV?
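To make the ambiguity concrete, here is a small sketch that parses the two sample lines with both candidate delimiters using str_getcsv(). The variable names are illustrative only:

```php
<?php
// The two sample lines from the answer, with \t as a real tab character
$lines = array("A\tB,C", "1,2\t3");
$candidates = array('comma' => ',', 'tab' => "\t");

$fieldCounts = array();
foreach ($candidates as $name => $delimiter) {
    foreach ($lines as $line) {
        // Enclosure and escape are passed explicitly to avoid relying on defaults
        $fields = str_getcsv($line, $delimiter, '"', '\\');
        $fieldCounts[$name][] = count($fields);
    }
}

// Both delimiters yield two fields on every line
print_r($fieldCounts);
```

Either reading produces a consistent two-column file, so a detector based on counts alone has no grounds to prefer one over the other here.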
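Answer 7 (score: 0)
When I output TSV files, I create the tabs with \t, the same way one would create a line break with \n. With that in mind, I guess an approach could be as follows: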
<?php
$mysource = file_get_contents($file); // $file is the path to your file; or get the source however you wish
if(strpos($mysource, "\t") !== false){
    //We have a tab separator
}else{
    // it might be CSV
}
?>
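I guess this may not be the right way, because the actual content can contain tabs and commas too. It's just an idea. Using regular expressions might be better, although I don't know much about them.
Answer 8 (score: 0)
Thanks for all your input; I used your tricks: preg_split, fgetcsv, a loop, and so on.
But I implemented something that, surprisingly, was not here yet: using fgets instead of reading the whole file, which is much better if the file is large!
Here is the code: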
// Deprecated since PHP 8.1; only needed for files with old Mac (\r) line endings
ini_set("auto_detect_line_endings", true);
function guessCsvDelimiter($filePath, $limitLines = 5) {
    if (!is_readable($filePath) || !is_file($filePath)) {
        return false;
    }
    $delimiters = array(
        'tab'       => "\t",
        'comma'     => ",",
        'semicolon' => ";"
    );
    $fp = fopen($filePath, 'r', false);
    $lineResults = array(
        'tab'       => array(),
        'comma'     => array(),
        'semicolon' => array()
    );
    $lineIndex = 0;
    // Read one line at a time and parse that same line with every candidate delimiter
    while ($lineIndex < $limitLines && ($line = fgets($fp)) !== false) {
        foreach ($delimiters as $key=>$delimiter) {
            $lineResults[$key][$lineIndex] = count(str_getcsv($line, $delimiter, '"', '\\')) - 1;
        }
        $lineIndex++;
    }
    fclose($fp);
    if ($lineIndex === 0) {
        return $delimiters['comma']; // empty file: fall back to comma
    }
    // Calculating average
    foreach ($lineResults as $key=>$entry) {
        $lineResults[$key] = array_sum($entry)/count($entry);
    }
    arsort($lineResults);
    reset($lineResults);
    // Fall back to comma when the two best candidates tie
    $averages = array_values($lineResults);
    return ($averages[0] != $averages[1]) ? $delimiters[key($lineResults)] : $delimiters['comma'];
}
Answer 9 (score: 0)
You can simply use PHP's native fgetcsv() function, like this:
function getCsvDelimeter($file)
{
    $delimiter = null;
    if (($handle = fopen($file, "r")) !== FALSE) {
        $delimiters = array(',', ';', '|', ':'); //Put all that need checking
        foreach ($delimiters AS $item) {
            //rewind so that every candidate is tested against the same first line
            rewind($handle);
            //fgetcsv() returns a single-element array if the delimiter is not found
            $row = fgetcsv($handle, 0, $item, '"', '\\');
            if ($row !== false && count($row) > 1) {
                $delimiter = $item;
                break;
            }
        }
        fclose($handle);
    }
    return $delimiter;
}
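Answer 10 (score: -1)
You can also use fgetcsv (http://php.net/manual/en/function.fgetcsv.php) and pass it the delimiter argument. Note that fgetcsv returns false only on failure or at end of file; with the wrong $delimiter it returns the whole line as a single field, so check the number of fields it returns.
Example for checking whether the delimiter is ';':
if (($data = fgetcsv($your_csv_handler, 1000, ';')) !== false && count($data) > 1) { $csv_delimiter = ';'; }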
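As a quick check of how fgetcsv() reacts to different delimiter arguments, the sketch below parses the same semicolon-separated line twice from an in-memory stream. The sample data and variable names are made up for illustration:

```php
<?php
// In-memory stream holding one semicolon-separated line (sample data)
$handle = fopen('php://memory', 'w+');
fwrite($handle, "a;b;c\n");

$fieldCounts = array();
foreach (array(',', ';') as $candidate) {
    rewind($handle); // parse the same line for each candidate
    $row = fgetcsv($handle, 1000, $candidate, '"', '\\');
    $fieldCounts[$candidate] = ($row === false) ? 0 : count($row);
}
fclose($handle);

// ',' gives 1 field (the whole line); ';' gives 3 fields
print_r($fieldCounts);
```

With a non-matching delimiter the call still succeeds and returns the whole line as one field, which is why the field count is the useful signal.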
Answer 11 (score: -1)
How about something simple?
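function findDelimiter($filePath, $limitLines = 5){
    $file = new SplFileObject($filePath);
    // Note: getCsvControl() returns the delimiter currently configured on the object
    // (a comma by default); it does not inspect the file's contents
    $delims = $file->getCsvControl();
    return $delims[0];
}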
Answer 12 (score: -2)
Here is my solution. It works if you know the number of columns you expect. At the end, the delimiter is in $actual_separation_character.
// Candidate separators in priority order, each with a running total
$separators = array(",", ";", "\t", ":", "|");
$separator_counts = array_fill_keys($separators, 0);
/* YOU NEED TO CHANGE THIS VARIABLE */
// Expected number of separator characters per row ( 3 columns ==> 2 separator characters / row );
// 0 means every row is counted regardless of how many separators it has
$expected_separation_character_number = 2;
$file = fopen("upload/filename.csv", "r");
while (($row = fgets($file)) !== false) // read the file row by row
{
    foreach ($separators as $separator) {
        $found = substr_count($row, $separator);
        // Only rows with exactly the expected number of separators count towards the total
        if ($found == $expected_separation_character_number or $expected_separation_character_number == 0) {
            $separator_counts[$separator] += $found;
        }
    }
}
fclose($file);
/* THE FILE'S ACTUAL SEPARATOR (delimiter) CHARACTER */
/* The first separator with the highest total wins; "," if nothing matched at all */
$actual_separation_character = array_search(max($separator_counts), $separator_counts);
/*
if the number of columns is not what you expect, do something ...
*/
if ($expected_separation_character_number > 0) {
    if (max($separator_counts) == 0) { /* do something ! no row had the expected number of columns ! */ }
}
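Answer 13 (score: -3)
If you have a very large file, for example several GB, take the first few lines with head and put them in a temporary file. Then open the temporary file in vi: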
head test.txt > te1
vi te1
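The same peek at the first few lines can be done from PHP without loading the whole file. This is a minimal sketch; peekLines is a hypothetical helper name, and the default of 10 lines simply mirrors what head prints by default:

```php
<?php
// Hypothetical helper: returns up to $maxLines lines from the start of a file,
// reading line by line so that a multi-GB file is never loaded into memory.
function peekLines(string $path, int $maxLines = 10): array
{
    $lines = array();
    if (!is_file($path) || !is_readable($path)) {
        return $lines;
    }
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return $lines;
    }
    while (count($lines) < $maxLines && ($line = fgets($handle)) !== false) {
        $lines[] = rtrim($line, "\r\n"); // drop the line ending only
    }
    fclose($handle);
    return $lines;
}
```

The returned lines can then be inspected by eye or handed to any of the counting approaches from the other answers.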
Answer 14 (score: -4)
The simplest answer I have to this question is to open the file in a plain-text editor such as TextMate.