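I have seen several threads about the best way to auto-detect the delimiter of an incoming CSV. Most of the solutions are functions 20 to 30 lines long, with multiple loops over a predetermined list of delimiters, reading the first 5 lines and comparing match counts, etc.
I have just implemented one of these procedures with a few modifications. It works brilliantly.
Then I found the following code: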
private function DetectDelimiter($fh)
{
$data_1 = null;
$data_2 = null;
$delimiter = self::$delim_list['comma'];
foreach(self::$delim_list as $key=>$value)
{
$data_1 = fgetcsv($fh, 4096, $value);
$delimiter = sizeof($data_1) > sizeof($data_2) ? $key : $delimiter;
$data_2 = $data_1;
}
$this->SetDelimiter($delimiter);
return $delimiter;
}
To me this looks like it achieves the same result, where $delim_list is an array of delimiters as follows:
static protected $delim_list = array('tab'=>"\t",
'semicolon'=>";",
'pipe'=>"|",
'comma'=>",");
Can anyone explain why I shouldn't do it this simpler way, and why the more complex solutions seem to be the accepted answer everywhere I look?
Thanks!
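Answer 0 (score: 4)
Fixed version.
In your code, if a line contains more than one kind of delimiter you get a wrong result (example: val; string, comma; val2; val3). The same happens if the file has only one line (fewer rows than delimiters in the list), because every candidate is tested against a different line.
Here is the fixed variant: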
private function detectDelimiter($fh)
{
    $delimiters = ["\t", ";", "|", ","];
    $delimiter = $delimiters[0];
    $maxFields = 0;
    foreach ($delimiters as $d) {
        // Parse the same line with each candidate delimiter
        $fields = fgetcsv($fh, 4096, $d);
        // fgetcsv() returns false on failure, so guard before counting
        $fieldCount = is_array($fields) ? count($fields) : 0;
        if ($fieldCount > $maxFields) {
            $maxFields = $fieldCount;
            $delimiter = $d;
        }
        // Go back to the start so the next candidate sees the same line
        rewind($fh);
    }
    return $delimiter;
}
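To illustrate why rewinding matters, here is a minimal standalone sketch that records the field count for every candidate on a one-line file. The function name `countFieldsPerDelimiter` and the in-memory stream are assumptions made for the example, not part of the answer above.

```php
<?php
// Hypothetical helper for illustration: returns the number of fields that
// each candidate delimiter produces on the FIRST line of the stream.
function countFieldsPerDelimiter($fh, array $delimiters): array
{
    $counts = [];
    foreach ($delimiters as $d) {
        // Always start from the beginning so every candidate sees the same line
        rewind($fh);
        $row = fgetcsv($fh, 4096, $d, '"', '\\');
        $counts[$d] = is_array($row) ? count($row) : 0;
    }
    rewind($fh);
    return $counts;
}

// A file with a single line: without rewind(), only the first candidate
// would get any data, because fgetcsv() advances the file pointer.
$fh = fopen('php://memory', 'w+');
fwrite($fh, "id;name;city\n");
$counts = countFieldsPerDelimiter($fh, ["\t", ";", "|", ","]);
fclose($fh);
```

With this input, only the semicolon splits the line into more than one field, so it would be the one selected.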
Answer 1 (score: 4)
This function is elegant :)
/**
* @param string $csvFile Path to the CSV file
* @return string Delimiter
*/
public function detectDelimiter($csvFile)
{
$delimiters = [";" => 0, "," => 0, "\t" => 0, "|" => 0];
$handle = fopen($csvFile, "r");
$firstLine = fgets($handle);
fclose($handle);
foreach ($delimiters as $delimiter => &$count) {
$count = count(str_getcsv($firstLine, $delimiter));
}
return array_search(max($delimiters), $delimiters);
}
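Answer 2 (score: 3)
In general, you cannot reliably detect the delimiter of a text file. If you have additional hints, you need to build them into the detection to be sure.
A particular problem with the suggested approach is that it counts the number of elements in different lines of the file. Suppose you have a file like this: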
a;b;c;d
a b; c d
this|that;here|there
It's not ready, yet.; We have to wait for peter, paul, and mary.; They will know what to do
Although this appears to be separated by semicolons, your approach would return comma.
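One possible way around this, sketched here as an assumption rather than a definitive fix, is to test every candidate against the same set of lines and count in how many of them it actually splits the line. The function name `detectDelimiterByLineCoverage` is hypothetical, and the heuristic can still be fooled by unusual data.

```php
<?php
// Hypothetical heuristic: score each candidate by the number of sampled
// lines that it splits into more than one field, then pick the best score.
function detectDelimiterByLineCoverage(array $lines, array $candidates = [";", ",", "\t", "|"]): ?string
{
    $scores = [];
    foreach ($candidates as $candidate) {
        $scores[$candidate] = 0;
        foreach ($lines as $line) {
            if (count(str_getcsv($line, $candidate, '"', '\\')) > 1) {
                $scores[$candidate]++;
            }
        }
    }
    $best = max($scores);
    if ($best === 0) {
        return null; // no candidate split any line
    }
    // On a tie, the candidate listed first wins
    return array_search($best, $scores, true);
}

$lines = [
    "a;b;c;d",
    "a b; c d",
    "this|that;here|there",
    "It's not ready, yet.; We have to wait for peter, paul, and mary.; They will know what to do",
];
$detected = detectDelimiterByLineCoverage($lines);
```

For the sample file, the semicolon splits all four lines, while the comma and the pipe each split only one, so the sketch picks the semicolon.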
Answer 3 (score: 1)
Another one (built by combining a lot of answers I found on the internet):
/**
* Detects the delimiter of a CSV file (semicolon, comma or pipe) by trying every candidate and
* counting how many columns each one produces on the first line. Candidates that yield only one
* column are removed from the array (without a working delimiter you always get "one" column: the entire row).
* The delimiter that produces the most columns is returned.
*
* @param string $pathToCSVFile path to the CSV file
* @return string|null nullable delimiter
* @throws \Exception
*/
public static function detectDelimiter(string $pathToCSVFile): ?string
{
$delimiters = [
';' => 0,
',' => 0,
"|" => 0,
];
$handle = fopen($pathToCSVFile, 'r');
$firstLine = fgets($handle);
fclose($handle);
foreach ($delimiters as $delimiterCharacter => $delimiterCount) {
$foundColumnsWithThisDelimiter = count(str_getcsv($firstLine, $delimiterCharacter));
if ($foundColumnsWithThisDelimiter > 1) {
$delimiters[$delimiterCharacter] = $foundColumnsWithThisDelimiter;
}else {
unset($delimiters[$delimiterCharacter]);
}
}
if (!empty($delimiters)) {
return array_search(max($delimiters), $delimiters);
} else {
throw new \Exception('The CSV delimiter could not been found. Should be semicolon, comma or pipe!');
}
}
And the corresponding unit test (you have to add your own test CSV files):
/**
* Test the delimiter detector
*
* @test
*/
public function testDetectDelimiter()
{
$this->assertEquals(',', Helper::detectDelimiter('test1.csv'));
$this->assertEquals(';', Helper::detectDelimiter('test-csv-with-semicolon-delimiter.csv'));
$this->assertEquals('|', Helper::detectDelimiter('test-csv-with-pipe-delimiter.csv'));
$this->expectExceptionMessage('The CSV delimiter could not been found. Should be semicolon, comma or pipe!');
Helper::detectDelimiter('test-csv-with-failing-delimiter.csv');
}
Answer 4 (score: 0)
None of these fit my use case, so I made a few modifications.
/**
* @param string $filePath
* @param int $checkLines
* @return string
*/
public function getCsvDelimiter(string $filePath, int $checkLines = 3): string
{
// Double quotes are required so that "\t" is a real tab character
$delimiters = [',', ';', "\t"];
$default =',';
$fileObject = new \SplFileObject($filePath);
$results = [];
$counter = 0;
while ($fileObject->valid() && $counter < $checkLines) {
$line = $fileObject->fgets();
foreach ($delimiters as $delimiter) {
$fields = explode($delimiter, $line);
$totalFields = count($fields);
if ($totalFields > 1) {
if (!empty($results[$delimiter])) {
$results[$delimiter] += $totalFields;
} else {
$results[$delimiter] = $totalFields;
}
}
}
$counter++;
}
if (!empty($results)) {
$results = array_keys($results, max($results));
return $results[0];
}
return $default;
}
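One caveat worth keeping in mind with explode(): it ignores CSV quoting, so delimiters inside quoted fields are counted too. The short sketch below compares it with str_getcsv() on a made-up sample line; the line itself is an assumption for illustration.

```php
<?php
// Sample line (hypothetical): three fields, the last one contains a
// semicolon inside quotes.
$line = '"Smith, John";42;"x;y"';

// explode() splits on every semicolon, including the quoted one
$naiveCount = count(explode(';', $line));

// str_getcsv() honours the enclosure character and keeps "x;y" together
$parsedCount = count(str_getcsv($line, ';', '"', '\\'));
```

Here explode() reports four pieces while str_getcsv() reports three fields, so counts based on explode() can overstate a candidate when the data contains quoted text.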
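Answer 5 (score: 0)
Well, this one parses a single line of the CSV (usually the first) and throws exceptions if more than one delimiter is possible or if none matches. It also makes sure that the delimiter being tested is not inside a quoted string and is not escaped.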
public function getDelimiter(string $content, $throwExceptionOnNonUnique = true, $expectSingleColumn = false): string
{
// Would be cleaner if you pass the delimiters from outside
// as also the order matters in the special case you've got something like "a,b;c"
// and you don't throw the exception - then the first match is preferred
// But for StackOverflow I put them inside
$delimiters = ["\t", ";", "|", ","];
$result = ',';
$maxCount = 0;
foreach ($delimiters as $delimiter) {
// Impress your code reviewer by some badass regex ;)
$pattern = "/(?<!\\\)(?:\\\\\\\)*(?!\B\"[^\\\"]*)\\" . $delimiter . "(?![^\"]*\\\"\B)/";
$amount = preg_match_all($pattern, $content);
if ($maxCount > 0 && $amount > 0 && $throwExceptionOnNonUnique) {
$msg = 'Identifier is not clear: "' . $result . '" and "' . $delimiter . '" are possible';
throw new \Exception($msg);
}
if ($amount > $maxCount) {
$maxCount = $amount;
$result = $delimiter;
}
}
// If nothing matched and you don't expect the CSV to consist of
// a single column without any delimiter
if ($maxCount === 0 && !$expectSingleColumn) {
throw new \Exception('Unknown delimiter');
}
return $result;
}
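P.S.: It is unit tested as well, but I don't want to paste 100+ lines of tests here ;)
Answer 6 (score: 0)
This is the shortest version for detecting the CSV delimiter, using the SplFileObject class with its getCsvControl method and array dereferencing.
There is a catch: the function below only works if the delimiter was set manually with setCsvControl(); otherwise, use one of the top-rated answers.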
// SplFileObject::getCsvControl — Get the delimiter, enclosure and escape character for CSV
function detectDelimiter($csvFile){
if(!file_exists($csvFile) || !is_readable($csvFile)){
return false;
}
$file = new SplFileObject($csvFile);
return $file->getCsvControl()[0];
}
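The catch can be seen in a small sketch using SplTempFileObject (chosen here only so the example needs no file on disk): getCsvControl() reports the configured control characters, not anything inferred from the content.

```php
<?php
// In-memory file whose content is clearly semicolon-separated
$file = new SplTempFileObject();
$file->fwrite("a;b;c\n");
$file->rewind();

// Default control characters: the delimiter is "," regardless of content
$before = $file->getCsvControl()[0];

// Only after setting it explicitly does getCsvControl() return ";"
$file->setCsvControl(';', '"', '\\');
$after = $file->getCsvControl()[0];
```

So this approach reads back a setting rather than detecting anything from the data.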