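The CSV files I process have a varying number of lines before the header. I need to detect the header automatically for each file.
Here is an example of a file: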
Wine Directory List
Wine Title Vintage Country Region Sub region Appellation Color Bottle Size Price URL FORMAT
Chateau Petrus Pomerol 2011 France Bordeaux Pomerol Red 750ML 2799.99 HTTP://holbrookliquors.com/sku218758.html 1x750ML
Pappy Van Winkle's Bourbon 15 Year Family Reserve United States Kentucky 0ML 999.99 1x0ML
Shipping Fee 0ML 999.99 1x0ML
Heineken Holland Beer Netherlands 0ML 999.99 1x0ML
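Here is my converter:
UPDATE: first solution: getHeaderLine(). The only drawback: by the time I start parsing the file after calling getHeaderLine(), the header line has already been read inside getHeaderLine(), so I can no longer get the header's data. Please, can anyone out there help me?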
public function convert($filePath, $feedColumnsMatch)
{
    //this array will contain the elements from the file
    $articles = [];
    $headerRecord = [];
    //if we can open the file on mode "read"
    if (($handle = fopen($filePath, 'r')) !== FALSE) {
        //represents the line we are reading
        $rowCounter = 0;
        $nb = $feedColumnsMatch->getNumberOfColumns();
        $headerLine = $this->getHeaderLine($handle, $nb, $delimiter);
        //as long as there are lines
        while (($rowData = fgetcsv($handle, 5000, $delimiter)) !== FALSE) {
            //TODO: remove the ugly 9
            if ($nb === count($rowData)) {
                //At line x the keys are written, so we record them in $headerRecord
                //What I had first: if (9 === $rowCounter) {
                //What I now have:
                if (0 === $rowCounter) {
                    //trim the titles of columns
                    for ($i = 0; $i < $nb; $i++) {
                        $rowData[$i] = trim($rowData[$i]);
                    }
                    $headerRecord = $rowData;
                } elseif (9 < $rowCounter) {
                    //for every other line...
                    foreach ($rowData as $key => $value) { //in each line, for each value
                        // we file $value under the header cell that has the same horizontal position ($key)
                        // but vertical position 0 (i.e. $headerRecord[$key])
                        $articles[$rowCounter][$headerRecord[$key]] = mb_convert_encoding($value, "UTF-8");
                    }
                }
            }
            $rowCounter++;
        }
        fclose($handle);
    }
    return $articles;
}
public function getHeaderLine($handle, $nbColumns, $delimiter){
    $rowCounter = 0;
    while (($rowData = fgetcsv($handle, 5000, $delimiter)) !== FALSE) {
        $rowCounter++;
        if ($nbColumns === count($rowData)) {
            return $rowCounter;
        }
    }
    return -1;
}
As you can see, I have to hard-code "9" in the if() to parse the data correctly, and change it for every file.
Answer 0 (score: 0)
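When your csv/tsv files start with a varying number of invalid lines (blank lines, titles, labels, etc.), one way to get the header is to use a helper function for the first pass over the file. The first line with the correct number of cells (you need to know how many columns the csv file has) is your header. The helper returns the header's data to the main function, which then continues parsing from the position where the helper stopped reading.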
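A minimal sketch of that idea in isolation, assuming a seekable in-memory stream stands in for the real file. The function name findHeader, the ';' delimiter, and the three-column sample data are made up for illustration. The enclosure and escape arguments are passed explicitly only to avoid deprecation notices on recent PHP versions.

```php
<?php
// Hypothetical helper: reads lines until one has the expected number of cells.
// It returns that line (trimmed) as the header, or null if no line matches.
function findHeader($handle, int $nbColumns, string $delimiter): ?array
{
    while (($row = fgetcsv($handle, 5000, $delimiter, '"', '\\')) !== false) {
        if (count($row) === $nbColumns) {
            return array_map('trim', $row);
        }
    }
    return null;
}

// In-memory stand-in for a file: a title line, a blank line, the header, one data row.
$handle = fopen('php://memory', 'r+');
fwrite($handle, "Wine Directory List\n\n Title ;Vintage;Country\nChateau Petrus;2011;France\n");
rewind($handle);

$header = findHeader($handle, 3, ';');

// The helper and the caller share the same handle, so the read position is
// now just past the header: the next read returns the first data row.
$firstRow = fgetcsv($handle, 5000, ';', '"', '\\');
fclose($handle);

var_dump($header, $firstRow);
```

Because the file position lives in the handle rather than in the function, the caller does not need to know how many junk lines were skipped.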
The full code:
public function convert($filePath, $feedColumnsMatch)
{
    if (!file_exists($filePath)) {
        return "existe pas"; // "does not exist"
    }
    if (!is_readable($filePath)) {
        return "pas lisible"; // "not readable"
    }
    //this array will contain the elements from the file
    $articles = [];
    if ($feedColumnsMatch->getFeedFormat() === "tsv" || $feedColumnsMatch->getFeedFormat() === "csv") {
        if ($feedColumnsMatch->getFeedFormat() === "csv") {
            $delimiter = $feedColumnsMatch->getDelimiter();
        } else {
            $delimiter = "\t";
        }
        //if we can open the file on mode "read"
        if (($handle = fopen($filePath, 'r')) !== FALSE) {
            $nb = $feedColumnsMatch->getNumberOfColumns();
            //getHeader() reads up to and including the header line,
            //so the handle is left on the first data line
            $headerRecord = $this->getHeader($handle, $nb, $delimiter);
            //represents the line we are reading
            $rowCounter = 0;
            //if a header was found
            if (null !== $headerRecord && false !== $headerRecord) {
                //as long as there are lines
                while (($rowData = fgetcsv($handle, 5000, $delimiter)) !== FALSE) {
                    //if it is a line with a valid number of cells
                    if ($nb === count($rowData)) {
                        foreach ($rowData as $key => $value) { //in each line, for each value
                            // we file $value under the header cell that has the same horizontal position ($key)
                            $articles[$rowCounter][$headerRecord[$key]] = mb_convert_encoding($value, "UTF-8");
                        }
                    }
                    $rowCounter++;
                }
            } else {
                //no header: release the handle before failing
                fclose($handle);
                throw new \Exception("No line with the expected number of columns was found");
            }
            fclose($handle);
        }
    }
    return $articles;
}
/**
 * Returns the data of the row containing the header of the csv file, or null if there is no header.
 * The handle is left positioned just after the header line.
 * @param $handle
 * @param $nbColumns
 * @param $delimiter
 * @return array|null
 */
public function getHeader($handle, $nbColumns, $delimiter){
    while (($rowData = fgetcsv($handle, 5000, $delimiter)) !== FALSE) {
        //the first line with the expected number of cells is the header
        if ($nbColumns === count($rowData)) {
            //trim the titles of columns
            for ($i = 0; $i < $nbColumns; $i++) {
                $rowData[$i] = trim($rowData[$i]);
            }
            return $rowData;
        }
    }
    return null;
}
However, I don't know whether this is the right way to do it. It is just one way that works.
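For comparison, here is a sketch of a variant closer to the question's first attempt. Instead of returning the header's data, the helper remembers where each line starts with ftell() and uses fseek() to move back to the start of the header line, so the main loop can read the header itself. The name seekToHeader and the sample data are hypothetical, and this only works on seekable streams (regular files, php://memory), not on pipes.

```php
<?php
// Hypothetical helper: leaves the handle positioned at the START of the first
// line that has $nbColumns cells. Returns false if no such line exists.
function seekToHeader($handle, int $nbColumns, string $delimiter): bool
{
    while (true) {
        $lineStart = ftell($handle);          // remember where this line begins
        $row = fgetcsv($handle, 5000, $delimiter, '"', '\\');
        if ($row === false) {
            return false;                     // end of file, no header found
        }
        if (count($row) === $nbColumns) {
            fseek($handle, $lineStart);       // "un-read" the header line
            return true;
        }
    }
}

// In-memory stand-in for a file: one junk line, the header, one data row.
$handle = fopen('php://memory', 'r+');
fwrite($handle, "Wine Directory List\nTitle,Price\nHeineken,999.99\n");
rewind($handle);

$articles = [];
$found = seekToHeader($handle, 2, ',');
if ($found) {
    // The first read after seeking back is the header itself.
    $header = array_map('trim', fgetcsv($handle, 5000, ',', '"', '\\'));
    while (($row = fgetcsv($handle, 5000, ',', '"', '\\')) !== false) {
        if (count($row) === count($header)) {
            $articles[] = array_combine($header, $row);
        }
    }
}
fclose($handle);

var_dump($articles);
```

Returning the header row directly, as in the answer above, avoids the extra seek and also works on non-seekable streams, so this variant is mainly useful when the parsing loop must see the header line itself.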