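I'm using the code below to pull a CSV file from a website I don't control. Quite often the response starts with "undefined index" or "headers already sent" notices, but all of the data is still there at the bottom. I'd like to write a script that opens the file and deletes every line until it reaches the actual header row the CSV is supposed to start with.
The number of lines changes every time I pull it...
The current example has 49,107 lines that I don't need before the part I want to parse. Here is a small sample: roughly the first 15 lines of the file, plus about 20 lines from just before the part I actually want.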
<pre class="cake-debug"><a href="javascript:void(0);" onclick="document.getElementById('cakeErr1-trace').style.display = (document.getElementById('cakeErr1-trace').style.display == 'none' ? '' : 'none');"><b>Notice</b> (8)</a>: Undefined index: name [<b>APP/controllers/loads_controller.php</b> line <b>327</b>]<div id="cakeErr1-trace" class="cake-stack-trace" style="display: none;"><a href="javascript:void(0);" onclick="document.getElementById('cakeErr1-code').style.display = (document.getElementById('cakeErr1-code').style.display == 'none' ? '' : 'none')">Code</a> | <a href="javascript:void(0);" onclick="document.getElementById('cakeErr1-context').style.display = (document.getElementById('cakeErr1-context').style.display == 'none' ? '' : 'none')">Context</a><div id="cakeErr1-code" class="cake-code-dump" style="display: none;"><pre><code><span style="color: #000000"> $data[$i]['Load']['drop_date'] = date('m/d/Y' strtotime($value['Load']['drop']));</span></code>
<code><span style="color: #000000"> $data[$i]['Load']['pickup_city'] = $value['Pickup']['city'];</span></code>
"<span class=""code-highlight""><code><span style=""color: #000000""> $data[$i]['Load']['pickup_state'] = $value['Pickup']['State']['name'];</span></code></span></pre></div><pre id=""cakeErr1-context"" class=""cake-context"" style=""display: none;"">$order = ""Load.load_number ASC"""
"$fields = array("
" ""*"""
)
"$conditions = array("
" ""Load.active"" => true"
)
"$results = array("
" array("
" ""Load"" => array()"
" ""Pickup"" => array()"
" ""Destination"" => array()"
)
$result = array(
"Load" => array(
"name" => "ICE CREAM OR RELATED",
"load_number" => "8891517",
"trailer_type" => "R",
"phone_number1" => "800-555-8287",
"phone_number2" => "800-555-8287",
"pickup_date" => "03/09/2014",
"drop_date" => "03/09/2014",
"pickup_city" => "Indianapolis",
"pickup_state" => "Indiana",
"pickup_zipcode" => "46201",
"destination_city" => "London",
"destination_state" => "Kentucky",
"destination_zipcode" => "40741"
)
)
$fp=</pre><pre class="stack-trace">header - [internal], line ??
LoadsController::csv() - APP/controllers/loads_controller.php, line 360
Dispatcher::_invoke() - CORE/cake/dispatcher.php, line 204
Dispatcher::dispatch() - CORE/cake/dispatcher.php, line 170
[main] - APP/webroot/index.php, line 83</pre></div>
</pre>name,load_number,trailer_type,phone_number1,phone_number2,pickup_date,drop_date,pickup_city,pickup_state,pickup_zipcode,destination_city,destination_state,destination_zipcode
"FOOD OR KINDRED PROD",8831029,R,800-555-8287,800-555-8287,03/09/2014,03/10/2014,Aurora,Illinois,60504,"West Memphis",Arkansas,72301
"FOOD OR KINDRED PROD",8831031,R,800-555-8287,800-555-8287,03/12/2014,03/13/2014,Aurora,Illinois,60504,Ashley,Indiana,46705
This is what I want the file to look like once the lines that shouldn't be there are removed...
name,load_number,trailer_type,phone_number1,phone_number2,pickup_date,drop_date,pickup_city,pickup_state,pickup_zipcode,destination_city,destination_state,destination_zipcode
FOOD OR KINDRED PROD,8831029,R,800-555-8287,800-555-8287,3/9/2014,3/10/2014,Aurora,Illinois,60504,West Memphis,Arkansas,72301
FOOD OR KINDRED PROD,8831031,R,800-555-5555,800-555-5555,3/12/2014,3/13/2014,Aurora,Illinois,60504,Ashley,Indiana,46705
Currently I'm using this code to fetch my CSV:
set_time_limit(24 * 60 * 60);
// folder to save downloaded files to. must end with slash
$destination_folder = 'downloads/';
$url = 'http://www.somesite.com/loads/csv';
$newfname = $destination_folder . 'loads1.csv';
$file = fopen($url, "rb");
$newf = false; // initialised so the fclose() check below is safe if the download fails
if ($file) {
    $newf = fopen($newfname, "wb");
    if ($newf) {
        while (!feof($file)) {
            // copy the response to disk in 8 KB chunks
            fwrite($newf, fread($file, 1024 * 8));
        }
    }
}
if ($file) {
    fclose($file);
}
if ($newf) {
    fclose($newf);
}
and this code to parse it:
$selectfile1 = "https://www.somesite.com/downloads/loads1.csv";
// check mime type - application/octet-stream
$content = file($selectfile1);
$posted_content = array();
list($rownum, $row) = each($content);
$posted_content[0] = explode(",", $row);
array_push($posted_content[0], "ID");
$count = 0;
// iterate each row (1 post)
while (list($rownum, $row) = each($content))
{
    $count++;
    $cols = "ShipAfterDate, ShipBeforeDate, EquipmentID, LengthID, VendorCode, LoadCount, Rate, CargoDescription, Notes,Phone1, Phone2, PostDate,";
    $vals = "";
    // extract fields from row columns
    $items = explode(",", $row);
    list($Description, $OrderNumber, $EquipmentCode, $Phone1, $Phone2, $ShipDate, $DeliveryDate, $OriginCity, $OriginState, $OriginZip, $DestinationCity, $DestinationState, $DestinationZip) = $items;
    array_push($posted_content, $items);
Answer 0 (score: 0)
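Take a look at 'fgetcsv' (PHP manual). It returns false if there is a parse error, and otherwise returns the actual CSV values. It may not be the fastest way to get past 50k failing lines, but I think it should work fine.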
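One way this could be put together is sketched below. It is only an illustration under a few assumptions, not a tested fix for the site in question. In the sample above, the header row shares a line with a closing `</pre>` tag, and lines of debug output generally still parse as (meaningless) CSV fields, so the sketch does not rely on `fgetcsv` returning false. Instead it scans with `fgets` for the known start of the header (`name,load_number,`, taken from the sample), cuts off whatever precedes it, and only then reads the cleaned file with `fgetcsv`. The function names `strip_preamble` and `read_rows` are hypothetical.

```php
<?php
// Hypothetical helper: copy $src to $dst, dropping everything before the
// CSV header. $headerStart is assumed to be the known beginning of the
// header row and assumed not to occur inside the debug output.
function strip_preamble(string $src, string $dst, string $headerStart = 'name,load_number,'): bool
{
    $in = fopen($src, 'rb');
    if ($in === false) {
        return false;
    }
    $out = fopen($dst, 'wb');
    if ($out === false) {
        fclose($in);
        return false;
    }

    $found = false;
    while (($line = fgets($in)) !== false) {
        if (!$found) {
            $pos = strpos($line, $headerStart);
            if ($pos === false) {
                continue; // still inside the debug output, skip the line
            }
            $line = substr($line, $pos); // drops a leading "</pre>" on the header line
            $found = true;
        }
        fwrite($out, $line);
    }

    fclose($in);
    fclose($out);
    return $found; // false means no header was seen and $dst is empty
}

// Hypothetical helper: read the cleaned file with fgetcsv so that quoted
// fields such as "West Memphis" are unquoted correctly.
function read_rows(string $path): array
{
    $rows = [];
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return $rows;
    }
    $header = fgetcsv($fh, 0, ',', '"', '\\');
    while (($fields = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        // Blank or malformed lines have the wrong field count and are skipped.
        if ($header !== false && count($fields) === count($header)) {
            $rows[] = array_combine($header, $fields);
        }
    }
    fclose($fh);
    return $rows;
}
```

With the download script from the question, this might be called as `strip_preamble('downloads/loads1.csv', 'downloads/loads1_clean.csv')` followed by `read_rows('downloads/loads1_clean.csv')`, where the `loads1_clean.csv` name is made up for the example. Reading line by line keeps memory use flat even with tens of thousands of junk lines, and using `fgetcsv` rather than `explode(",", $row)` means quoted values come back without their surrounding quotes.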