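I have a CSV file with a header row, and occasionally one of its rows has an extra field. This happens because a comma inside a text field was not escaped.
Is there a way to remove such a row before converting the file to an array?
Sample CSV file: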
CUST_NUMBER,PO_NUMBER,NAME,SERVICE,DATE,BOX_NUMBER,TRACK_NO,ORDER_NO,INV_NO,INV_AMOUNT
757626003,7383281,JACK SMITH,GND,20180306,1,1Z1370750453578430,2018168325,119348,70.70
757626003,7383282,GERALD SMITH, JR.,GND,20180306,1,1Z9R67670395033411,2018168326,119513,63.72
757626003,7383233,SCOTT R SMITH,GND,20180306,1,1Z1370750982624042,2018168329,119349,39.33
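As you can see, the third line of the file has an extra field, because GERALD SMITH, JR. contains a comma that was not escaped. The JR. part ends up in the SERVICE column, GND and every field after it shift one column to the right, and the last value lands in a column with no header.
When a row contains more fields than the header, I want to delete the whole row.
After removing those rows, I convert the rest of the CSV into an array like this: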
<?php
$csv = array_map("str_getcsv", file("FILE.CSV", FILE_SKIP_EMPTY_LINES));
$keys = array_shift($csv);
foreach ($csv as $i => $row) {
    if (count($keys) == count($row)) {
        $csv[$i] = array_combine($keys, $row);
    }
}
?>
Answer 0 (score: 1)
As @Scuzzy suggested, unset the bad rows:
<?php
$csv = array_map("str_getcsv", file("FILE.CSV",FILE_SKIP_EMPTY_LINES));
$keys = array_shift($csv);
foreach ($csv as $i => $row) {
    if (count($keys) == count($row)) {
        $csv[$i] = array_combine($keys, $row);
    } else {
        // Field count differs from the header: drop the row.
        unset($csv[$i]);
    }
}
?>
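If the file is large, the same check can be applied while reading, so bad rows never enter the array at all. The following is only a sketch of that idea: it uses fgetcsv instead of file() plus str_getcsv, and it loads the question's sample into a php://temp stream purely for illustration (a real script would fopen() the CSV file). The $skipped counter is a hypothetical addition.

```php
<?php
// Illustration only: place the question's sample in a temporary stream.
$fh = fopen('php://temp', 'w+');
fwrite($fh, implode("\n", [
    'CUST_NUMBER,PO_NUMBER,NAME,SERVICE,DATE,BOX_NUMBER,TRACK_NO,ORDER_NO,INV_NO,INV_AMOUNT',
    '757626003,7383281,JACK SMITH,GND,20180306,1,1Z1370750453578430,2018168325,119348,70.70',
    '757626003,7383282,GERALD SMITH, JR.,GND,20180306,1,1Z9R67670395033411,2018168326,119513,63.72',
    '757626003,7383233,SCOTT R SMITH,GND,20180306,1,1Z1370750982624042,2018168329,119349,39.33',
]) . "\n");
rewind($fh);

// The first line is the header; delimiter, enclosure and escape are passed explicitly.
$keys = fgetcsv($fh, 0, ',', '"', '\\');

$rows = [];
$skipped = 0; // hypothetical counter, handy for logging
while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
    if (count($row) !== count($keys)) {
        $skipped++; // field count differs from the header: ignore the row
        continue;
    }
    $rows[] = array_combine($keys, $row);
}
fclose($fh);

print_r($rows);
```

Because rows are appended with `$rows[]`, the result is indexed from 0 without gaps.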
Answer 1 (score: 1)
Code:
<?php
$data=<<<DATA
NUMBER,NAME,SERVICE
7375536,Ron,GND
7369530,RANDY,GND
7383287,Gilbert, JR.,GND
7383236,SCOTT,GND
DATA;
$data = array_map('str_getcsv', explode("\n", $data));
$keys = array_shift($data);
$data = array_filter($data, function($v) {
    return count($v) == 3;
});
var_export($data);
Output (the bad row, index 2, has been removed):
array (
0 =>
array (
0 => '7375536',
1 => 'Ron',
2 => 'GND',
),
1 =>
array (
0 => '7369530',
1 => 'RANDY',
2 => 'GND',
),
3 =>
array (
0 => '7383236',
1 => 'SCOTT',
2 => 'GND',
),
)
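To use the column headers as keys, the filtered rows can be passed through array_combine. This is a minimal sketch that reuses the sample data from this answer; it compares against count($keys) rather than the hard-coded 3, and the $keyed variable name is only illustrative.

```php
<?php
// Sample data from the answer; the third data row has an unescaped comma.
$lines = [
    'NUMBER,NAME,SERVICE',
    '7375536,Ron,GND',
    '7369530,RANDY,GND',
    '7383287,Gilbert, JR.,GND',
    '7383236,SCOTT,GND',
];

// Parse each line, passing delimiter, enclosure and escape explicitly.
$rows = array_map(function ($line) {
    return str_getcsv($line, ',', '"', '\\');
}, $lines);
$keys = array_shift($rows);

// Keep only rows whose field count matches the header.
$rows = array_filter($rows, function ($row) use ($keys) {
    return count($row) === count($keys);
});

// Key every surviving row by the header names, then reindex from 0.
$keyed = array_values(array_map(function ($row) use ($keys) {
    return array_combine($keys, $row);
}, $rows));

var_export($keyed);
```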
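Answer 2 (score: 1)
With array_filter you can remove unwanted items in a callback. This version uses the $keys array as the test (the same check you use) and passes it into the callback with use...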
$csv = array_map("str_getcsv", file("books.csv",FILE_SKIP_EMPTY_LINES));
$keys = array_shift($csv);
$output = array_filter($csv, function($row) use ($keys) {
    return count($row) == count($keys);
});
$output = array_values($output);
print_r($output);
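So every row that does not have the same number of columns as the header is removed.
I also added a call to array_values() to reindex the array.
If you can generate the file with quoted fields, the problem does not arise...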
NUMBER,NAME,SERVICE
7375536,Ron,GND
7369530,RANDY,GND
7383287,"Gilbert, JR.",GND
7383236,SCOTT,GND
You can wrap any text field in the quote character of your choice to make sure this problem does not come up again.
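If the file is produced by PHP, fputcsv adds the enclosure automatically for fields that contain the delimiter. The sketch below is an illustration under that assumption: it writes two hypothetical rows to an in-memory stream and parses the result back, showing that the name survives as a single field.

```php
<?php
// Hypothetical rows for illustration; the NAME value contains a comma.
$rows = [
    ['NUMBER', 'NAME', 'SERVICE'],
    ['7383287', 'Gilbert, JR.', 'GND'],
];

// Write to an in-memory stream so the sketch does not touch the disk.
$fh = fopen('php://memory', 'w+');
foreach ($rows as $row) {
    // Delimiter, enclosure and escape are passed explicitly.
    fputcsv($fh, $row, ',', '"', '\\');
}
rewind($fh);
$csvText = stream_get_contents($fh);
fclose($fh);

echo $csvText;

// Parsing the generated text back yields three fields on every line.
$parsed = array_map(function ($line) {
    return str_getcsv($line, ',', '"', '\\');
}, explode("\n", trim($csvText)));

print_r($parsed);
```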
Alternatively...
$csv = array_map("str_getcsv", file("FILE.CSV",FILE_SKIP_EMPTY_LINES));
$keys = array_shift($csv);
$out = array();
foreach ($csv as $row) {
    if (count($keys) == count($row)) {
        $out[] = array_combine($keys, $row);
    }
}
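Last update: While I was waiting to head out, I tried the following. It attempts to repair the data, so you get every row from the file. The slice offsets assume the 10-column layout from the question, with NAME as the third column...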
$out = array();
foreach ($csv as $row) {
    if (count($keys) != count($row)) {
        // Glue the surplus pieces of the NAME field back together.
        $row = array_merge(
            array_slice($row, 0, 2),
            [implode(",", array_slice($row, 2, count($row) - 9))],
            array_slice($row, count($row) - 7)
        );
    }
    $out[] = array_combine($keys, $row);
}
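The hard-coded offsets above can be derived from the header instead. The helper below is a hypothetical generalization, not part of the original answer: repairRow and its parameters are names chosen for illustration, and it assumes that only one column (given by $textIndex) can contain stray commas. Rows with too few fields are returned unchanged and would still need separate handling before array_combine.

```php
<?php
/**
 * Hypothetical helper: merge surplus fields back into one free-text column.
 *
 * @param array $row       Parsed fields of one CSV line
 * @param int   $expected  Number of columns in the header
 * @param int   $textIndex Zero-based position of the column that may contain commas
 */
function repairRow(array $row, int $expected, int $textIndex): array
{
    $extra = count($row) - $expected;
    if ($extra <= 0) {
        return $row; // no surplus fields, nothing to merge
    }
    $before = array_slice($row, 0, $textIndex);
    // The text column was split into $extra + 1 pieces; glue them back together.
    $merged = implode(',', array_slice($row, $textIndex, $extra + 1));
    $after  = array_slice($row, $textIndex + $extra + 1);
    return array_merge($before, [$merged], $after);
}

// Header and bad row taken from the question's sample.
$keys = str_getcsv(
    'CUST_NUMBER,PO_NUMBER,NAME,SERVICE,DATE,BOX_NUMBER,TRACK_NO,ORDER_NO,INV_NO,INV_AMOUNT',
    ',', '"', '\\'
);
$row = str_getcsv(
    '757626003,7383282,GERALD SMITH, JR.,GND,20180306,1,1Z9R67670395033411,2018168326,119513,63.72',
    ',', '"', '\\'
);

// NAME is the third column, so its zero-based index is 2.
$fixed = array_combine($keys, repairRow($row, count($keys), 2));
print_r($fixed);
```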