Question

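My goal is to use PHP to delete an entire row of a CSV file when a duplicate value appears in a specific column, in this example the ID column. Naturally, I want to keep the first row in which a duplicated ID appears (see the example below).

I don't want to create a new CSV file; I want to open the file, remove what needs to be removed, and overwrite the current file.

I would also like to store the number of removed rows in a variable.

Input (note that ID 3 is duplicated): file.csv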

ID,Date,Name,Age
1,12/3/13,John Doe ,23
2,12/3/19,Jane Doe ,21
3,12/4/19,Jane Doe ,19
3,12/3/18,John Doe  ,33
4,12/3/19,Jane Doe ,21

Expected output: file.csv

ID,Date,Name,Age
1,12/3/13,John Doe ,23
2,12/3/19,Jane Doe ,21
3,12/4/19,Jane Doe ,19
4,12/3/19,Jane Doe ,21

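I would then also like to be able to run echo $removedRows; and get the output 1. How can I do this?

I managed to get this result in a new file, but I only want to overwrite the current file, and I don't know why I am getting "" around the Name column: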

ID,Date,Name,Age
1,12/3/13,"John Doe ",23
2,12/3/19,"Jane Doe ",21
3,12/4/19,"Jane Doe ",19
4,12/3/19,"Jane Doe ",21

Using the following code:

$input_filename = 'file.csv';

// Copy the csv-file to the 'newfile' directory
copy($input_filename, 'newfile/'.$input_filename);

$output_filename = 'newfile/'.$input_filename;

$input_file = fopen($input_filename, 'r');
$output_file = fopen($output_filename, 'w');

$IDs = array();

// Read the header
$headers = fgetcsv($input_file, 1000);
fputcsv($output_file, $headers);

// Flip it so it becomes name => ID
$headers = array_flip($headers);

// Read every row
while (($row = fgetcsv($input_file, 1000)) !== FALSE)
{
    $ID = $row[$headers['ID']];
    // Do we already have this ID?
    if (isset($IDs[$ID]))
        continue;

    // Mark this ID as being found
    $IDs[$ID] = true;
    // Write it to the output
    fputcsv($output_file, $row);
}

Answer 1

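Since you can't safely read from a file and overwrite it at the same time, I suggest writing the data to another file and then moving that file over the source file, for example: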

$input_filename = 'file.csv';
$output_filename = 'newfile/' . $input_filename;

// No copy() is needed: opening with 'w' creates (or truncates) the output file.
// The 'newfile' directory must already exist.
$input_file = fopen($input_filename, 'r');
$output_file = fopen($output_filename, 'w');

$IDs = array();

// Read the header
$headers = fgetcsv($input_file, 1000);
fputcsv($output_file, $headers);

// Flip it so it becomes name => ID
$headers = array_flip($headers);

// Deleted rows counter
$rows_deleted = 0;
// Read every row
while (($row = fgetcsv($input_file, 1000)) !== FALSE)
{
    $ID = $row[$headers['ID']];
    // Do we already have this ID?
    if (isset($IDs[$ID])) {
        // row skipped - therefore it is deleted
        $rows_deleted++;
        continue;
    }

    // Mark this ID as being found
    $IDs[$ID] = true;
    // Write it to the output
    fputcsv($output_file, $row);
}

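// Close both handles before replacing the original file
fclose($input_file);
fclose($output_file);

// Now move the output file over the input file
rename($output_filename, $input_filename);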

echo "Deleted: " . $rows_deleted;
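
If the file is small enough to hold in memory, an alternative is to read all rows first and then reopen the same file for writing, which avoids the second file entirely. The following is only a minimal sketch under that assumption; the function name dedupeCsvInPlace and its parameters are hypothetical and not part of the code above.

```php
<?php
/**
 * Hypothetical helper: removes rows whose value in $column was already seen,
 * keeping the first occurrence, and overwrites the file at $path.
 * Returns the number of removed rows. Assumes the file fits in memory.
 */
function dedupeCsvInPlace(string $path, string $column): int
{
    $in = fopen($path, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $path for reading");
    }

    // Read the header and locate the column to check
    $header = fgetcsv($in, 0, ',', '"', '\\');
    if (!is_array($header)) {
        fclose($in);
        throw new RuntimeException("Missing header row in $path");
    }
    $index = array_search($column, $header, true);
    if ($index === false) {
        fclose($in);
        throw new InvalidArgumentException("Column $column not found");
    }

    $kept = [$header];
    $seen = [];
    $removed = 0;

    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // blank line: skip it without counting it as removed
        }
        $id = $row[$index] ?? '';
        if (isset($seen[$id])) {
            $removed++; // duplicate ID: drop the row
            continue;
        }
        $seen[$id] = true;
        $kept[] = $row;
    }
    fclose($in);

    // Reopen the same file for writing; 'w' truncates it first
    $out = fopen($path, 'w');
    if ($out === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    foreach ($kept as $row) {
        fputcsv($out, $row, ',', '"', '\\');
    }
    fclose($out);

    return $removed;
}
```

With the sample file from the question, $removedRows = dedupeCsvInPlace('file.csv', 'ID'); would leave $removedRows at 1. The trade-off is that a failure during the final write can leave the file truncated, which the write-then-rename approach above avoids.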

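As for the " around the data: that is the result of fputcsv, which encloses any field containing the delimiter, a quote character, or whitespace (your names contain spaces). This is done for safety. Imagine your data were not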

3,12/4/19,Jane Doe ,19

but

3,12/4/19,Jane, Doe ,19

You would want Jane, Doe to be treated as a single element. The " makes it clear to the parser how to handle your row:

3,12/4/19,"Jane, Doe ",19    // here `Jane, Doe` is one element
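
To see this round trip concretely, the small sketch below writes one row with fputcsv into an in-memory stream and parses it back with str_getcsv. The helper name csvLine is hypothetical and exists only for illustration.

```php
<?php
/**
 * Hypothetical helper: returns the CSV line fputcsv would write for $fields.
 */
function csvLine(array $fields): string
{
    $h = fopen('php://memory', 'w+');
    fputcsv($h, $fields, ',', '"', '\\');
    rewind($h);
    $line = stream_get_contents($h);
    fclose($h);
    return rtrim($line, "\n");
}

$fields = ['3', '12/4/19', 'Jane, Doe ', '19'];
$line = csvLine($fields);

// The field containing a comma is enclosed in quotes...
echo $line, "\n"; // 3,12/4/19,"Jane, Doe ",19

// ...so parsing the line yields the original four fields again
$parsed = str_getcsv($line, ',', '"', '\\');
var_dump($parsed === $fields); // bool(true)
```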

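In general, wrapping data in " does not affect how the resulting CSV is parsed. But if you are sure you don't want the quotes, you can pass more arguments to fputcsv, though I'm not sure whether it accepts an empty value for the enclosure parameter.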
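
If unquoted output really is required, one possible workaround is to skip fputcsv and join the fields manually. This is only a sketch, and it is safe only when no field contains a comma, a double quote, or a line break, so the hypothetical helper writeUnquotedRow below rejects such fields instead of writing a broken row.

```php
<?php
/**
 * Hypothetical helper: writes $fields as one comma-separated line with no
 * enclosure characters. Throws if a field would make the line ambiguous.
 */
function writeUnquotedRow($handle, array $fields): void
{
    foreach ($fields as $field) {
        // A comma, quote, or line break cannot be represented without quoting
        if (strpbrk((string) $field, ",\"\r\n") !== false) {
            throw new InvalidArgumentException("Field needs quoting: $field");
        }
    }
    fwrite($handle, implode(',', $fields) . "\n");
}

$h = fopen('php://memory', 'w+');
writeUnquotedRow($h, ['3', '12/4/19', 'Jane Doe ', '19']);
rewind($h);
echo stream_get_contents($h); // 3,12/4/19,Jane Doe ,19
fclose($h);
```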
