Splitting a CSV file by rows

Time: 2019-05-03 15:12:55

Tags: python csv

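Could someone please look over this code for me? The paths in it confuse me. The code, which I found on GitHub, splits a CSV file into several files based on a number of rows. I have been trying to use it to split my own CSV file, but the code is too confusing for me.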


Suppose the CSV to be split is named Dominant.csv, source_filepath is C:\Users\James\Desktop\Work, dest_path is C:\Users\James\Desktop\Work\Processed, and result_filename_prefix is split.
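In Python code, Windows paths are easiest to write as raw strings (r'...'), which avoids mixing \\ and \ as in the paths above. The following minimal sketch maps the question's values onto the four split_csv parameters. Note that, according to the function's docstring, source_filepath should be the full path of the file, Dominant.csv included, not just its folder. The sketch uses ntpath.join so it shows Windows separators on any OS (on Windows, os.path.join is ntpath.join). The row limit of 1000 is a made-up value because the question does not give one.

```python
import ntpath

# Folder and file names taken from the question.
WORK_DIR = r'C:\Users\James\Desktop\Work'

# A raw string and a string with doubled backslashes hold the same value.
assert WORK_DIR == 'C:\\Users\\James\\Desktop\\Work'

# ntpath.join applies Windows path rules regardless of the current OS.
source_filepath = ntpath.join(WORK_DIR, 'Dominant.csv')  # the file itself
dest_path = ntpath.join(WORK_DIR, 'Processed')           # output folder
result_filename_prefix = 'split'
row_limit = 1000  # hypothetical value; not given in the question

print(source_filepath)  # C:\Users\James\Desktop\Work\Dominant.csv
print(dest_path)        # C:\Users\James\Desktop\Work\Processed
```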

What confuses me: is target_filename in the code my CSV file, Dominant.csv? And what exactly is target_filepath?

Could someone rewrite the code for me using the paths and file name given above? I would really appreciate it.

import csv
import os
import sys


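if len(sys.argv) != 5:
    # Print a usage line instead of a bare exception so the expected arguments are clear.
    sys.exit(f'Usage: {sys.argv[0]} SOURCE_FILEPATH DEST_PATH FILENAME_PREFIX ROW_LIMIT')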


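SOURCE_FILEPATH = sys.argv[1]   # full path of the CSV to split, including its file name
DEST_PATH = sys.argv[2]         # existing folder that receives the split files
FILENAME_PREFIX = sys.argv[3]   # output files are named {prefix}_0.csv, {prefix}_1.csv, ...
ROW_LIMIT = int(sys.argv[4])    # max data rows per output file (header not counted)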


def split_csv(source_filepath, dest_path, result_filename_prefix, row_limit):
    """
    Split a source CSV into multiple CSVs of equal numbers of records,
    except the last file.
    The initial file's header row will be included as a header row in each split
    file.
    Split files follow a zero-index sequential naming convention like so:
        `{result_filename_prefix}_0.csv`
    :param source_filepath {str}:
        File name (including full path) for the file to be split.
    :param dest_path {str}:
        Full path to the directory where the split files should be saved.
    :param result_filename_prefix {str}:
        Prefix for the generated file names.
        Example: If `my_split_file` is provided as the prefix, then a resulting
                 file might be named: `my_split_file_0.csv`
    :param row_limit {int}:
        Number of rows per file (header row is excluded from the row count).
    :return {NoneType}:
    """
    if row_limit <= 0:
        raise ValueError('row_limit must be > 0')

    # The csv module expects files to be opened with newline=''.
    with open(source_filepath, 'r', newline='') as source:
        reader = csv.reader(source)
        headers = next(reader)

        file_number = 0
        records_exist = True

        while records_exist:

            i = 0
            target_filename = f'{result_filename_prefix}_{file_number}.csv'
            target_filepath = os.path.join(dest_path, target_filename)

            # Without newline='', every row is followed by a blank line on Windows.
            with open(target_filepath, 'w', newline='') as target:
                writer = csv.writer(target)

                while i < row_limit:
                    if i == 0:
                        writer.writerow(headers)

                    try:
                        writer.writerow(next(reader))
                        i += 1
                    except StopIteration:  # the source file has no more rows
                        records_exist = False
                        break

            if i == 0:
                # we only wrote the header, so delete that file
                os.remove(target_filepath)

            file_number += 1


split_csv(SOURCE_FILEPATH, DEST_PATH, FILENAME_PREFIX, ROW_LIMIT)
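For comparison, here is a sketch of the same splitting logic without the command-line handling, so the paths can be passed in directly, which is what the question asks for. It is not the GitHub author's code. It replaces the per-row try/except with itertools.islice. It also adds behaviours the original lacks: it returns early on an empty source file, creates the destination folder if it is missing, and returns the list of files it wrote.

```python
import csv
import os
from itertools import islice


def split_csv(source_filepath, dest_path, result_filename_prefix, row_limit):
    """Write chunks of at most row_limit data rows to
    {dest_path}/{result_filename_prefix}_{n}.csv, repeating the header in each.
    Returns the paths of the files written (an addition to the original)."""
    if row_limit <= 0:
        raise ValueError('row_limit must be > 0')

    # Addition: create the output folder if it does not exist yet.
    os.makedirs(dest_path, exist_ok=True)

    written = []
    with open(source_filepath, 'r', newline='') as source:
        reader = csv.reader(source)
        headers = next(reader, None)
        if headers is None:  # empty source file: nothing to split
            return written

        file_number = 0
        while True:
            # Take up to row_limit rows; an empty chunk means the input is used up.
            chunk = list(islice(reader, row_limit))
            if not chunk:
                break

            target_filename = f'{result_filename_prefix}_{file_number}.csv'
            target_filepath = os.path.join(dest_path, target_filename)
            with open(target_filepath, 'w', newline='') as target:
                writer = csv.writer(target)
                writer.writerow(headers)
                writer.writerows(chunk)

            written.append(target_filepath)
            file_number += 1
    return written
```

With the values from the question, the call would be split_csv(r'C:\Users\James\Desktop\Work\Dominant.csv', r'C:\Users\James\Desktop\Work\Processed', 'split', 1000), where 1000 is an arbitrary example row limit. Each chunk is held in memory as a list, so memory use grows with row_limit rather than with the size of the whole file.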

1 answer:

Answer 0 (score: 1)

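target_filename is the name each output file gets. The script builds it from the prefix and a counter (split_0.csv, split_1.csv, ...), so it is not your input file Dominant.csv. target_filepath is the full path of that output file, including its name, i.e. the destination folder joined with target_filename.

In the split_csv function call:
- SOURCE_FILEPATH is the path of the source file, including its name (for example C:\Users\James\Desktop\Work\Dominant.csv)
- DEST_PATH is the path of the folder where the output files are written
- FILENAME_PREFIX is the text you want each output file name to start with
- ROW_LIMIT is the maximum number of data rows written to each output file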
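To see what these two variables hold, you can evaluate the same expressions by hand with the paths from the question. This sketch uses ntpath.join in place of os.path.join so the Windows separators show on any OS. It stops after three files, which is an arbitrary count; the real number depends on how many rows the file has and on ROW_LIMIT.

```python
import ntpath

# Values from the question; dest_path is the output folder, not the input file.
dest_path = r'C:\Users\James\Desktop\Work\Processed'
result_filename_prefix = 'split'

# Same two expressions as inside split_csv, for the first three output files.
targets = []
for file_number in range(3):
    target_filename = f'{result_filename_prefix}_{file_number}.csv'
    target_filepath = ntpath.join(dest_path, target_filename)
    targets.append((target_filename, target_filepath))
    print(target_filename, '->', target_filepath)

# split_0.csv -> C:\Users\James\Desktop\Work\Processed\split_0.csv
# split_1.csv -> C:\Users\James\Desktop\Work\Processed\split_1.csv
# split_2.csv -> C:\Users\James\Desktop\Work\Processed\split_2.csv
```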