Question

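I have multiple comma-delimited csv files containing recorded water pipe pressure sensor data, already sorted by date from older to newer. In all of the original files, the first column always contains a date formatted as YYYYMMDD. I have looked at similar discussion threads but couldn't find what I need.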

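A Python script that adds a new column to every csv file in the directory: the new column is titled "Pipe", and each of its rows contains the file name with the file extension omitted.
Optionally, the ability to specify a cutoff date as YYYYMMDD so that rows are deleted from the original input file. For example, if a file has dates from 20140101 to 20140630, I would like to delete the rows whose date is < 20140401.
Optionally, the ability to either overwrite the original files after making these modifications or save each file to a different directory, with the same file name as the original.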
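
The three requirements map onto three small pieces of logic. The Python 3 sketch below is only an illustration; the helper names `pipe_name`, `keep_row` and `output_path` are hypothetical. Because the dates are fixed-width YYYYMMDD strings, plain string comparison already orders them chronologically, so the cutoff needs no date parsing.

```python
import os


def pipe_name(path):
    """Value for the new "Pipe" column: the file name without directory or extension."""
    return os.path.splitext(os.path.basename(path))[0]


def keep_row(row, cutoff=None):
    """Keep a data row unless its YYYYMMDD date (column 0) is earlier than cutoff.

    Fixed-width YYYYMMDD strings compare lexicographically in date order,
    so '20140331' < '20140401' holds as plain strings.
    """
    return cutoff is None or row[0] >= cutoff


def output_path(path, new_dir=None):
    """Overwrite the original (same path) or write a same-named file into new_dir."""
    return os.path.join(new_dir, os.path.basename(path)) if new_dir else path


rows = [["20140101", "1.2"], ["20140331", "1.3"], ["20140401", "1.4"], ["20140630", "1.5"]]
print(pipe_name("data/PipeRed.csv"))                   # PipeRed
print([r[0] for r in rows if keep_row(r, "20140401")]) # ['20140401', '20140630']
print(output_path("data/PipeRed.csv", "out"))          # out/PipeRed.csv on POSIX
```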

Input: PipeRed.csv; headers: Date,Pressure1,Pressure2,Temperature1,Temperature2, etc.

Output: PipeRed.csv; headers: Pipe,Date,Pressure1,Pressure2,Temperature1,Temperature2, etc.
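
To make the input/output spec concrete, here is the header and row transformation applied to an in-memory file. It is a Python 3 sketch that uses `io.StringIO` in place of real files; the sample rows are invented and the function name `add_pipe_column` is hypothetical. `lineterminator="\n"` only keeps the printed demo readable (the `csv` default is `\r\n`).

```python
import csv
import io


def add_pipe_column(src, dst, pipe):
    """Copy CSV text from src to dst, prepending a "Pipe" column to every row."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    writer.writerow(["Pipe"] + header)  # new column name goes first
    for row in reader:
        writer.writerow([pipe] + row)   # same value on every data row


src = io.StringIO("Date,Pressure1,Pressure2\n20140101,1.2,3.4\n20140401,1.5,3.1\n")
dst = io.StringIO()
add_pipe_column(src, dst, "PipeRed")
print(dst.getvalue())
# Pipe,Date,Pressure1,Pressure2
# PipeRed,20140101,1.2,3.4
# PipeRed,20140401,1.5,3.1
```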

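I found some code and modified it a bit, but it doesn't delete the rows as described above, and it adds the filename column at the end instead of as the first column.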

import csv
import sys
import glob
import re

for filename in glob.glob(sys.argv[1]):
#def process_file(filename):
    # Read the contents of the file into a list of lines.
    f = open(filename, 'r')
    contents = f.readlines()
    f.close()

    # Use a CSV reader to parse the contents.
    reader = csv.reader(contents)

    # Open the output and create a CSV writer for it.
    f = open(filename, 'wb')
    writer = csv.writer(f)

    # Process the header.
    writer = csv.writer(f)
    writer.writerow( ('Date','Pressure1','Pressure2','Pressure3','Pressure4','Pipe') )
    header = reader.next()
    header.append(filename.replace('.csv',""))
    writer.writerow(header)

    # Process each row of the body.
    for row in reader:
        row.append(filename.replace('.csv',""))
        writer.writerow(row)

    # Close the file and we're done.
    f.close()

Answer 1

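This function should be very close to what you want. I have tested it in both Python 2.7.9 and 3.4.2. The initial version I posted had some problems because, as I mentioned afterwards, it was untested. I'm not sure whether you're using Python 2 or 3, but this should work correctly in either.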
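
The `newline=''` argument that the `open_csv` helper passes on Python 3 matters because `csv.writer` already ends each row with `\r\n`. If the file object also translates `\n` (as text mode does on Windows), every row ends in `\r\r\n`, which typically shows up as blank lines between rows. The sketch below reproduces this on any platform by forcing Windows-style translation with `newline='\r\n'`; `written_bytes` is a hypothetical helper writing to a throwaway temporary file.

```python
import csv
import os
import tempfile


def written_bytes(newline):
    """Write one CSV row with the given newline setting and return the raw bytes."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline=newline) as f:
            csv.writer(f).writerow(["a", "b"])  # csv.writer emits 'a,b\r\n'
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


print(written_bytes(""))      # b'a,b\r\n'   (no translation, as open_csv does)
print(written_bytes("\r\n"))  # b'a,b\r\r\n' (Windows-style translation doubles the \r)
```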

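Another change from the previous version is that the optional keyword date argument has been renamed from cutoff_date to start_date, to better reflect what it means. A cutoff date usually means the last date on which something can be done, which is the opposite of how you used it in your question. Also note that any date supplied should be a string, i.e. start_date='20140401', not an integer.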
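
Why the date must be a string: on Python 3, comparing the string read from the CSV with an integer raises `TypeError`, and on CPython 2 it silently compares by type (numbers sort before strings), so every row would pass the filter. A small normalizing step, sketched below with the hypothetical helper name `as_date_string`, accepts either form and rejects values that are not valid YYYYMMDD dates.

```python
from datetime import datetime


def as_date_string(value):
    """Return value as an 8-character YYYYMMDD string, validating it as a real date.

    Accepts either 20140401 or '20140401'; raises ValueError for anything else.
    """
    text = str(value)
    if len(text) != 8 or not text.isdigit():
        raise ValueError("expected YYYYMMDD, got %r" % (value,))
    datetime.strptime(text, "%Y%m%d")  # raises ValueError for e.g. 20140230
    return text


try:
    "20140101" >= 20140401  # mixing str and int
except TypeError as exc:
    print("TypeError:", exc)

print(as_date_string(20140401))                # 20140401
print("20140101" >= as_date_string(20140401))  # False
```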

One enhancement is that it will now create the output directory if one is specified but doesn't exist yet.
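
On Python 3.2 and later, the `isdir` check followed by `os.makedirs` can be collapsed into a single call with `exist_ok=True`, which also avoids a race if another process creates the directory between the check and the call. `exist_ok` does not exist in Python 2, which is presumably why the code checks first. The sketch uses a temporary directory; `ensure_dir` and the nested path are hypothetical.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path, including missing parents; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as root:
    new_dir = os.path.join(root, "processed", "2014")  # hypothetical nested output dir
    ensure_dir(new_dir)                                # creates both levels
    ensure_dir(new_dir)                                # second call is a no-op
    print(os.path.isdir(new_dir))                      # True
```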

import csv
import os
import sys

def open_csv(filename, mode='r'):
    """ Open a csv file in the proper mode depending on the Python version. """
    return (open(filename, mode=mode+'b') if sys.version_info[0] == 2 else
            open(filename, mode=mode, newline=''))

def process_file(filename, start_date=None, new_dir=None):
    # Read the entire contents of the file into memory skipping rows before
    # any start_date given (assuming row[0] is a date column).
    with open_csv(filename, 'r') as f:
        reader = csv.reader(f)
        header = next(reader)  # Save first row.
        # Blank lines (empty rows) are skipped so row[0] never raises IndexError.
        contents = [row for row in reader
                    if row and (not start_date or row[0] >= start_date)]

    # Create different output file path if new_dir was specified.
    basename = os.path.basename(filename)  # Remove dir name from filename.
    output_filename = os.path.join(new_dir, basename) if new_dir else filename
    if new_dir and not os.path.isdir(new_dir):  # Create directory if necessary.
        os.makedirs(new_dir)

    # Open the output file and create a CSV writer for it.
    with open_csv(output_filename, 'w') as f:
        writer = csv.writer(f)

        # Add name of new column to header.
        header = ['Pipe'] + header  # Prepend new column name.
        writer.writerow(header)

        # Data for new column is the base filename without extension.
        new_column = [os.path.splitext(basename)[0]]

        # Process each row of the body by prepending data for new column to it.
        writer.writerows((new_column+row for row in contents))
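
The answer defines `process_file` but never calls it. One way to drive it, mirroring the question's `glob.glob(sys.argv[1])` loop, is sketched below. To stay self-contained it repeats a condensed, Python 3-only version of `process_file` (skipping blank rows); the `main` function and its argument order (glob pattern, optional start date, optional output directory) are assumptions for illustration.

```python
import csv
import glob
import os
import sys


def process_file(filename, start_date=None, new_dir=None):
    """Condensed Python 3 version of the answer's process_file."""
    with open(filename, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        contents = [row for row in reader
                    if row and (not start_date or row[0] >= start_date)]

    basename = os.path.basename(filename)
    output_filename = os.path.join(new_dir, basename) if new_dir else filename
    if new_dir:
        os.makedirs(new_dir, exist_ok=True)

    pipe = os.path.splitext(basename)[0]
    with open(output_filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Pipe"] + header)
        writer.writerows([pipe] + row for row in contents)


def main(argv):
    """Usage: script.py "<glob pattern>" [start_date YYYYMMDD] [output_dir]"""
    if len(argv) < 2:
        sys.exit(main.__doc__)
    pattern = argv[1]
    start_date = argv[2] if len(argv) > 2 else None
    new_dir = argv[3] if len(argv) > 3 else None
    for filename in sorted(glob.glob(pattern)):
        process_file(filename, start_date=start_date, new_dir=new_dir)


if __name__ == "__main__":
    main(sys.argv)
```

For example, `python add_pipe.py "data/*.csv" 20140401 processed` (the script name is hypothetical). Quote the pattern so the shell passes it through to `glob` instead of expanding it into several arguments.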
