Python 2: replace a specific column in a CSV

Time: 2018-04-25 14:46:55

Tags: python python-2.7

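I have some CSV files in the format ID, timestamp, customerID, email, etc. I want to blank out the Email column and leave the other columns unchanged. I am using Python 2.7 and am restricted from using Pandas. Can anyone help? Thanks for your help.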

My code is below, but it is neither efficient nor reliable: if some rows contain odd characters, the logic breaks.

new_columns = [

    '\xef\xbb\xbfID', 'timestamp', 'CustomerID', 'Email', 'CountryCode', 'LifeCycle', 'Package', 'Paystatus', 'NoUsageEver', 'NoUsage', 'VeryLowUsage',
    'LowUsage', 'NormalUsage', 'HighUsage', 'VeryHighUsage', 'LastStartDate', 'NPS 0-8', 'NPS Score (Q2)', 'Gender(Q38)', 'DOB(Q39)',
    'Viaplay users(Q3)', 'Primary Content (Q42)', 'Primary platform(Q4)', 'Detractor (strong) (Q5)', 'Detractor open text(Q22)',
    'Contact Detractor (Q21)', 'Contact Detractor (Q20)', 'Contact Detractor (Q43)', 'Contact Detractor(Q26)', 'Contact Detractor(Q27)',
    'Contact Detractor(Q44)', 'Improvement areas(Q7)', 'Improvement areas (Q40)', 'D2 More value for money(Q45)', 'D2 Sport content(Q8)',
    'D2 Series content(Q9)', 'D2 Film content(Q10)', 'D2 Children content(Q11)', 'D2 Easy to start and use(Q12)',
    'D2 Technical and quality(Q13)',
    'D2 Platforms(Q14)', 'D2 Service and support(Q15)', 'D3 Sport content(Q16)', 'Missing Sport Content (Q41)',
    'D3 Series and films content(Q17)',
    'NPS 9-10', 'Recommendation drivers(Q28)', 'R2 Sport content(Q29)', 'R2 Series content(Q30)', 'R2 Film content(Q31)',
    'R2 Children content(Q32)', 'R2 Easy to start and use(Q33)', 'R2 Technical and quality(Q34)', 'R2 Platforms(Q35)',
    'R2 Service and support(Q36)',
    'Promoter open text(Q37)'

]

# Note: `file_path` and `writer` (a csv.writer) are not shown in this snippet.
with open(file_path, 'r') as infile:
    print file_path
    reader = csv.reader(infile, delimiter=";")
    first_row = next(reader)  # header row
    for row in reader:
        output_row = []
        for column_name in new_columns:
            ind = first_row.index(column_name)
            data = row[ind]
            if ind == first_row.index('Email'):
                data = ''
            output_row.append(data)
        writer.writerow(output_row)

File format before: [image]

File format after: [image]

2 answers:

Answer 0 (score: 2)

So you are reordering the columns and clearing the Email column:

    with open(file_path, 'r') as infile:
        print file_path
        reader = csv.reader(infile, delimiter=";")
        first_row = next(reader)
        for row in reader:
            output_row = []
            for column_name in new_columns:
                ind = first_row.index(column_name)
                data = row[ind]
                if ind == first_row.index('Email'):
                    data = ''
                output_row.append(data)
            writer.writerow(output_row)

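I suggest moving the lookups first_row.index(column_name) and first_row.index('Email') out of the per-row processing, so they run once per file instead of once per cell.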

    with open(file_path, 'r') as infile:
        print file_path
        reader = csv.reader(infile, delimiter=";")
        first_row = next(reader)

        email = first_row.index('Email')       
        indexes = []
        for column_name in new_columns:
            ind = first_row.index(column_name)
            indexes.append(ind)

        for row in reader:
            output_row = []
            for ind in indexes:
                data = row[ind]
                if ind == email:
                    data = ''
                output_row.append(data)
            writer.writerow(output_row)

email is the index of the Email column in the input. indexes is the list of input column indexes, in the order specified by new_columns.

Untested.
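
For reference, here is the same idea as a self-contained sketch. The function name reorder_and_blank and its parameters are hypothetical. It takes already-open file objects rather than a path, and it writes a header row, which the snippets above do not show. Treat all of that as assumptions, not as part of the original script.

```python
import csv


def reorder_and_blank(infile, outfile, new_columns, blank='Email', delimiter=';'):
    """Copy rows from infile to outfile in new_columns order, blanking one column.

    Hypothetical helper: infile and outfile are open file objects supplied
    by the caller.
    """
    reader = csv.reader(infile, delimiter=delimiter)
    writer = csv.writer(outfile, delimiter=delimiter)
    header = next(reader)

    # Resolve every column position once, before the row loop.
    indexes = [header.index(name) for name in new_columns]
    email = header.index(blank)

    # Assumption: the output should start with the reordered header.
    writer.writerow([header[ind] for ind in indexes])

    for row in reader:
        writer.writerow(['' if ind == email else row[ind] for ind in indexes])
```

Under Python 2.7 the caller would typically open the input with mode 'rb' and the output with mode 'wb' before passing them in.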

Answer 1 (score: 2)

You can use the dict versions of the csv reader/writer to access columns by name. Something like this:

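import csv

# Python 2: open CSV files in binary mode so the csv module controls line endings.
with open('./test.csv', 'rb') as infile:
    reader = csv.DictReader(infile, delimiter=";")
    with open('./output.csv', 'wb') as outfile:
        # Use the same delimiter as the input so the other columns stay unchanged.
        writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames, delimiter=";")
        writer.writeheader()
        for row in reader:
            row['Email'] = ''  # blank out the email, keep everything else
            writer.writerow(row)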
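
A rough sketch of how this could be wrapped for reuse follows. The function name blank_column and its parameters are hypothetical, and it works on already-open file objects. Because the field names are read from the file itself, a header such as '\xef\xbb\xbfID' (ID preceded by a UTF-8 byte order mark) should pass through without being hardcoded, and quoted fields that contain the delimiter are left to the csv module to parse and re-quote.

```python
import csv


def blank_column(infile, outfile, column='Email', delimiter=';'):
    """Copy a delimited file, replacing every value in `column` with ''.

    Hypothetical helper: infile and outfile are open file objects supplied
    by the caller.
    """
    reader = csv.DictReader(infile, delimiter=delimiter)
    fieldnames = reader.fieldnames or []  # None when the input is empty
    if column not in fieldnames:
        raise ValueError('column %r not found in header %r' % (column, fieldnames))

    # Same field order and delimiter as the input.
    writer = csv.DictWriter(outfile, fieldnames=fieldnames, delimiter=delimiter)
    writer.writeheader()
    for row in reader:
        row[column] = ''
        writer.writerow(row)
```

Failing early on a missing column is a design choice made here for illustration; without the check, the code would quietly add an empty value to each row dict and DictWriter would then reject the unknown key.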