Question

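I have a large CSV file that I want to edit. Editing here means removing the columns that contain only a single value. So far I have written this (I am new to Python and I am stuck, so I am not sure whether this is the right way to approach the problem):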

import csv
import collections
import numpy as np 


number_of_rows = 2432 
interseting_cols = [] 

col_values = collections.defaultdict(list)
col_values_named = collections.defaultdict(list)
new_row = collections.defaultdict(list)
inputFile = open('input.csv', 'r',newline='');
outputFile= open('output.csv','w')

reader = csv.reader(inputFile)
writer = csv.writer(outputFile)
#skip field names
next(reader)
for row in reader:
    for col, value in enumerate(row):
        col_values[col].append(value)
        #each column is now saved col_values ( without the headers )


for  i in range(len(col_values)):
    if len(set(col_values[i][:(number_of_rows-1)])) != 1:
        interseting_cols.append(i)# saved the index of the columns with valid values 

inputFile.seek(0)

# reading the file again now with headers
for row in reader:
    for col, value in enumerate(row):
        col_values_named[col].append(value)# saving the columns now with header 


# generating a new CSV file, only with interessting columns :
for i in range(number_of_rows):
    print("i value ",i)
    for j in range(len(interseting_cols)): # I'm not sure about this PART !!!!
            new_row.append(col_values_named[interseting_cols[j]])
            writer.writerow(new_row)

Any idea how to write the last loop? Or is there a better way to solve this?

Update: the file looks like this:

---------------------------------------------------
            |A|B   |C   |D  |F   |G|H   |I|J  |K   |       
--------------------------------------------------- 
1           |1|NULL|444 |201|0.01|A|NULL|4|9.5|NULL|     
--------------------------------------------------- 
2           |2|NULL|NULL|201|0   |A|NULL|4|9.5|NULL|
--------------------------------------------------- 
3           |4|NULL|444 |201|0   |A|NULL|4|9.5|NULL|
--------------------------------------------------- 
4           |1|NULL|444 |201|0   |A|NULL|4|9.5|NULL|

In this case the result should contain only three columns: A, C and F.

Answer 1

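With the pandas library you can skip all of that extra work by relying on its built-in functions. Below is a small implementation of the requirement you posted above. If you are a beginner and need a clearer explanation, leave a comment and I will be happy to add more detail. By the way, start playing with pandas.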

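import pandas as pd

# By default read_csv parses the literal string NULL as a missing value (NaN)
df = pd.read_csv('input.csv')

# Iterate over a copy of the column labels, because columns are dropped inside the loop
for column in list(df.columns):
    # unique() includes NaN, so a column that is entirely NULL also counts as single-valued
    if len(df[column].unique()) == 1:
        # Use the columns= keyword; the positional axis argument is not accepted by recent pandas
        df.drop(columns=column, inplace=True)

# index=False keeps the row index out of the output file
df.to_csv('output.csv', index=False)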
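
The same filtering can probably be written without an explicit loop, by counting the distinct values per column and selecting only the columns that vary. The following is a minimal sketch, not part of the original answer: the helper name `drop_constant_columns` is made up for illustration, and the sample data is copied from the table in the question (leaving out the unlabeled row-number column) so the snippet runs without `input.csv`.

```python
import io

import pandas as pd

# Sample rows copied from the question's table; NULL is parsed as NaN by read_csv.
SAMPLE = """A,B,C,D,F,G,H,I,J,K
1,NULL,444,201,0.01,A,NULL,4,9.5,NULL
2,NULL,NULL,201,0,A,NULL,4,9.5,NULL
4,NULL,444,201,0,A,NULL,4,9.5,NULL
1,NULL,444,201,0,A,NULL,4,9.5,NULL
"""


def drop_constant_columns(df):
    """Return a copy of df without the columns that hold a single distinct value.

    Hypothetical helper name, used only for this sketch.
    """
    # dropna=False counts NaN as a value, which matches len(df[col].unique()).
    distinct_counts = df.nunique(dropna=False)
    # Boolean column selection: keep only the columns with more than one value.
    return df.loc[:, distinct_counts > 1].copy()


df = pd.read_csv(io.StringIO(SAMPLE))
result = drop_constant_columns(df)
print(list(result.columns))  # ['A', 'C', 'F']
```

For the real file, replace `io.StringIO(SAMPLE)` with `'input.csv'` and finish with `result.to_csv('output.csv', index=False)`.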

Answer 2

Unless the spreadsheet is really huge, just read the whole thing into memory and then pick out what you want!

Untested code:

LANGUAGE_CODE
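
A minimal sketch of that idea, using only the standard `csv` module: read every row into memory, work out which columns vary, then write those columns back out. This is not the answerer's original code. The function names `drop_constant_columns` and `filter_csv` are hypothetical, and the sketch assumes the first row is a header and that every row has the same number of fields.

```python
import csv


def drop_constant_columns(rows):
    """Given a header row followed by data rows, drop the single-valued columns.

    Assumes every row has as many fields as the header.
    """
    header, data = rows[0], rows[1:]
    # Keep a column index only if its data cells hold more than one distinct value.
    keep = [i for i in range(len(header)) if len({row[i] for row in data}) > 1]
    return [[row[i] for i in keep] for row in rows]


def filter_csv(input_path, output_path):
    """Read input_path fully into memory and write the varying columns to output_path."""
    # newline='' is what the csv module documentation recommends for files it handles.
    with open(input_path, newline='') as src:
        rows = list(csv.reader(src))
    if not rows:
        return  # empty input: nothing to write
    with open(output_path, 'w', newline='') as dst:
        csv.writer(dst).writerows(drop_constant_columns(rows))


# Small in-memory demonstration, using a few columns from the question's table.
demo = [
    ['A', 'B', 'C', 'D'],
    ['1', 'NULL', '444', '201'],
    ['2', 'NULL', 'NULL', '201'],
    ['4', 'NULL', '444', '201'],
]
print(drop_constant_columns(demo)[0])  # ['A', 'C']
```

On the real file this would be called as `filter_csv('input.csv', 'output.csv')`. Unlike the pandas version, every cell is compared as a plain string here, so `0` and `0.0` count as different values.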

Editing a CSV with Python
