Say I have two CSV files:
csvfile1:
name Dept City
sree, CSE, Bengaluru,
vatsasa, ECE, Hyd,
IT, VJA,
capini, Mech, TPTY,
DTP, Civil,kandra
Bengaluru,ECM, TVM,
sre, ECS, MNGL,
vatsas, Kochi,
Nano, TVM,
capmin, Tech,
DTP9, CSS, Kochi,
ESS, TVM,
sree0, RSS, MNGL,
csvfile2:
name, Dept, City, Address
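I want to check whether every column in csvfile2 is also present in csvfile1.
If they are all present, check whether any column in csvfile1 contains blank values. If so, replace the blanks with the value NULL, then write all the columns, column by column, to a new CSV file, csvfile3.
If some are missing, write the missing columns along with the existing columns to csvfile3. The missing columns should show NULL as their value in csvfile3, and blanks under the existing columns should also be replaced with NULL.
Here is the expected output: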
name Dept City Address
sree, CSE, Bengaluru,NULL
vatsasa, ECE, Hyd, NULL
NULL, IT, VJA, NULL
capini, NULL, Mech, NULL
DTP, Civil,NULL, NULL
Bengaluru,ECM, TVM, NULL
sre, ECS, MNGL, NULL
vatsas, NULL, Kochi, NULL
NULL, Nano, TVM, NULL
capmin, NULL, Tech, NULL
DTP9, CSS, Kochi, NULL
NULL, ESS, TVM, NULL
sree0, RSS, MNGL, NULL
I wrote the following code:
f=open('csvfile2.csv', 'r')
g=csv.reader(f)
first=next(g, None)
print('lenght of first list', len(first))
f1=open('csvfile1.csv','r')
h=csv.reader(f1)
second=next(h,None)
print('lenght of first list', len(second))
f2=open('csvfile3', 'w')
writer=csv.writer(f2)
count=0
if len(second) < len(first):
    for i in first:
        if not i in second:
            for count in range:
                writer.writerows('Null')
                print('null')
        else:
            ind=second.index(i)
            for j in second:
                if not j[ind]:
                    writer.writerows(j[ind].replace(' ','Null'))
                else:
                    writer.writerows(j[ind])
Output of the above code:
name, Dept, City, Address
N
U
L
L
N
U
L
L
N
U
L
L
N
U
L
L
N
U
L
L
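I wrote the code on an AWS EC2 instance and plan to use it in AWS Lambda as well.
Answer 0: (score: 0)
Working directly on CSV files can be quite difficult. I would suggest using pandas, which works with tabular data structures; it is efficient and keeps the code short.
Sample code: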
import pandas as pd
# READ BOTH FILES AS TABULAR DATA STRUCTURE
# PROVIDE FILE PATH
csv_file_one = pd.read_csv("csv_file_one.csv")
csv_file_two = pd.read_csv("csv_file_two.csv")
# REPLACE EMPTY VALUES WITH NULL IN CSV ONE
csv_file_one.fillna(value='NULL', inplace=True)
header_of_csv_two = list(csv_file_two.columns.values)
# IF CSV FILE ONE DOESN'T HAVE A COLUMN
# OF CSV TWO, THEN CREATE IT
# WITH NULL VALUES
for each_col in header_of_csv_two:
    if each_col not in csv_file_one.columns:
        csv_file_one[each_col] = 'NULL'
# WRITING TO CSV
# PROVIDE FILE PATH
csv_file_one.to_csv("csv_file_three.csv", index=False)
Sample output:
name dept city address
ram NULL kathmandu NULL
kiran computer NULL NULL
kumar civil patan NULL
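As for why the code in the question prints one letter per line: `csv.writer.writerows` expects an iterable of rows, so a bare string such as 'NULL' is treated as four rows of one character each, while `writerow` with a bare string treats each character as a separate field. The sketch below shows the difference on an in-memory buffer instead of the question's files; the helper name `written` is made up for illustration, and `lineterminator="\n"` is set only to keep the output easy to read.

```python
import csv
import io


def written(write):
    """Run `write` against a csv.writer on an in-memory buffer; return the text."""
    buf = io.StringIO()
    write(csv.writer(buf, lineterminator="\n"))
    return buf.getvalue()


# writerows() iterates its argument as rows: each character becomes its own row.
rows_from_string = written(lambda w: w.writerows("NULL"))

# writerow() iterates its argument as fields: each character becomes its own cell.
row_from_string = written(lambda w: w.writerow("NULL"))

# Wrapping the value in a list gives a single row with a single cell.
row_from_list = written(lambda w: w.writerow(["NULL"]))

print(repr(rows_from_string))  # 'N\nU\nL\nL\n'
print(repr(row_from_string))   # 'N,U,L,L\n'
print(repr(row_from_list))     # 'NULL\n'
```

So a whole record should be built as a list first and passed to `writerow` once per output line.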
Answer 1: (score: 0)
If you don't want to use pandas, here is a solution using csv:
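import csv
# newline="" is what the csv module documentation recommends for files
# handed to csv.reader / csv.writer
with open("csvfile1.csv", newline="") as csv_1,\
     open("csvfile2.csv", newline="") as csv_2,\
     open("csvfile3.csv", "w", newline="") as csv_3:
    reader_1 = csv.reader(csv_1)
    reader_2 = csv.reader(csv_2)
    writer = csv.writer(csv_3)
    headers_1 = next(reader_1)
    headers_2 = next(reader_2)
    # positions of the csvfile2 columns that csvfile1 lacks
    insert_null_at = []
    for i, header in enumerate(headers_2):
        if header not in headers_1:
            insert_null_at.append(i)
    writer.writerow(headers_2)
    for row in reader_1:
        # add an empty cell for every missing column
        for i in insert_null_at:
            row.insert(i, "")
        # empty cells, original or inserted, are written as NULL
        writer.writerow([item if item != "" else "NULL" for item in row])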
As written, this assumes that csvfile1 is actually formatted as:
name,Dept,City
sree,CSE,Bengaluru
vatsasa,ECE,Hyd
,IT,VJA
capini,Mech,TPTY
DTP,Civil,kandra
Bengaluru,ECM,TVM
sre,ECS,MNGL
vatsas,,Kochi
,Nano,TVM
capmin,,Tech
DTP9,CSS,Kochi
,ESS,TVM
sree0,RSS,MNGL
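The positional `row.insert` approach depends on the two headers matching exactly and on csvfile1's columns being in the same order as in csvfile2. A variant that matches columns by name instead can be sketched with `csv.DictReader` and `csv.DictWriter`. This is only a sketch under some assumptions: the function name `merge_columns` and the inline sample data are made up for illustration, whitespace-only cells are treated as blank, and any csvfile1 column that csvfile2 does not list is dropped.

```python
import csv
import io


def merge_columns(source_text, target_header, null="NULL"):
    """Return CSV text with the columns of target_header, filling gaps with `null`."""
    # skipinitialspace tolerates headers and cells written as "name, Dept, City"
    reader = csv.DictReader(io.StringIO(source_text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=target_header,
        restval=null,            # value written for columns missing from a row
        extrasaction="ignore",   # assumption: silently drop unlisted columns
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        # Keep only known columns; blank or whitespace-only cells become `null`.
        cleaned = {
            key: (value.strip() or null)
            for key, value in row.items()
            if key in target_header and value is not None
        }
        writer.writerow(cleaned)
    return out.getvalue()


# Hypothetical sample data in the clean format shown above.
source = "name,Dept,City\nsree,CSE,Bengaluru\n,IT,VJA\nvatsas,,Kochi\n"
result = merge_columns(source, ["name", "Dept", "City", "Address"])
print(result)
```

With real files, the same reader and writer can be built on handles opened with `newline=""` in place of the `io.StringIO` buffers.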