Using Python

Date: 2019-02-26 15:29:24

Tags: python amazon-web-services aws-lambda

Say I have two CSV files:

csvfile1

name      Dept  City      
sree,     CSE,  Bengaluru,  
vatsasa,  ECE,  Hyd,      
          IT,   VJA,      
capini,   Mech, TPTY,   
DTP,      Civil,kandra
Bengaluru,ECM,  TVM,      
sre,      ECS,  MNGL,   
vatsas,         Kochi,    
          Nano, TVM,      
capmin,         Tech,       
DTP9,     CSS,  Kochi,    
          ESS,  TVM,    
sree0,    RSS,  MNGL,   

csvfile2

name, Dept, City, Address

I want to check whether each column of csvfile2 exists in csvfile1.
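The check itself is a comparison of the two header rows by name. A minimal sketch, assuming csvfile1's header is `name,Dept,City`; it also shows a pitfall with csvfile2's header as written in the question: the spaces after the commas become part of the column names unless the reader is told to skip them.

```python
import csv
import io

# Header line of csvfile2 exactly as shown in the question (spaces after commas).
header_line = "name, Dept, City, Address\n"

# The default reader keeps the leading space, so " Dept" never matches "Dept".
raw = next(csv.reader(io.StringIO(header_line)))
# skipinitialspace=True drops whitespace immediately following each delimiter.
clean = next(csv.reader(io.StringIO(header_line), skipinitialspace=True))

headers_1 = ["name", "Dept", "City"]  # assumed header of csvfile1
missing = [h for h in clean if h not in headers_1]

print(raw)      # ['name', ' Dept', ' City', ' Address']
print(clean)    # ['name', 'Dept', 'City', 'Address']
print(missing)  # ['Address']
```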

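  1. If a column is present, check whether any of its cells in csvfile1 are blank. Replace each blank cell with the value NULL, then write all the columns, column by column, to a new CSV file, csvfile3.

  2. If a column is not present, write it to csvfile3 along with the existing columns. The values of these missing columns should appear as NULL in csvfile3, and blank cells under the existing columns should also be replaced with NULL.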
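Both rules reduce to the same per-cell decision: a cell becomes NULL if its column is missing from csvfile1 or if it is blank. A minimal sketch of that decision; `to_output_row` is a hypothetical helper name, and it assumes each csvfile1 row is available as a dict keyed by column name (as `csv.DictReader` produces).

```python
def to_output_row(row, output_header, null="NULL"):
    """Map one csvfile1 row (a dict keyed by column name) onto csvfile3's columns.

    Columns missing from the row, and cells that are empty or whitespace-only,
    both become `null`. Other cells are written with surrounding spaces stripped.
    """
    out = []
    for col in output_header:
        value = row.get(col)  # None when csvfile1 has no such column
        if value is None or not value.strip():
            out.append(null)
        else:
            out.append(value.strip())
    return out


header_3 = ["name", "Dept", "City", "Address"]
print(to_output_row({"name": "vatsas", "Dept": "", "City": "Kochi"}, header_3))
# ['vatsas', 'NULL', 'Kochi', 'NULL']
print(to_output_row({"name": "", "Dept": "IT", "City": "VJA"}, header_3))
# ['NULL', 'IT', 'VJA', 'NULL']
```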

Here is the expected output:

name      Dept  City      Address
sree,     CSE,  Bengaluru,NULL
vatsasa,  ECE,  Hyd,      NULL
NULL,     IT,   VJA,      NULL
capini,   NULL, Mech,     NULL
DTP,      Civil,NULL,     NULL
Bengaluru,ECM,  TVM,      NULL
sre,      ECS,  MNGL,     NULL
vatsas,   NULL, Kochi,    NULL
NULL,     Nano, TVM,      NULL
capmin,   NULL, Tech,     NULL
DTP9,     CSS,  Kochi,    NULL
NULL,     ESS,  TVM,      NULL
sree0,    RSS,  MNGL,     NULL

I wrote the following code:

import csv

f=open('csvfile2.csv', 'r')
g=csv.reader(f)
first=next(g, None)
print('length of first list', len(first))
f1=open('csvfile1.csv','r')
h=csv.reader(f1)
second=next(h,None)
print('length of second list', len(second))
f2=open('csvfile3', 'w')
writer=csv.writer(f2)
count=0
if len(second) < len(first):
    for i in first:
        if not i in second:
            for count in range:
                writer.writerows('Null')
                print('null')
        else:
            ind=second.index(i)
            for j in second:
                if not j[ind]:
                    writer.writerows(j[ind].replace(' ','Null'))
                else:
                    writer.writerows(j[ind])

Output of the above code:

name, Dept, City, Address
N
U
L
L
N
U
L
L
N
U
L
L
N
U
L
L
N
U
L
L
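The one-letter-per-line output is what `csv.writer.writerows()` produces when it is given a single string. A small demonstration, using an in-memory buffer instead of csvfile3 and the string "NULL" to mirror the output shown: `writerows()` expects an iterable of rows, and iterating a string yields its characters, each of which is written as its own one-cell row. `writerow()` with a one-element list writes a single cell instead.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
# Each character of the string is treated as a separate row.
writer.writerows("NULL")
lines_bad = buf.getvalue().splitlines()
print(lines_bad)  # ['N', 'U', 'L', 'L']

buf = io.StringIO()
writer = csv.writer(buf)
# writerow() takes one row; wrap the value in a list to get one cell.
writer.writerow(["NULL"])
lines_good = buf.getvalue().splitlines()
print(lines_good)  # ['NULL']
```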

I wrote this code on an AWS EC2 instance and plan to use it on AWS Lambda as well.

2 answers:

Answer 0 (score: 0)

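Working on CSV files directly can get quite difficult. I suggest using pandas, which operates on tabular data structures; it is efficient and keeps the code short.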

Sample code:

import pandas as pd

# READ BOTH FILES AS TABULAR DATA STRUCTURES
# PROVIDE FILE PATH
# skipinitialspace=True ignores the space after each comma, so a header
# such as "name, Dept, City, Address" yields "Dept" rather than " Dept"
csv_file_one = pd.read_csv("csv_file_one.csv", skipinitialspace=True)
csv_file_two = pd.read_csv("csv_file_two.csv", skipinitialspace=True)

# REPLACE EMPTY VALUES WITH NULL IN CSV ONE
csv_file_one.fillna(value='NULL', inplace=True)

header_of_csv_two = list(csv_file_two.columns.values)

# IF CSV FILE ONE DOESN'T HAVE A COLUMN
# OF CSV TWO, CREATE THAT COLUMN
# WITH NULL VALUES
for each_col in header_of_csv_two:
    if each_col not in csv_file_one.columns:
        csv_file_one[each_col] = 'NULL'

# WRITING TO CSV
# PROVIDE FILE PATH
csv_file_one.to_csv("csv_file_three.csv", index=False)

Sample output:

name      dept       city  address
ram        NULL  kathmandu    NULL
kiran  computer       NULL    NULL
kumar     civil      patan    NULL

Answer 1 (score: 0)

If you don't want to use pandas, here is a solution using the csv module:

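import csv

# newline="" is what the csv module documentation recommends for files it reads or writes
with open("csvfile1.csv", newline="") as csv_1,\
     open("csvfile2.csv", newline="") as csv_2,\
     open("csvfile3.csv", "w", newline="") as csv_3:

    reader_1 = csv.reader(csv_1)
    # skipinitialspace=True turns "name, Dept, City, Address" into clean names
    reader_2 = csv.reader(csv_2, skipinitialspace=True)
    writer = csv.writer(csv_3)

    headers_1 = next(reader_1)
    headers_2 = next(reader_2)

    # Positions (in csvfile2's header) of the columns that csvfile1 lacks
    insert_null_at = []
    for i, header in enumerate(headers_2):
        if header not in headers_1:
            insert_null_at.append(i)

    writer.writerow(headers_2)
    for row in reader_1:
        # Insert an empty cell for each missing column; this relies on the
        # shared columns appearing in the same order in both files
        for i in insert_null_at:
            row.insert(i, "")

        # Replace every empty cell, original or inserted, with NULL
        writer.writerow([item if item != "" else "NULL" for item in row])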

Note that this assumes csvfile1 is actually formatted as:

name,Dept,City
sree,CSE,Bengaluru
vatsasa,ECE,Hyd
,IT,VJA
capini,Mech,TPTY
DTP,Civil,kandra
Bengaluru,ECM,TVM
sre,ECS,MNGL
vatsas,,Kochi
,Nano,TVM
capmin,,Tech
DTP9,CSS,Kochi
,ESS,TVM
sree0,RSS,MNGL
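The csv-module answer inserts blank cells by position, which only works when csvfile1's columns appear in the same relative order as csvfile2's. A variant that matches columns by name, sketched with `csv.DictReader` and `csv.DictWriter`; `merge_columns` is a hypothetical helper, and it assumes csvfile2 contributes only its header row. `restval` fills the columns a row lacks, and `extrasaction="ignore"` drops csvfile1 columns that csvfile2 does not list.

```python
import csv
import io


def merge_columns(src_1, src_2, dst, null="NULL"):
    """Write src_1's rows to dst under src_2's header, filling gaps with `null`.

    Columns are matched by name, so csvfile1 may list its columns in any order.
    Columns that exist only in csvfile1 are dropped.
    """
    # skipinitialspace=True tolerates headers like "name, Dept, City, Address".
    header_2 = next(csv.reader(src_2, skipinitialspace=True))
    reader_1 = csv.DictReader(src_1, skipinitialspace=True)
    # restval fills columns absent from a row dict (e.g. Address).
    writer = csv.DictWriter(dst, fieldnames=header_2, restval=null,
                            extrasaction="ignore")
    writer.writeheader()
    for row in reader_1:
        # Empty, whitespace-only, or missing (None) cells become NULL; the
        # key None (surplus fields on an over-long row) is skipped.
        writer.writerow({k: (v if v and v.strip() else null)
                         for k, v in row.items() if k is not None})


csv_1 = io.StringIO("name,Dept,City\nsree,CSE,Bengaluru\nvatsas,,Kochi\n,IT,VJA\n")
csv_2 = io.StringIO("name, Dept, City, Address\n")
out = io.StringIO()
merge_columns(csv_1, csv_2, out)
print(out.getvalue())
# name,Dept,City,Address
# sree,CSE,Bengaluru,NULL
# vatsas,NULL,Kochi,NULL
# NULL,IT,VJA,NULL
```

With real files, pass file objects opened with `newline=""` (the last one in `"w"` mode) in place of the `StringIO` buffers.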