Joining two CSV files

Time: 2017-05-19 05:29:00

Tags: python linux csv join awk

csvfile1

status,longitude,latitude,timestamp    
ok,10.12,17.45,14569003    
ok,11.34,16.78,14569000

csvfile2

weather,timestamp,latitude1,longitude1,latitude2,longitude2
rainy,14569003,17.45,10.12,17.50,11.25    
sunny,14569000,13.76,12.44,16.78,11.34

Expected output

status,weather,longitude,latitude,timestamp    
ok,rainy,10.12,17.45,14569003    
ok,sunny,11.34,16.78,14569000    

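I want to join the two files on the longitude, latitude, and timestamp columns.

csvfile2 has two longitudes and two latitudes, so I want to check whether a row matches either longitude-latitude pair, as well as the timestamp.

The column order also differs between the two files.

Any help would be greatly appreciated.

Thanks.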

3 answers:

Answer 0 (score: 3)

You can use this.

import pandas as pd

first = pd.read_csv('csvfile1.csv')
second = pd.read_csv('csvfile2.csv')

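# `on` takes a column label or a list of labels present in both files;
# 'timestamp' is the only column name the two sample files share.
merged = pd.merge(first, second, how='left', on='timestamp')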
merged.to_csv('merged.csv', index=False)

For details, see link1 and link2; both are useful.
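Because csvfile2 names its coordinate columns latitude1/longitude1 and latitude2/longitude2, a plain merge cannot match on coordinates directly. One possible way to handle the "either pair" requirement is sketched below: stack both pairs of csvfile2 into a single longitude/latitude table, then do an inner merge on all three keys. The helper name join_on_either_pair and the inline sample strings are illustrative assumptions, not part of the original answer, and the merge relies on the coordinates being written identically in both files.

```python
import io

import pandas as pd

# Sample data copied from the question; with real files, use pd.read_csv(path).
CSV1 = """status,longitude,latitude,timestamp
ok,10.12,17.45,14569003
ok,11.34,16.78,14569000
"""
CSV2 = """weather,timestamp,latitude1,longitude1,latitude2,longitude2
rainy,14569003,17.45,10.12,17.50,11.25
sunny,14569000,13.76,12.44,16.78,11.34
"""


def join_on_either_pair(first, second):
    """Hypothetical helper: match csvfile1 rows against either coordinate pair."""
    cols = ["weather", "timestamp", "longitude", "latitude"]
    # Rename each coordinate pair to the plain column names used by csvfile1.
    pair1 = second.rename(columns={"longitude1": "longitude", "latitude1": "latitude"})
    pair2 = second.rename(columns={"longitude2": "longitude", "latitude2": "latitude"})
    # One row per (timestamp, coordinate pair); drop rows where both pairs coincide.
    stacked = pd.concat([pair1[cols], pair2[cols]], ignore_index=True).drop_duplicates()
    merged = pd.merge(first, stacked, how="inner",
                      on=["longitude", "latitude", "timestamp"])
    return merged[["status", "weather", "longitude", "latitude", "timestamp"]]


first = pd.read_csv(io.StringIO(CSV1))
second = pd.read_csv(io.StringIO(CSV2))
merged = join_on_either_pair(first, second)
print(merged.to_csv(index=False))
```

An inner merge drops csvfile1 rows that have no match; switching to how="left" would keep them with an empty weather value instead.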

Answer 1 (score: 1)

awk solution:

join_csv.awk script:

#!/bin/awk -f
BEGIN {
    FS=OFS=",";   # field separator
    print "status,weather,longitude,latitude,timestamp"  # header line
}
NR==FNR && NR>1 {          # processing the first file
    a[$4]=$1 FS $2 FS $3   # accumulating the needed values (status, longitude, latitude), keyed by timestamp
    next                   # do not run the second-file rule on first-file rows
}
FNR>1 {                    # processing the second file
    if ($2 in a) {         # if `timestamp` matches
        split(a[$2],data,FS);  # extracting items for further comparison
        # match either (longitude1, latitude1) or (longitude2, latitude2) as a pair
        if ((data[2]==$4 && data[3]==$3) || (data[2]==$6 && data[3]==$5)) {
            print data[1],$1,data[2],data[3],$2
        }
    }
}

Usage:

awk -f join_csv.awk file1 file2

Output:

status,weather,longitude,latitude,timestamp
ok,rainy,10.12,17.45,14569003
ok,sunny,11.34,16.78,14569000

Answer 2 (score: 0)

Hope this answer helps you:

import csv

# newline="" is what the csv module recommends when opening files for it (Python 3).
with open("csvfile1.csv", newline="") as file1, \
     open("csvfile2.csv", newline="") as file2, \
     open("new_file.csv", "w", newline="") as new_file:
    file1_dict = csv.DictReader(file1)
    file2_dict = csv.DictReader(file2)

    csv_writer = csv.writer(new_file)
    csv_writer.writerow(["status", "weather", "longitude", "latitude", "timestamp"])
    # zip() pairs rows by position, so this assumes both files list records in the same order.
    for f1_row, f2_row in zip(file1_dict, file2_dict):
        # Values are read as text, so these are string comparisons ("17.5" != "17.50").
        point = (f1_row["longitude"], f1_row["latitude"])
        pairs = [(f2_row["longitude1"], f2_row["latitude1"]),
                 (f2_row["longitude2"], f2_row["latitude2"])]
        if f1_row["timestamp"] == f2_row["timestamp"] and point in pairs:
            csv_writer.writerow([f1_row["status"], f2_row["weather"], f1_row["longitude"], f1_row["latitude"], f1_row["timestamp"]])
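The code above walks both files in lockstep with zip, so it only finds matches when the two files list their records in the same order. If that cannot be assumed, one alternative is a dictionary lookup keyed on timestamp plus coordinates, roughly as sketched here. The function name join_rows and the inline sample strings are illustrative assumptions; coordinates are converted with float so that "17.5" and "17.50" compare equal.

```python
import csv
import io

# Sample data copied from the question; with real files, pass open(path, newline="").
CSV1 = """status,longitude,latitude,timestamp
ok,10.12,17.45,14569003
ok,11.34,16.78,14569000
"""
CSV2 = """weather,timestamp,latitude1,longitude1,latitude2,longitude2
rainy,14569003,17.45,10.12,17.50,11.25
sunny,14569000,13.76,12.44,16.78,11.34
"""


def join_rows(file1, file2):
    """Hypothetical helper: yield joined rows from two open text streams."""
    # Index csvfile2 by (timestamp, longitude, latitude), once per coordinate pair.
    weather_by_key = {}
    for row in csv.DictReader(file2):
        for n in ("1", "2"):
            key = (row["timestamp"],
                   float(row["longitude" + n]),
                   float(row["latitude" + n]))
            weather_by_key[key] = row["weather"]
    # Look up each csvfile1 row; row order in the two files no longer matters.
    for row in csv.DictReader(file1):
        key = (row["timestamp"], float(row["longitude"]), float(row["latitude"]))
        if key in weather_by_key:
            yield [row["status"], weather_by_key[key],
                   row["longitude"], row["latitude"], row["timestamp"]]


rows = list(join_rows(io.StringIO(CSV1), io.StringIO(CSV2)))

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["status", "weather", "longitude", "latitude", "timestamp"])
writer.writerows(rows)
print(out.getvalue())
```

The lookup table holds all of csvfile2 in memory, which is fine for small files; for very large inputs a sorted merge or a database join may be a better fit.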