Using two CSV files for a CSV lookup and generating a new file

Time: 2016-11-03 16:49:58

Tags: python mysql csv

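I'm very new to the Python world, and I have a problem I don't know how to approach. An example of the tables I'm working with is at the following link: https://i.stack.imgur.com/mm1it.png

I have a CSV file containing a table like t1 (from the link) that acts as a database (this is only a fragment of the whole database).

I have another table (t2) that I want to use as my search parameters.

I want to create a lookup-and-return program that uses t2 to look into t1 and gives me an output CSV file. That file should contain all the A parameters, with species_a, species_b and species_c replaced by the numeric values assigned to them in t2. If there is no value in species_c, it should return -1. The final table would look similar to this: t3.

The reason for this convoluted approach is that I have accumulated a database in a format different from the one my software uses. I can't simply change the ions in the database to the numeric IDs in t2, because the numbers are assigned anew each time I start a run with my software, based on the materials I'm considering in the system.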

3 answers:

Answer 0: (score: 1)

I think you should take a look at the pandas library.

http://pandas.pydata.org/pandas-docs/stable/io.html#io-read-csv-table

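There are more concise ways to do this (you could create a dictionary and remap the values in the columns to IDs), but your question seems to be about joining data, so here is an example of how pandas can join the .csv files: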

import pandas as pd

df1 = pd.read_csv('../path/t1.csv')
df2 = pd.read_csv('../path/t2.csv')

# Left-join t2 onto t1 once for each species column
combined = pd.merge(df1, df2, how='left', left_on='species_a', right_on='aq_species')

combined = pd.merge(combined, df2, how='left', left_on='species_b', right_on='aq_species')

combined = pd.merge(combined, df2, how='left', left_on='species_c', right_on='aq_species')

# The three merges leave ion_id_x (species_a), ion_id_y (species_b) and ion_id (species_c),
# which you can then rename
combined.rename(columns={'ion_id_x' : 'species_a_id', 'ion_id_y' : 'species_b_id', 'ion_id' : 'species_c_id'}, inplace=True)

combined.to_csv('../path/t3.csv', index=False)
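
The dictionary-remapping alternative mentioned at the start of this answer could look roughly like the sketch below. It assumes t2 has the columns ion_id and aq_species, as in the merge example. The in-memory sample data is hypothetical and only stands in for the real files.

```python
import io
import pandas as pd

# Hypothetical stand-ins for t1.csv and t2.csv; the real files have more rows and columns.
t1_csv = io.StringIO(
    "species_a,species_b,species_c,A0\n"
    "Na+,Cl-,,0.1\n"
    "Na+,K+,Cl-,0.2\n"
)
t2_csv = io.StringIO(
    "ion_id,aq_species\n"
    "0,Na+\n"
    "1,K+\n"
    "2,Cl-\n"
)

df1 = pd.read_csv(t1_csv)
df2 = pd.read_csv(t2_csv)

# Build a species -> ion_id lookup from t2.
species_to_id = dict(zip(df2["aq_species"], df2["ion_id"]))

# Replace each species column in place; names missing from t2 (including blanks) become -1.
for col in ["species_a", "species_b", "species_c"]:
    df1[col] = df1[col].map(species_to_id).fillna(-1).astype(int)

print(df1.to_csv(index=False))
```

Unlike the merge version, this overwrites the species columns directly, so no extra aq_species or ion_id columns need to be renamed or dropped afterwards.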

Answer 1: (score: 0)

This is untested, since I don't have your CSV files. The csv module is your friend.

import sys
import csv

# Create a map of species to ion ID
species_map = {}
with open('t2.csv') as fin:
    reader = csv.reader(fin)
    for row in reader:
        species_map[row[1]] = row[0]

# Write the output mapping species names to IDs
fieldnames = ['species_a', 'species_b', 'species_c', 'A0',  'A1',  'A2',  'A3',  'A4',  'A5', 'interaction_type']
writer = csv.DictWriter(sys.stdout, fieldnames)
writer.writeheader()
with open('t1.csv') as fin:
    reader = csv.DictReader(fin)
    for row in reader:
        row['species_a'] = species_map.get(row['species_a'], -1)
        row['species_b'] = species_map.get(row['species_b'], -1)
        row['species_c'] = species_map.get(row['species_c'], -1)
        writer.writerow(row)
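
To see the -1 fallback of this approach without the real files, here is a small Python 3 sketch that runs on in-memory text. The sample rows are hypothetical. Taking the output columns from the reader, instead of a hard-coded list, is a deliberate variation on the code above.

```python
import csv
import io

# Hypothetical sample data standing in for t2.csv (ion_id, species) and t1.csv.
t2_text = "0,Na+\n1,K+\n2,Cl-\n"
t1_text = (
    "species_a,species_b,species_c,A0\n"
    "Na+,Cl-,,0.1\n"
    "Na+,K+,Cl-,0.2\n"
)

# Column 0 holds the ID and column 1 the species name, as in the answer above.
species_map = {row[1]: row[0] for row in csv.reader(io.StringIO(t2_text))}

out = io.StringIO()
reader = csv.DictReader(io.StringIO(t1_text))
writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
for row in reader:
    for col in ("species_a", "species_b", "species_c"):
        # dict.get supplies -1 when the species (or an empty cell) is not in t2
        row[col] = species_map.get(row[col], -1)
    writer.writerow(row)

result = out.getvalue()
print(result)
```

Reusing reader.fieldnames avoids the ValueError that DictWriter raises when a row contains a column missing from its field list.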

Answer 2: (score: 0)

I was able to solve this with somewhat longer code:

import csv
import itertools
from itertools import izip

def all_species_combination():
        'Creates all possible ion combinations so that they can compare against the database'
        with open('t2', 'rb') as f:
            reader = csv.reader(f)
            # to replace empty spacies within list entries
            global combined
            text = []
            species_a = []
            species_b = []
            species_c = []
            combined = []
            # obtaining three lists of the aqueous species so that they can be combined later
            for row in reader:
                species_a.append(row[1])
                species_b.append(row[1])
                species_c.append(row[1])


            # Combine the three species lists into every possible (a, b, c) triple
            combined.extend(itertools.product(species_a, species_b, species_c))

def lookup_ion_combos():
        'Checks the database csv file whether potential database entries are available'
        with open('t1', 'rb') as f:
            reader = csv.reader(f)
            global species_lookup
            species_lookup = []
            pitzer_id = []
            species_a = []
            species_b = []
            species_c = []

            for row in combined:
                species_a.append(row[0])
                species_b.append(row[1])
                species_c.append(row[2])
            # Iteratively goes through every row of ion combinations and compares to t1.
            for row in reader:
                for i in range(len(species_a)):
                    if row[0] == species_a[i]:
                        if row[1] == species_b[i]: 
                                if row[2] == species_c[i]: 
                                        species_lookup.append(row[0:11])
def gems_ions():
        'Creates dictionary of t2 to replace corresponding species'
        with open('t2.csv', 'rb') as g:
            reader = csv.reader(g)
            global ion_id, atom, dictionary, zero_dict
            ion_id = []
            atom = []

            for row in reader:
                ion_id.append(row[0])
                atom.append(row[1])

            ion_id = [w.replace(' ','').replace('@','') for w in ion_id]
            atom = [w.replace(' ','').replace('@','') for w in atom]

            dictionary = dict(zip(atom, ion_id))

        DATA = {"records": [dictionary]}
        for name, datalist in DATA.iteritems():  # Or items() in Python 3.x
            for datadict in datalist:
                for key, value in datadict.items():
                    if value == '0':
                        datadict[key] = 'zero'

        zero_dict = {'zero':'0'}

all_species_combination()
lookup_ion_combos()
gems_ions()

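This let me build a new array in which every ion that was looked up is linked to its respective values. I then created a dictionary to convert them to the numeric values based on t2.

Thanks for your help!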
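
The combination lookup in this answer can probably be condensed. A t1 row matches some (a, b, c) triple from the product exactly when each of its three species appears in t2. Below is a minimal Python 3 sketch of that idea, using hypothetical in-memory rows instead of the real files. Like the code above, it skips rows containing a species that is not in t2 and does not fill in -1.

```python
# Hypothetical rows: t2 as (ion_id, species), t1 as (species_a, species_b, species_c, A0).
t2_rows = [("0", "Na+"), ("1", "K+"), ("2", "Cl-")]
t1_rows = [
    ("Na+", "K+", "Cl-", "0.2"),
    ("Na+", "Mg+2", "Cl-", "0.3"),  # Mg+2 is not in t2, so this row is skipped
]

species_to_id = {species: ion_id for ion_id, species in t2_rows}

species_lookup = []
for row in t1_rows:
    # Equivalent to testing the row against every (a, b, c) product of t2 species,
    # without materialising the product.
    if all(name in species_to_id for name in row[:3]):
        ids = [species_to_id[name] for name in row[:3]]
        species_lookup.append(ids + list(row[3:]))

print(species_lookup)
```

The membership test costs one dictionary lookup per species, so the work grows with the size of t1 and not with the cube of the number of species in t2.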