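Dictionary is generated with empty keys/values

Time: 2017-01-13 01:09:32

Tags: python-2.7 csv dictionary

I am trying to map the row values of a .csv file using another reference file. The original .csv looks like this: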

PROBE,8988,8981,8878,8983
1371844,0.011,-0.018,-0.032,-0.034
1386013,0.034,0.225,-0.402,0.418
1390154,0.145,-0.108,-0.421,-0.048
1393851,-0.146,-0.026,-0.101,-0.011

The reference .csv that I use to build the dictionary looks like this:

PROBE, Title, Gene
1390154, Cellular, Becn1
1371844, Liver, Vcp
1393851, Kidney, Lypla2
1386013, Heart, Ube2d2

Ideally, I would end up with this:

PROBE    8988   8981   8878   8983
Vcp     0.011 -0.018 -0.032 -0.034
Ube2d2  0.034  0.225 -0.402  0.418
Becn1   0.145 -0.108 -0.421 -0.048
Lypla2 -0.146 -0.026 -0.101 -0.011

This is what I tried:

import csv
import pandas as pd

reader = csv.reader(open(r'C:\Users\Troy\Documents\ExPSID.csv'))  # Open the reference .csv file
result = {}
for row in reader:
    key = row[0]
    result[key] = row[2]
    dict = result #Configure dictionary

df = pd.read_csv(r'C:\Users\Troy\Documents\ExPS2.txt', index_col=0)  # Fetch the unmapped .csv
df.replace({"PROBE": dict})  # Use the dictionary to map IDs to genes

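It throws "ValueError: Replacement not allowed with overlapping keys and values".

I do know why this happens, though, because if I print the dictionary I get: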

{'': '', ' ': '', '1390154': 'Becn1', '1386013': 'Ube2d2', 'Probe  ': 'Gene', '1371844': 'Vcp', '1393851': 'Lypla2'}

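It puts two empty key:value pairs at the front of my dictionary. If I delete them by hand, df.replace({"PROBE": dict}) works fine and all is well.

So my question is: is there a way to change this script so that I don't have to delete the leading key:value pairs manually? Is there a better way to do this?

I'm fairly new to Python, so if this is a silly question, I'll happily own it :P

P.S.: What if I also want to map the columns, using another reference .csv like this one: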

Experiment, Array, Drug
8983, Genechip, Famotidine
8878, Microarray, Dicyclomine
8988, Genechip, Etidronate
8981, Microarray, flunarizine

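Can I simply replace "row" with "col" in the code above? When I try that, it just spits out the original file without mapping the new values...

I appreciate everyone's help!

1 Answer:

Answer 0: (score: 0)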

If I understood you correctly, you want to get something like this from the two files you have:

         8988   8981   8878   8983
PROBE
Vcp     0.011 -0.018 -0.032 -0.034
Ube2d2  0.034  0.225 -0.402  0.418
Becn1   0.145 -0.108 -0.421 -0.048
Lypla2 -0.146 -0.026 -0.101 -0.011

The pandas merge() function can help you achieve what you want:

import pandas as pd

# Original data: one row per probe, one column per experiment
df1 = pd.read_csv('{path_to_original}/org.csv')
# Reference file: fields are separated by a comma followed by a space
df2 = pd.read_csv('{path_to_reference}/reference.csv', delimiter=', ', engine='python')
# Join the two frames on their shared PROBE column
df3 = df1.merge(df2, on='PROBE')
# Index by gene name and drop the columns that are no longer needed
df4 = df3.set_index('Gene').drop(['PROBE', 'Title'], axis=1)
df4.index.name = 'PROBE'
print(df4)

Note that your reference file has a space after each comma, which is why the delimiter is given as ', ' when reading the csv.
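
If you would rather keep the dictionary approach from the question, here is a minimal sketch of how it could look. A few things to keep in mind: csv.reader always yields one row at a time, so renaming the loop variable from "row" to "col" changes nothing; with index_col=0 the PROBE values become the index rather than a column; and replace() (like rename() below) returns a new DataFrame instead of modifying df in place. The sketch is written for Python 3. The helper load_mapping and the three in-memory strings are made up for illustration and stand in for your files. It also assumes that every real key (probe or experiment ID) consists only of digits, which is how the header and the empty rows get filtered out.

```python
import csv
import io

import pandas as pd

# In-memory stand-ins for the files in the question (assumption: the real
# files have this layout, including a couple of rows with empty fields).
REFERENCE_CSV = """,,
 ,,
PROBE, Title, Gene
1390154, Cellular, Becn1
1371844, Liver, Vcp
1393851, Kidney, Lypla2
1386013, Heart, Ube2d2
"""

EXPERIMENT_CSV = """Experiment, Array, Drug
8983, Genechip, Famotidine
8878, Microarray, Dicyclomine
8988, Genechip, Etidronate
8981, Microarray, flunarizine
"""

DATA_CSV = """PROBE,8988,8981,8878,8983
1371844,0.011,-0.018,-0.032,-0.034
1386013,0.034,0.225,-0.402,0.418
1390154,0.145,-0.108,-0.421,-0.048
1393851,-0.146,-0.026,-0.101,-0.011
"""


def load_mapping(handle, key_col=0, value_col=2):
    """Build {key: value} from a CSV, ignoring the header and empty rows.

    Hypothetical helper: it keeps a row only if its key is all digits,
    which is an assumption about the IDs, not a general rule.
    """
    mapping = {}
    for row in csv.reader(handle):
        if len(row) <= max(key_col, value_col):
            continue  # empty or too-short line
        key = row[key_col].strip()
        value = row[value_col].strip()
        if key.isdigit() and value:
            mapping[key] = value
    return mapping


probe_to_gene = load_mapping(io.StringIO(REFERENCE_CSV))
experiment_to_drug = load_mapping(io.StringIO(EXPERIMENT_CSV))

df = pd.read_csv(io.StringIO(DATA_CSV), index_col=0)
# The probe IDs are parsed as integers, while the dictionary keys are strings.
df.index = df.index.astype(str)

# rename() maps row labels and column labels in one call and returns a new frame.
mapped = df.rename(index=probe_to_gene, columns=experiment_to_drug)
print(mapped)
```

With real files you would pass open file objects to load_mapping instead of io.StringIO. Labels that have no entry in a dictionary are left unchanged by rename(), so a missing probe shows up as its original ID rather than raising an error.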