I am trying to map the row values of a .csv file using a separate reference file. The original .csv looks like this:
PROBE,8988,8981,8878,8983
1371844,0.011,-0.018,-0.032,-0.034
1386013,0.034,0.225,-0.402,0.418
1390154,0.145,-0.108,-0.421,-0.048
1393851,-0.146,-0.026,-0.101,-0.011
The reference .csv that I use to build the dictionary looks like this:
PROBE, Title, Gene
1390154, Cellular, Becn1
1371844, Liver, Vcp
1393851, Kidney, Lypla2
1386013, Heart, Ube2d2
Ideally, I would end up with this:
PROBE 8988 8981 8878 8983
Vcp 0.011 -0.018 -0.032 -0.034
Ube2d2 0.034 0.225 -0.402 0.418
Becn1 0.145 -0.108 -0.421 -0.048
Lypla2 -0.146 -0.026 -0.101 -0.011
This is what I have tried:
import csv
import pandas as pd
reader = csv.reader(open(r'C:\Users\Troy\Documents\ExPSID.csv'))  # Open reference .csv file
result = {}
for row in reader:
    key = row[0]
    result[key] = row[2]
dict = result  # Configure dictionary
df = pd.read_csv(r'C:\Users\Troy\Documents\ExPS2.txt', index_col=0)  # Fetch unmapped .csv
df.replace({"PROBE": dict})  # Use dictionary to map IDs to genes
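It throws "ValueError: Replacement not allowed with overlapping keys and values".
I do know why this happens, though, because if I print the dictionary I get: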
{'': '', ' ': '', '1390154': 'Becn1', '1386013': 'Ube2d2', 'Probe ': 'Gene', '1371844': 'Vcp', '1393851': 'Lypla2'}
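It adds two empty key:value pairs to the front of my dictionary. If I delete them manually, df.replace({"PROBE": dict}) works fine and all is well.
So my question is: is there a way to change this script so that I don't have to delete the leading key:value pairs by hand? And is there a better way to do this?
I am pretty new to Python, so if this is a dumb question, I will happily own it :P
P.S.: What if I also want to map the columns, using another reference .csv like this: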
Experiment, Array, Drug
8983, Genechip, Famotidine
8878, Microarray, Dicyclomine
8988, Genechip, Etidronate
8981, Microarray, flunarizine
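Could I simply replace "row" with "col" in the code above? When I tried that, it just spat out the original file without mapping the new values....
I appreciate everyone's help!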
Answer 0 (score: 0)
import pandas as pd
If I understood you correctly, you want to get something like this from the two files you have:
8988 8981 8878 8983
PROBE
Vcp 0.011 -0.018 -0.032 -0.034
Ube2d2 0.034 0.225 -0.402 0.418
Becn1 0.145 -0.108 -0.421 -0.048
Lypla2 -0.146 -0.026 -0.101 -0.011
The pandas merge() function can help you achieve what you want:
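df1 = pd.read_csv('{path_to_original}/org.csv')  # expression values, one row per probe
df2 = pd.read_csv('{path_to_reference}/reference.csv', delimiter=', ', engine='python')  # PROBE, Title, Gene
df3 = df1.merge(df2)  # joins on the shared 'PROBE' column
df4 = df3.set_index('Gene').drop(['PROBE', 'Title'], axis=1)  # index by gene, drop helper columns
df4.index.name = 'PROBE'
print(df4)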
If you look at your reference file, it has a space after each comma; that is why the delimiter is given as ', ' when reading the csv.
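If you would rather keep the dictionary approach from the question, here is a minimal sketch of one way it could work. It assumes the stray keys come from blank rows in the reference file, so it skips any row whose key or value is empty, and it relabels with DataFrame.rename rather than replace, since PROBE is the index there and not a column of values. The helper build_mapping and the inline strings DATA, GENES and DRUGS are hypothetical stand-ins for your files (with one blank row added to GENES to imitate the problem), not something taken from your setup.

```python
import csv
import io

import pandas as pd

# Hypothetical inline copies of the three files from the question.
# For real files, pass open(path, newline="") instead of io.StringIO(...).
DATA = """PROBE,8988,8981,8878,8983
1371844,0.011,-0.018,-0.032,-0.034
1386013,0.034,0.225,-0.402,0.418
1390154,0.145,-0.108,-0.421,-0.048
1393851,-0.146,-0.026,-0.101,-0.011
"""
GENES = """PROBE, Title, Gene
1390154, Cellular, Becn1
1371844, Liver, Vcp
, ,
1393851, Kidney, Lypla2
1386013, Heart, Ube2d2
"""
DRUGS = """Experiment, Array, Drug
8983, Genechip, Famotidine
8878, Microarray, Dicyclomine
8988, Genechip, Etidronate
8981, Microarray, flunarizine
"""


def build_mapping(handle, key_col, value_col):
    """Return {key: value} from a CSV, ignoring the header and blank rows."""
    # skipinitialspace=True drops the space that follows each comma.
    reader = csv.DictReader(handle, skipinitialspace=True)
    mapping = {}
    for row in reader:
        key = (row.get(key_col) or "").strip()
        value = (row.get(value_col) or "").strip()
        if key and value:  # skip rows whose key or value is empty
            mapping[key] = value
    return mapping


gene_by_probe = build_mapping(io.StringIO(GENES), "PROBE", "Gene")
drug_by_experiment = build_mapping(io.StringIO(DRUGS), "Experiment", "Drug")

# Read probe IDs as strings so they match the dictionary keys,
# then make them the row index.
df = pd.read_csv(io.StringIO(DATA), dtype={"PROBE": str}).set_index("PROBE")

# rename() relabels the index and the column headers in one call;
# labels that are missing from a mapping are left unchanged.
mapped = df.rename(index=gene_by_probe, columns=drug_by_experiment)
print(mapped)
```

rename returns a new DataFrame instead of changing df in place, so the result has to be assigned to a name, as with mapped above. The same space-after-comma issue can also be handled in pd.read_csv with skipinitialspace=True instead of a ', ' delimiter.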