I have a reference file that looks like this:
Experiment,Array,Drug
8983,Genechip,Famotidine
8878,Microarray,Dicyclomine
8988,Genechip,Etidronate
8981,Microarray,Flunarizine
I successfully created a dictionary that maps the Experiment number to the Drug name using the following:
reader = csv.reader(open('C:\Users\Troy\Documents\ExPSRef.txt'))
# Configure dictionary
result = {}
for row in reader:
    key = row[0]
    result[key] = row[2]
di = result
I would like to map this dictionary onto the header of another file, which consists of experiment numbers. It currently looks like this:
Gene,8988,8981,8878,8983
Vcp,0.011,-0.018,-0.032,-0.034
Ube2d2,0.034,0.225,-0.402,0.418
Becn1,0.145,-0.108,-0.421,-0.048
Lypla2,-0.146,-0.026,-0.101,-0.011
But it should look like this:
Gene,Etidronate,Flunarizine,Dicyclomine,Famotidine
Vcp,0.011,-0.018,-0.032,-0.034
Ube2d2,0.034,0.225,-0.402,0.418
Becn1,0.145,-0.108,-0.421,-0.048
Lypla2,-0.146,-0.026,-0.101,-0.011
I tried using:
import csv
import pandas as pd
reader = csv.reader(open('C:\Users\Troy\Documents\ExPSRef.txt'))
result = {}
for row in reader:
    key = row[0]
    result[key] = row[2]
di = result
df = pd.read_csv('C:\Users\Troy\Documents\ExPS2.txt')
df['row[0]'].replace(di, inplace=True)
But it returned KeyError: 'row[0]'.
I also tried the following, even transposing in order to merge:
import pandas as pd
df1 = pd.read_csv('C:\Users\Troy\Documents\ExPS2.txt',).transpose()
df2 = pd.read_csv('C:\Users\Troy\Documents\ExPSRef.txt', delimiter=',', engine='python')
df3 = df1.merge(df2)
df4 = df3.set_index('Drug').drop(['Experiment', 'Array'], axis=1)
df4.index.name = 'Drug'
print df4
This time I got MergeError('No common columns to perform merge on').
Is there a simpler way to map my dictionary onto the headers?
Answer 0 (score: 2)
One thing to keep in mind is to make sure that the keys of the mapper dictionary and the headers being mapped have the same data type.
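Here, one is of string type and the other of integer type: the column headers of the original file are read as strings, while the Experiment values in the reference file are parsed as integers by default. So, while reading the reference DF, we set dtype to str so that both sides are strings:
import pandas as pd
# raw strings keep the backslashes in the Windows paths literal
df1 = pd.read_csv(r'C:\Users\Troy\Documents\ExPS2.txt')  # Original
df2 = pd.read_csv(r'C:\Users\Troy\Documents\ExPSRef.txt', dtype=str)  # Reference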
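The effect of mismatched key types can be seen without pandas at all. The snippet below is only an illustrative sketch: the header list is copied from the question, and the integer-keyed dictionary `int_keys` is a hypothetical stand-in for a reference table whose Experiment column was parsed as integers.

```python
# Headers read from a CSV file are strings.
header = ["Gene", "8988", "8981", "8878", "8983"]

# Hypothetical mapping whose keys were parsed as integers.
int_keys = {
    8983: "Famotidine",
    8878: "Dicyclomine",
    8988: "Etidronate",
    8981: "Flunarizine",
}

# The same mapping with the keys converted to strings.
str_keys = {str(k): v for k, v in int_keys.items()}

# Look up every header cell, falling back to the original name.
unchanged = [int_keys.get(name, name) for name in header]
renamed = [str_keys.get(name, name) for name in header]

print(unchanged)  # integer keys never match the string headers
print(renamed)    # string keys match, so the headers are renamed
```

With integer keys nothing is replaced and no error is raised, which is why this kind of mismatch is easy to miss.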
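Convert the columns of the original DF to their Series representation, then replace the old experiment numbers with the new drug names retrieved from the reference DF:
df1.columns = df1.columns.to_series().replace(df2.set_index('Experiment').Drug)
df1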
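For anyone who wants to try this without the files on disk, here is a minimal end-to-end sketch using the sample data from the question loaded from in-memory strings. Note that it swaps in `DataFrame.rename(columns=...)` with a plain dictionary rather than the `to_series().replace(...)` call above; the variable names (`ref`, `data`, `mapping`) are my own and not part of the original answer.

```python
import io

import pandas as pd

ref_text = """Experiment,Array,Drug
8983,Genechip,Famotidine
8878,Microarray,Dicyclomine
8988,Genechip,Etidronate
8981,Microarray,Flunarizine
"""

data_text = """Gene,8988,8981,8878,8983
Vcp,0.011,-0.018,-0.032,-0.034
Ube2d2,0.034,0.225,-0.402,0.418
"""

# Read the reference table with every column as str so the keys
# have the same type as the column labels of the data file.
ref = pd.read_csv(io.StringIO(ref_text), dtype=str)
data = pd.read_csv(io.StringIO(data_text))

# Build the experiment number -> drug name mapping.
mapping = dict(zip(ref["Experiment"], ref["Drug"]))

# Column labels missing from the mapping (here 'Gene') are left as they are.
renamed = data.rename(columns=mapping)
print(list(renamed.columns))
```

`rename` returns a new DataFrame, so the original `data` keeps its numeric headers unless the result is assigned back.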
Answer 1 (score: 1)
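I used csv for the whole script. This fixes the header the way you want and saves the result to a new file. If you prefer, the new file name can be replaced with the original file name. This program is written in Python 3.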
import csv

with open('sample.txt', 'r', newline='') as ref:
    reader = csv.reader(ref)
    # skip header line
    next(reader)
    # make dictionary: experiment number -> drug name
    di = {row[0]: row[2] for row in reader}

with open('sample1.txt', 'r', newline='') as df:
    reader = csv.reader(df)
    header = next(reader)
    # rename mapped headers; leave unmapped ones (e.g. 'Gene') unchanged
    new_header = [di.get(name, name) for name in header]
    data = list(reader)

# write a new file; the same file name can be used instead to overwrite
with open('new_sample1.txt', 'w', newline='') as df_new:
    writer = csv.writer(df_new)
    writer.writerow(new_header)
    writer.writerows(data)
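The same csv-only idea can also be wrapped in a small function that streams rows instead of loading them all into a list. This is just a sketch under my own assumptions: `rename_header` is a hypothetical helper name, and in-memory `io.StringIO` objects stand in for the real files so the example can be run as-is.

```python
import csv
import io


def rename_header(src, dst, mapping):
    """Copy CSV rows from src to dst, renaming header cells found in mapping."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    # Cells without an entry in the mapping are written unchanged.
    writer.writerow([mapping.get(name, name) for name in header])
    # The remaining rows are copied through one at a time.
    writer.writerows(reader)


mapping = {"8988": "Etidronate", "8981": "Flunarizine"}
src = io.StringIO("Gene,8988,8981\nVcp,0.011,-0.018\n")
dst = io.StringIO()

rename_header(src, dst, mapping)
print(dst.getvalue())
```

With real files, `src` and `dst` would be handles opened with `newline=''`, as the csv module documentation recommends; because the rows are streamed, the output would need to go to a different file than the input.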