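I have two CSV files that I need to concatenate. I read both into pandas DataFrames. One has column labels and the other does not. I add labels to the DataFrame that needs them and then concatenate the two DataFrames. The concatenation works, but the labels I added come out looking like individual lists or something similar. I can't figure out what Python is doing, especially since the labels and the DataFrame both look fine when printed. Call this approach one.
I was able to work around the problem by supplying the column labels when reading the CSV, and then it works fine. Call this approach two. What is going on with approach one?
Code and results below.
Approach one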
#read in the vectors as a pandas df vec
vecs=pd.read_csv(os.path.join(path,filename), header=None)
#label the feature vectors v1-vn and attach to the df
endrange=features+1
string='v'
vecnames=[string + str(i) for i in range(1,endrange)]
vecs.columns = [vecnames]
print('\nvecnames')
display(vecnames) #they look ok here
display(vecs.head()) #they look ok here
#read in the IDs and phrases as a pandas df
recipes=pd.read_csv(os.path.join(path,'2a_2d_id_all_recipe_forms.csv'))
print('\nrecipes file - ids and recipe phrases')
display(recipes.head())
test=pd.concat([recipes, vecs], axis=1)
print('\ncol labels for vectors look like lists!')
display(test.head())
Result of approach one:
['v1',
'v2',
'v3',
'v4',
'v5',
'v6',
'v7',
'v8',
'v9',
'v10',
'v11',
'v12',
'v13',
'v14',
'v15',
'v16',
'v17',
'v18',
'v19',
'v20',
'v21',
'v22',
'v23',
'v24',
'v25']
Approach two
When I supply the column labels while reading in the unlabeled file, it works fine. Why?
#label the feature vectors v1-vn and attach to the df
endrange=features+1
string='v'
vecnames=[string + str(i) for i in range(1,endrange)]
#read in the vectors as a pandas df and label the cols
vecs=pd.read_csv(os.path.join(path,filename), names=vecnames, header=None)
#read in the IDs and phrases as a pandas df
recipes=pd.read_csv(os.path.join(path,'2a_2d_id_all_recipe_forms.csv'))
test=pd.concat([recipes, vecs], axis=1)
print('\ncol labels for vectors as expected')
display(test.head())
Result of approach two
Answer 0 (score: 1)
The odd behavior comes from this line:
vecs.columns = [vecnames]
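vecnames is already a list, but the line above wraps it in a second list. pandas treats a list of lists as the arrays of a MultiIndex, so vecs ends up with a one-level MultiIndex for its columns. The column names still display correctly when you print the DataFrame, but concatenating vecs with another DataFrame exposes each of vecs's column labels as a single-element tuple such as ('v1',).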
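To see this without the original files, here is a minimal sketch that uses a small made-up DataFrame in place of the vectors file (the data and the choice of 3 features are assumptions for illustration only). It inspects the type of the column index after the wrapped assignment:

```python
import pandas as pd

# Hypothetical stand-in for the unlabeled vectors file: 2 rows, 3 features.
vecs = pd.DataFrame([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
vecnames = ['v' + str(i) for i in range(1, 4)]

# Same assignment as in approach one: the list is wrapped in another list.
vecs.columns = [vecnames]

# The columns are now a one-level MultiIndex rather than a plain Index,
# and each label is a one-element tuple.
is_multi = isinstance(vecs.columns, pd.MultiIndex)
labels = list(vecs.columns)
print(type(vecs.columns).__name__)
print(labels)
```

Checking type(vecs.columns) right after the assignment is a quick way to catch this, since display(vecs.head()) alone does not make the difference obvious.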
Fix: change the line above to:
vecs.columns = vecnames
and run everything else as is.
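As a rough check of the fix, the sketch below concatenates two small made-up DataFrames (the recipes columns id and phrase are assumed names, not taken from the real file). It also shows one possible way to flatten columns that were already wrapped, using get_level_values(0):

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files.
recipes = pd.DataFrame({'id': [1, 2], 'phrase': ['boil water', 'chop onion']})
vecs = pd.DataFrame([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
vecnames = ['v' + str(i) for i in range(1, 4)]

# Assign the list directly, without the extra brackets.
vecs.columns = vecnames
test = pd.concat([recipes, vecs], axis=1)
print(list(test.columns))

# If a DataFrame already has the wrapped (one-level MultiIndex) columns,
# pulling out level 0 gives back a plain Index of strings.
wrapped = pd.DataFrame([[0.1, 0.2, 0.3]])
wrapped.columns = [vecnames]
wrapped.columns = wrapped.columns.get_level_values(0)
print(list(wrapped.columns))
```

With plain string labels on both sides, the concatenated frame has ordinary column names throughout.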