Add labels to a pandas df, then concatenate the df with another df: now the labels are lists. What gives?

Posted: 2018-03-31 00:31:31

Tags: pandas dataframe concatenation

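I have two csv files that I need to concatenate. I read both csv files into pandas dfs. One has column labels and the other doesn't. I add labels to the df that needs them, then concatenate the two dfs. The concatenation works, but the labels I added look like individual lists or something. I can't figure out what Python is doing, especially since the labels and the df both look fine when I print them. Call this approach one.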

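I was able to work around the problem by adding the column labels while reading in the csv. Then it works fine. Call this approach two. What is going on with approach one?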

Code and results are below.

Approach one

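import os
import pandas as pd
from IPython.display import display  # display() comes from IPython/Jupyter

#read in the vectors as a pandas df, vecs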
vecs=pd.read_csv(os.path.join(path,filename), header=None)

#label the feature vectors v1-vn and attach to the df
endrange=features+1
string='v'
vecnames=[string + str(i) for i in range(1,endrange)]
vecs.columns = [vecnames]
print('\nvecnames')
display(vecnames)  #they look ok here
display(vecs.head()) #they look ok here

#read in the IDs and phrases as a pandas df
recipes=pd.read_csv(os.path.join(path,'2a_2d_id_all_recipe_forms.csv'))
print('\nrecipes file - ids and recipe phrases')
display(recipes.head())

test=pd.concat([recipes, vecs], axis=1)
print('\ncol labels for vectors look like lists!')
display(test.head())

Results of approach one

 ['v1',
 'v2',
 'v3',
 'v4',
 'v5',
 'v6',
 'v7',
 'v8',
 'v9',
 'v10',
 'v11',
 'v12',
 'v13',
 'v14',
 'v15',
 'v16',
 'v17',
 'v18',
 'v19',
 'v20',
 'v21',
 'v22',
 'v23',
 'v24',
 'v25']


Approach two

If I add the column labels while reading in the unlabeled csv, it works fine. Why?

#label the feature vectors v1-vn and attach to the df
endrange=features+1
string='v'
vecnames=[string + str(i) for i in range(1,endrange)]

#read in the vectors as a pandas df and label the cols
vecs=pd.read_csv(os.path.join(path,filename), names=vecnames, header=None)

#read in the IDs and phrases as a pandas df
recipes=pd.read_csv(os.path.join(path,'2a_2d_id_all_recipe_forms.csv'))

test=pd.concat([recipes, vecs], axis=1)
print('\ncol labels for vectors as expected')
display(test.head())
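For reference, approach two can be reproduced without the original files. This is a minimal sketch that assumes small in-memory CSV strings (via io.StringIO) in place of the two files, and a hypothetical features = 3; the column names in the recipes text are also made up. Passing names= gives vecs an ordinary flat Index of strings from the start, so the concatenation yields plain string labels.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two CSV files in the question:
# an unlabeled vectors file and a labeled recipes file
vec_csv = io.StringIO("0.1,0.2,0.3\n0.4,0.5,0.6\n")
recipe_csv = io.StringIO("id,phrase\n1,boil eggs\n2,toast bread\n")

features = 3  # assumed number of vector columns
vecnames = ['v' + str(i) for i in range(1, features + 1)]

# names= labels the columns with a flat Index of strings at read time
vecs = pd.read_csv(vec_csv, names=vecnames, header=None)
recipes = pd.read_csv(recipe_csv)

test = pd.concat([recipes, vecs], axis=1)
print(list(test.columns))  # ['id', 'phrase', 'v1', 'v2', 'v3']
```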

Results of approach two


1 Answer

Answer 0 (score: 1)

The strange behavior comes from this line:

vecs.columns = [vecnames]

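vecnames is already a list, but the line above wraps it in another list. pandas treats a list of lists as one array per index level, so vecs gets a one-level MultiIndex as its columns. The column names look correct when you print the DataFrame, but concatenating vecs with a DataFrame that has ordinary column labels makes pandas unpack vecs's column names into one-element tuples such as ('v1',).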
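A minimal reproduction of this behavior, assuming two tiny made-up DataFrames in place of the CSV data (the recipes column names are hypothetical). It checks the type of the column index after the wrapped assignment and then shows the tuple labels that appear after concat.

```python
import pandas as pd

vecnames = ['v1', 'v2']
vecs = pd.DataFrame([[0.1, 0.2], [0.3, 0.4]])
recipes = pd.DataFrame({'id': [1, 2], 'phrase': ['boil eggs', 'toast bread']})

# The outer list is read as "one array per level", so pandas builds
# a MultiIndex with a single level instead of a flat Index
vecs.columns = [vecnames]
print(type(vecs.columns).__name__, vecs.columns.nlevels)  # MultiIndex 1

# Appending that MultiIndex to a flat Index turns each of its entries
# into a one-element tuple
test = pd.concat([recipes, vecs], axis=1)
print(list(test.columns))  # ['id', 'phrase', ('v1',), ('v2',)]
```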

Fix: change vecs.columns = [vecnames] to:

vecs.columns = vecnames

and run everything else as is.
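A short sketch of the fix, again with made-up DataFrames standing in for the CSV data. The second option is an alternative not mentioned in the answer: if the one-level MultiIndex already exists, Index.get_level_values(0) flattens it back to a plain Index of labels before concatenating.

```python
import pandas as pd

recipes = pd.DataFrame({'id': [1, 2], 'phrase': ['boil eggs', 'toast bread']})
vecnames = ['v1', 'v2']

# Option 1: assign the list itself, not a list wrapped around it
vecs = pd.DataFrame([[0.1, 0.2], [0.3, 0.4]])
vecs.columns = vecnames
fixed = pd.concat([recipes, vecs], axis=1)
print(list(fixed.columns))  # ['id', 'phrase', 'v1', 'v2']

# Option 2: if the one-level MultiIndex was already created,
# replace it with the values of its only level
vecs2 = pd.DataFrame([[0.1, 0.2], [0.3, 0.4]])
vecs2.columns = [vecnames]
vecs2.columns = vecs2.columns.get_level_values(0)
flattened = pd.concat([recipes, vecs2], axis=1)
print(list(flattened.columns))  # ['id', 'phrase', 'v1', 'v2']
```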