Question

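I have two CSV files that I need to join. I read both into pandas DataFrames. One has column labels and the other does not. I add labels to the DataFrame that needs them, then concatenate the two DataFrames. The concatenation works, but the labels I added end up looking like individual lists or something similar. I can't figure out what Python is doing, especially since the labels and the DataFrame both look fine when printed. Call this approach one.

I was able to work around the problem by supplying the column labels while reading the CSV. Then it works fine. Call this approach two. What is going on with approach one?

Code and results below.

Approach one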

#read in the vectors as a pandas df vec
vecs=pd.read_csv(os.path.join(path,filename), header=None)

#label the feature vectors v1-vn and attach to the df
endrange=features+1
string='v'
vecnames=[string + str(i) for i in range(1,endrange)]
vecs.columns = [vecnames]
print('\nvecnames')
display(vecnames)  #they look ok here
display(vecs.head()) #they look ok here

#read in the IDs and phrases as a pandas df
recipes=pd.read_csv(os.path.join(path,'2a_2d_id_all_recipe_forms.csv'))
print('\nrecipes file - ids and recipe phrases')
display(recipes.head())

test=pd.concat([recipes, vecs], axis=1)
print('\ncol labels for vectors look like lists!')
display(test.head())

Results of approach one:

 ['v1',
 'v2',
 'v3',
 'v4',
 'v5',
 'v6',
 'v7',
 'v8',
 'v9',
 'v10',
 'v11',
 'v12',
 'v13',
 'v14',
 'v15',
 'v16',
 'v17',
 'v18',
 'v19',
 'v20',
 'v21',
 'v22',
 'v23',
 'v24',
 'v25']

Approach two

When I supply the column labels while reading the unlabeled file, it works fine. Why?

#label the feature vectors v1-vn and attach to the df
endrange=features+1
string='v'
vecnames=[string + str(i) for i in range(1,endrange)]

#read in the vectors as a pandas df and label the cols
vecs=pd.read_csv(os.path.join(path,filename), names=vecnames, header=None)

#read in the IDs and phrases as a pandas df
recipes=pd.read_csv(os.path.join(path,'2a_2d_id_all_recipe_forms.csv'))

test=pd.concat([recipes, vecs], axis=1)
print('\ncol labels for vectors as expected')
display(test.head())

Results of approach two

Answer 1

The strange behavior comes from this line:

vecs.columns = [vecnames]

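vecnames is already a list, but the line above wraps it in another list. Assigning a list of lists makes pandas build a one-level MultiIndex for the columns instead of a plain Index. The column names still display correctly when the DataFrame is printed, but concatenating vecs with another DataFrame causes pandas to unpack the column names of vecs into single-element tuples.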
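A minimal sketch of the effect, using small made-up frames in place of the original CSV files (the data, the `id` column, and the two-column width are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files in the question.
recipes = pd.DataFrame({"id": [1, 2]})
vecs = pd.DataFrame([[0.1, 0.2], [0.3, 0.4]])

vecnames = ["v1", "v2"]
vecs.columns = [vecnames]  # nested list: the list is wrapped in another list

# A list of lists is interpreted as the arrays of a MultiIndex,
# so the columns are now a one-level MultiIndex rather than a plain Index.
is_multi = isinstance(vecs.columns, pd.MultiIndex)
print(is_multi)              # True
print(vecs.columns.nlevels)  # 1

# Concatenating with a frame that has ordinary labels exposes the mismatch
# in the combined column labels.
test = pd.concat([recipes, vecs], axis=1)
print(list(test.columns))
```

Checking `type(vecs.columns)` right after the assignment is a quick way to spot this, since the printed frame alone gives no hint.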

Fix: change the line above to:

vecs.columns = vecnames

Run everything else as is.
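As a rough check of the fix, here is the same kind of toy setup with the flat list assigned directly. The frames and the in-memory CSV text are assumptions standing in for the real files. The last part suggests why approach two works: `names=vecnames` hands `read_csv` the same flat list.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the two CSV files in the question.
recipes = pd.DataFrame({"id": [1, 2]})
vecs = pd.DataFrame([[0.1, 0.2], [0.3, 0.4]])

vecnames = ["v" + str(i) for i in range(1, 3)]
vecs.columns = vecnames  # flat list, so the columns stay a plain Index

test = pd.concat([recipes, vecs], axis=1)
print(list(test.columns))  # ['id', 'v1', 'v2']

# Approach two: pass the same flat list to read_csv via names=.
csv_text = "0.1,0.2\n0.3,0.4\n"  # made-up unlabeled CSV content
vecs2 = pd.read_csv(io.StringIO(csv_text), names=vecnames, header=None)
print(list(vecs2.columns))  # ['v1', 'v2']
```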

Adding labels to a pandas df, then concatenating the df to another df - now the labels are a list - what gives?
