Python: How do I pass dataframe columns as arguments to a function?

Time: 2021-04-04 11:32:44

Tags: python pandas dataframe nlp bert-language-model

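I have a dataframe df that contains two columns of text embeddings, embedding_1 and embedding_2. I want to create a third column in df named distances, which should contain the cosine_similarity between embedding_1 and embedding_2 for each row.

But when I try to do this with the following code, I get a ValueError.

How can I fix this?

Dataframe df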

           embedding_1              |            embedding_2                                 
 [[-0.28876397, -0.6367827, ...]]   |  [[-0.49163356, -0.4877703,...]]
 [[-0.28876397, -0.6367827, ...]]   |  [[-0.06686627, -0.75147504...]]
 [[-0.28876397, -0.6367827, ...]]   |  [[-0.42776933, -0.88310856,...]]
 [[-0.28876397, -0.6367827, ...]]   |  [[-0.6520882, -1.049325,...]]
 [[-0.28876397, -0.6367827, ...]]   |  [[-1.4216679, -0.8930428,...]]

Code to compute the cosine similarity

df['distances'] = cosine_similarity(df['embedding_1'], df['embedding_2'])

Error

ValueError: setting an array element with a sequence.

Required dataframe

           embedding_1              |            embedding_2             |  distances
 [[-0.28876397, -0.6367827, ...]]   |  [[-0.49163356, -0.4877703,...]]   |    0.427
 [[-0.28876397, -0.6367827, ...]]   |  [[-0.06686627, -0.75147504...]]   |    0.673
 [[-0.28876397, -0.6367827, ...]]   |  [[-0.42776933, -0.88310856,...]]  |    0.882
 [[-0.28876397, -0.6367827, ...]]   |  [[-0.6520882, -1.049325,...]]     |    0.665
 [[-0.28876397, -0.6367827, ...]]   |  [[-1.4216679, -0.8930428,...]]    |    0.312

1 answer:

Answer 0 (score: 2)

You can use apply() to call cosine_similarity() on each row:

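from sklearn.metrics.pairwise import cosine_similarity

def cal_cosine_similarity(row):
    # Each cell holds a 1 x n array, so cosine_similarity returns a 1 x 1
    # matrix; [0][0] extracts the scalar shown in the required dataframe.
    return cosine_similarity(row['embedding_1'], row['embedding_2'])[0][0]

df['distances'] = df.apply(cal_cosine_similarity, axis=1)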
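
The same thing can also be written as a one-liner with a lambda. The original call most likely fails because cosine_similarity() expects 2-D numeric arrays, while a column whose cells are arrays is an object-dtype Series that cannot be converted directly. The sketch below is self-contained and uses made-up toy data (2-dimensional vectors, assumed to be stored as 1 x n arrays as in the question). The distances_vectorized column is only an illustrative name for a NumPy-only alternative that avoids calling cosine_similarity() once per row.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: each cell holds a (1, n) array, as in the question.
df = pd.DataFrame({
    "embedding_1": [np.array([[1.0, 0.0]]),
                    np.array([[1.0, 1.0]]),
                    np.array([[0.0, 2.0]])],
    "embedding_2": [np.array([[1.0, 0.0]]),
                    np.array([[1.0, 0.0]]),
                    np.array([[3.0, 0.0]])],
})

# One-liner: call cosine_similarity row by row and unwrap the 1 x 1 result.
df["distances"] = df.apply(
    lambda row: cosine_similarity(row["embedding_1"], row["embedding_2"])[0][0],
    axis=1,
)

# Vectorized alternative: stack the cells into (n_rows, n) matrices and
# compute the row-wise cosine similarity directly with NumPy.
a = np.vstack(df["embedding_1"].tolist())
b = np.vstack(df["embedding_2"].tolist())
df["distances_vectorized"] = (a * b).sum(axis=1) / (
    np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
)

print(df[["distances", "distances_vectorized"]])
```

Both columns should hold the same values. The vectorized version tends to be faster on large dataframes, but it assumes every embedding has the same length and a non-zero norm.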