pandas.DataFrame.apply ValueError: operands could not be broadcast together with shapes

Time: 2017-04-13 15:49:36

Tags: python-3.x pandas gensim word2vec

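Running this function row by row in a loop works. Running the same function through pandas.DataFrame.apply raises ValueError: operands could not be broadcast together with shapes. Should pandas.DataFrame.apply work here? If this is one of those things that is not easy to explain, are there any ideas on how to speed up the processing (other than multiprocessing)?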

#python 3.6
import pandas as pd # version 0.19.2  
import numpy as np  # 
#gensim version 1.0.1
from gensim import models #https://radimrehurek.com/gensim/models/word2vec.html

df=pd.DataFrame({"q1":[['how', 'I', 'from', 'iPhone', 'keep', 'them', 'my', 'but', 'delete', 'iCloud', 'photos', 'in', 'can'],
                   ['use', 'are', 'radio', 'What', 'commercial', 'cognitive', 'technology', 'in'],
                   ['how', 'I', 'razor', 'prevent', 'burns', 'the', 'stomach', 'on', 'can']],
             "q2":[['Can', 'remove', 'from', 'I', 'iPhone', 'removing', 'them', 'my', 'storage', 'photos', 'iCloud', 'without'],
                  ['radio', 'from', 'Where', 'do', 'come', 'cognitive', 'distinction'],
                   ['how', 'I', 'razor', 'prevent', 'can', 'burn']]})

#using pretrained model https://code.google.com/archive/p/word2vec/
w2v = models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True) 

#This works
df['w2v_sim']=np.nan
for i in range(len(df)):
    df['w2v_sim'].ix[i]=w2v.n_similarity(df['q1'].ix[i],df['q2'].ix[i])
    print(str(df['w2v_sim'].ix[i]))

#this doesn't work
df['w2v_sim']=np.nan
df['w2v_sim']=df.apply(w2v.n_similarity(df['q1'],df['q2']),axis=1)

ValueError: operands could not be broadcast together with shapes (13,300) (8,300)

Thanks

1 answer:

Answer 0: (score: 0)

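This is hard to reproduce given that the pretrained model is 1.5 GB, but I think the cause is how you call apply: with axis=1, it applies a function to the DataFrame row by row, so that function should take a single argument (the row, which is a Series). Try rewriting the call so that apply receives a function of the row.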
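
A minimal sketch of that row-wise call, under some assumptions. In the question's line, w2v.n_similarity(df['q1'], df['q2']) is evaluated on the two whole columns before apply ever runs, so apply never receives a function. Wrapping the call in a lambda defers it to each row. To keep the sketch runnable without the 1.5 GB model, toy_similarity below is a hypothetical stand-in (a Jaccard overlap of the token lists), not gensim's computation, and the small DataFrame is made up for illustration.

```python
import pandas as pd


def toy_similarity(tokens_a, tokens_b):
    """Hypothetical stand-in for w2v.n_similarity: Jaccard overlap of two token lists."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    return len(set_a & set_b) / len(set_a | set_b)


# Illustrative data only: each cell holds a list of tokens, as in the question.
df = pd.DataFrame({
    "q1": [["how", "can", "I", "keep", "photos"], ["what", "is", "radio"]],
    "q2": [["can", "I", "remove", "photos"], ["where", "is", "radio", "from"]],
})

# With axis=1, apply passes each row to the callable as a Series,
# so the lambda picks the two token lists out of that row.
df["sim"] = df.apply(lambda row: toy_similarity(row["q1"], row["q2"]), axis=1)
print(df["sim"].tolist())

# With the real model loaded as w2v, the same pattern would presumably be:
# df["w2v_sim"] = df.apply(lambda row: w2v.n_similarity(row["q1"], row["q2"]), axis=1)
```

Note that apply with axis=1 still calls the Python function once per row, so it tidies up the loop but should not be expected to be much faster than it.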
