I have a pandas dataframe like this (simplified):
import pandas as pd

data = {'old': [['these', 'are', 'old', 'tokens'],
                ['here', 'are', 'some', 'more', 'old']],
        'new': [['and', 'these', 'are', 'new'],
                ['see', 'the', 'difference', 'between', 'them']]}
example_df = pd.DataFrame(data=data).astype(str)
So the dataframe looks like this:
                                               new                                     old
0                   ['and', 'these', 'are', 'new']       ['these', 'are', 'old', 'tokens']
1  ['see', 'the', 'difference', 'between', 'them']  ['here', 'are', 'some', 'more', 'old']
My real df has 968 rows (this becomes relevant below).
I'm running a comparison function (for semantic analysis), again simplified:
def analysis(first_token_list, second_token_list):
    synonymset1 = somefunction(first_token_list)   # specifics don't matter, this works fine
    synonymset2 = somefunction(second_token_list)  # specifics don't matter, this works fine

    best_score_list = []
    for synset in synonymset1:
        # keep only the pairs that WordNet can actually score
        similaritylist = [synset.path_similarity(ss) for ss in synonymset2
                          if synset.path_similarity(ss) is not None]
        if not similaritylist:
            continue
        best_score = max(similaritylist)
        if best_score is not None:
            best_score_list.append(best_score)
    print(best_score_list)
    return best_score_list
To be clearer: the calls before the loop return, for each token list, a list of synsets (from WordNet) like this:
[Synset('old.v.01'), Synset('token.n.01')]
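The best-score loop can be exercised without WordNet. This is only a sketch under stated assumptions: plain strings shaped like synset names stand in for Synset objects, and the hypothetical toy_similarity stands in for path_similarity (it returns None when the parts of speech differ). It also computes each similarity once rather than twice. What it illustrates is that the returned list can be shorter than the input, because items with no scorable partner are skipped, so its length varies from row to row.

```python
from typing import Callable, List, Optional, Sequence


def best_score_per_item(items1: Sequence[str], items2: Sequence[str],
                        similarity: Callable[[str, str], Optional[float]]) -> List[float]:
    """For each item in items1, keep its highest non-None score against items2."""
    best_score_list = []
    for a in items1:
        # Score each pair once and drop pairs that cannot be compared (None).
        scores = [s for s in (similarity(a, b) for b in items2) if s is not None]
        if scores:  # items with no comparable partner are skipped entirely
            best_score_list.append(max(scores))
    return best_score_list


def toy_similarity(a: str, b: str) -> Optional[float]:
    # Hypothetical stand-in for Synset.path_similarity: names follow the
    # 'lemma.pos.nn' pattern, and different parts of speech give None.
    if a.split('.')[1] != b.split('.')[1]:
        return None
    return 1.0 if a == b else 0.25


old_synsets = ['old.v.01', 'token.n.01', 'fast.r.01']
new_synsets = ['new.a.01', 'token.n.01', 'see.v.01']
print(best_score_per_item(old_synsets, new_synsets, toy_similarity))  # [0.25, 1.0]
```

Three inputs produce two scores here, because 'fast.r.01' has no partner with the same part of speech.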
When I call the following,
notnull_df['maxsim_OtN'] = notnull_df.apply(
    lambda row: maxsim.word_similarity(row['old_tokens'], row['new_tokens']), axis=1)
I can see the lists being generated (via the print() call), but then I get an error saying the shapes don't fit:
Traceback (most recent call last):
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4637, in create_block_manager_from_arrays
blocks = form_blocks(arrays, names, axes)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4701, in form_blocks
float_blocks = _multi_blockify(float_items)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4778, in _multi_blockify
values, placement = _stack_arrays(list(tup_block), dtype)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4823, in _stack_arrays
stacked[i] = _asarray_compat(arr)
ValueError: could not broadcast input array from shape (6) into shape (5)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "semsim_calculation.py", line 133, in <module>
notnull_df['maxsim_OtN'] = notnull_df.apply(lambda row: maxsim.word_similarity(row['old_tokens'], row['new_tokens']), axis=1)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/frame.py", line 4877, in apply
ignore_failures=ignore_failures)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/frame.py", line 4990, in _apply_standard
result = self._constructor(data=results, index=index)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/frame.py", line 330, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/frame.py", line 461, in _init_dict
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/frame.py", line 6173, in _arrays_to_mgr
return create_block_manager_from_arrays(arrays, arr_names, axes)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4642, in create_block_manager_from_arrays
construction_error(len(arrays), arrays[0].shape, axes, e)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4608, in construction_error
passed, implied))
ValueError: Shape of passed values is (968, 5), indices imply (968, 11)
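Can anyone explain why this happens? The print() call does show that a list of values is being generated ([0.25, 0.5, 0.07692307692307693]), but that list is not returned into the new column. (A similar question was asked {{3}} but never answered.)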
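A minimal sketch of one workaround, assuming each row should simply hold the Python list that the comparison function returns: build the column with a list comprehension over the two token columns instead of DataFrame.apply(axis=1). A plausible (unconfirmed) reading of the traceback is that older pandas treats list-like results from apply as rows of a new frame indexed by the original columns, which would match "(968, 5) vs (968, 11)": a 5-element list against 11 columns. best_scores below is a hypothetical stand-in for maxsim.word_similarity that, like the real function, returns lists of varying length.

```python
import pandas as pd


def best_scores(old_tokens, new_tokens):
    # Hypothetical stand-in for maxsim.word_similarity: returns a list whose
    # length differs from row to row, just like the real function does.
    return [len(token) / 10 for token in old_tokens if token not in new_tokens]


notnull_df = pd.DataFrame({
    'old_tokens': [['these', 'are', 'old', 'tokens'],
                   ['here', 'are', 'some', 'more', 'old']],
    'new_tokens': [['and', 'these', 'are', 'new'],
                   ['see', 'the', 'difference', 'between', 'them']],
})

# Call the function once per row and wrap the results in an object Series
# aligned to the frame's index, so each list is stored as a single cell
# instead of being spread across columns.
notnull_df['maxsim_OtN'] = pd.Series(
    [best_scores(old, new)
     for old, new in zip(notnull_df['old_tokens'], notnull_df['new_tokens'])],
    index=notnull_df.index,
)

print(notnull_df['maxsim_OtN'].tolist())  # [[0.3, 0.6], [0.4, 0.3, 0.4, 0.4, 0.3]]
```

On pandas 0.23 or later, apply also accepts result_type='reduce', which the documentation describes as returning a Series rather than expanding list-like results; the list comprehension avoids depending on that version-specific behaviour.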