Python pandas calculated column does not return the list, although it prints the list

Date: 2018-02-06 18:19:40

Tags: python list pandas dataframe calculated-columns

I have a pandas DataFrame like this (simplified):

import pandas as pd

data = {'old': [['these', 'are', 'old', 'tokens'],
                ['here', 'are', 'some', 'more', 'old']],
        'new': [['and', 'these', 'are', 'new'],
                ['see', 'the', 'difference', 'between', 'them']]}

example_df = pd.DataFrame(data=data).astype(str)

So the DataFrame looks like this:

                                              new  
0                   ['and', 'these', 'are', 'new']
1  ['see', 'the', 'difference', 'between', 'them']

                                      old
0       ['these', 'are', 'old', 'tokens']
1  ['here', 'are', 'some', 'more', 'old']
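
As a side note on the printed frame: the quotes inside each cell suggest that, after `.astype(str)`, every cell holds the string representation of a list rather than a list. The sketch below checks this on the simplified data and shows one way to parse the strings back with `ast.literal_eval`. The names `first_cell`, `cell_type`, and `restored` are illustrative only, and whether the real df went through the same conversion is an assumption.

```python
import ast
import pandas as pd

data = {'old': [['these', 'are', 'old', 'tokens'],
                ['here', 'are', 'some', 'more', 'old']],
        'new': [['and', 'these', 'are', 'new'],
                ['see', 'the', 'difference', 'between', 'them']]}

example_df = pd.DataFrame(data=data).astype(str)

# After astype(str), each cell is the string repr of a list, not a list.
first_cell = example_df.loc[0, 'old']
cell_type = type(first_cell).__name__

# ast.literal_eval parses the repr back into a Python list
# without executing arbitrary code.
restored = example_df['old'].apply(ast.literal_eval)
```

If the token columns are strings, any function that iterates over a cell would walk over single characters instead of tokens, so it may be worth checking the cell type before the comparison step.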

My real df has 968 rows. (This becomes relevant below.)

I am running a comparison function (for semantic analysis), again simplified:

def analysis(token_list_1, token_list_2):
    synonymset1 = somefunction(token_list_1)  # specifics don't matter, this works fine
    synonymset2 = somefunction(token_list_2)  # specifics don't matter, this works fine

    best_score_list = []

    for synset in synonymset1:
        similaritylist = [synset.path_similarity(ss) for ss in synonymset2 if synset.path_similarity(ss) is not None]
        if not similaritylist:
            continue
        best_score = max(similaritylist)

        if best_score is not None:
            best_score_list.append(best_score)
            print(best_score_list)

    return best_score_list

For clarity, the function called before the loop returns a list of synsets (from WordNet) for each token list, like so:

[Synset('old.v.01'), Synset('token.n.01')]

When I call the following,

notnull_df['maxsim_OtN'] = notnull_df.apply(lambda row: 
maxsim.word_similarity(row['old_tokens'], row['new_tokens']), axis=1)

I can see the lists being generated, but I get an error about the shapes not matching:

Traceback (most recent call last):
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4637, in create_block_manager_from_arrays
    blocks = form_blocks(arrays, names, axes)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4701, in form_blocks
    float_blocks = _multi_blockify(float_items)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4778, in _multi_blockify
    values, placement = _stack_arrays(list(tup_block), dtype)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4823, in _stack_arrays
    stacked[i] = _asarray_compat(arr)
ValueError: could not broadcast input array from shape (6) into shape (5)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "semsim_calculation.py", line 133, in <module>
    notnull_df['maxsim_OtN'] = notnull_df.apply(lambda row: maxsim.word_similarity(row['old_tokens'], row['new_tokens']), axis=1)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/frame.py", line 4877, in apply
    ignore_failures=ignore_failures)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/frame.py", line 4990, in _apply_standard
    result = self._constructor(data=results, index=index)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/frame.py", line 330, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/frame.py", line 461, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/frame.py", line 6173, in _arrays_to_mgr
    return create_block_manager_from_arrays(arrays, arr_names, axes)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4642, in create_block_manager_from_arrays
    construction_error(len(arrays), arrays[0].shape, axes, e)
File "/Users/anon/venv_lda/lib/python3.5/site-packages/pandas/core/internals.py", line 4608, in construction_error
    passed, implied))
ValueError: Shape of passed values is (968, 5), indices imply (968, 11)

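Can anyone explain why this happens? print() does show me that a list of values is being generated ([0.25, 0.5, 0.07692307692307693]), but the function does not seem to return that list to the column. (A similar question has been asked before, but it was not resolved.)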
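
One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: `_apply_standard` hands the per-row results to the DataFrame constructor, so lists of different lengths (5, 6, ... 11) cannot be stacked into a single block. A minimal sketch of a workaround is to assemble the column with a list comprehension, which stores each returned list as one object cell. `score_row` is a hypothetical stand-in for the real similarity function; the column names are taken from the `apply` call above.

```python
import pandas as pd


def score_row(old_tokens, new_tokens):
    # Hypothetical stand-in for the real similarity function:
    # returns one float per old token, so the list length varies by row.
    return [1.0 if tok in new_tokens else 0.0 for tok in old_tokens]


df = pd.DataFrame({
    'old_tokens': [['these', 'are', 'old', 'tokens'],
                   ['here', 'are', 'some', 'more', 'old']],
    'new_tokens': [['and', 'these', 'are', 'new'],
                   ['see', 'the', 'difference', 'between', 'them']],
})

# Build the column row by row instead of using DataFrame.apply(axis=1),
# so pandas is never asked to combine the lists into a 2-D result.
df['maxsim_OtN'] = [score_row(old, new)
                    for old, new in zip(df['old_tokens'], df['new_tokens'])]
```

The function's return value is unchanged in this sketch; only the way the column is assembled differs.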

0 answers:

No answers