Question

I have a dataset that looks like this:

array([['Sentence: 1', 'Thousands', 'NNS', 'O'],
       [nan, 'of', 'IN', 'O'],
       [nan, 'demonstrators', 'NNS', 'O'],
       ...,
       ['Sentence: 2', 'To', 'TO', 'O'],
       [nan, 'the', 'DT', 'O'],
       [nan, 'attack', 'NN', 'O'],
       ...,
       ['Sentence: N', 'To', 'TO', 'O'],
       [nan, 'the', 'DT', 'O'],
       [nan, 'attack', 'NN', 'O']], dtype=object)

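My goal is to convert it into an array of sentences, where each sentence is an array of words. The first value of each row indicates whether that row starts a new sentence. The result should look like this: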

array([['Thousands', 'of', 'demonstrators', ...],
       ...
       ['To', 'the', 'attack', ...],
       ...
      ])

So far I have tried the following, which solves the problem, but my solution is inefficient.

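result = []
current_sentence = []
for row in sentences.values:
    if isinstance(row[0], float):
        # NaN marker: this row continues the current sentence
        current_sentence.append(row[1])
    else:
        # 'Sentence: N' marker: close the previous sentence, if any
        if current_sentence:
            result.append(current_sentence)
        current_sentence = [row[1]]
# Flush the last sentence, which no later marker closes
if current_sentence:
    result.append(current_sentence)
print(len(result))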

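It feels like this should be possible with some kind of fold operation, but I can't figure out how to do that in Python 3. In this case, efficiency for me means both code compactness and wall-clock time. Any clean ideas?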
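
For reference, here is a minimal sketch of what such a fold might look like with functools.reduce. The sample rows and the names fold_step and fold_sentences are made up for illustration, and the sketch assumes the very first row carries a 'Sentence: N' marker. It is mainly about compactness; it is not expected to run faster than the explicit loop.

```python
from functools import reduce

# Hypothetical sample rows shaped like the data above; with the real
# DataFrame, `sentences.values` could be passed instead.
rows = [
    ['Sentence: 1', 'Thousands', 'NNS', 'O'],
    [float('nan'), 'of', 'IN', 'O'],
    [float('nan'), 'demonstrators', 'NNS', 'O'],
    ['Sentence: 2', 'To', 'TO', 'O'],
    [float('nan'), 'the', 'DT', 'O'],
    [float('nan'), 'attack', 'NN', 'O'],
]

def fold_step(acc, row):
    """Open a new sentence on a marker row, otherwise extend the last one."""
    if isinstance(row[0], float):   # NaN marker: continuation row
        acc[-1].append(row[1])
    else:                           # 'Sentence: N' marker: new sentence
        acc.append([row[1]])
    return acc

def fold_sentences(rows):
    # Assumes the first row carries a 'Sentence: N' marker; an empty
    # input simply yields an empty list.
    return reduce(fold_step, rows, [])

result = fold_sentences(rows)
print(result)
```

The accumulator is mutated in place rather than rebuilt at every step, which keeps the fold linear in the number of rows.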

Efficiently folding an array in Python

0 answers: