Question

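I am currently working on a data analysis project that involves text mining. So far, I have been filtering out certain phrases.

Suppose I have this tokenized array of words: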

arr = ['hello', ',', 'how', 'is', 'your', 'day', 'going', '?', '#', 'HelloWorld']

(hello, how is your day going? #HelloWorld)

I want to remove #HelloWorld from the sentence.

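My original logic iterates over the array and checks for #. Once it finds one, it replaces the # and the element after it with a space, as shown below:

N = 0
for index in arr:
    if arr[N] == '#':
        arr[N] = (' ')
        arr[N+1] = (' ')
    N += 1

Unfortunately, I get the error "list assignment index out of range" on line 5. I tried using .append(), but it only allows modification at the end of the list.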

Is there another way to do this?

Answer 1

This should work. As others have said, you need to check whether you are at the end of the list.

Edit: simplified!

arr = ['a', 'b', '#', 'aa']
indices = [idx for idx, elt in enumerate(arr) if elt == '#']

for idx in indices:
    arr[idx] = ' '
    if idx + 1 < len(arr):  # Only blank the next element if it exists
        arr[idx + 1] = ' '
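
If the goal is to drop the tokens entirely rather than blank them out, one alternative is to build a new list instead of assigning by index, which sidesteps the bounds problem. This is only a sketch: `remove_hashtags` is a hypothetical name, and it assumes that a '#' always marks exactly one following token for removal.

```python
def remove_hashtags(tokens):
    """Return a new list without '#' tokens and the token that follows each one."""
    result = []
    skip_next = False
    for tok in tokens:
        if skip_next:
            skip_next = False  # this token followed a '#', so drop it
            continue
        if tok == '#':
            skip_next = True   # drop the '#' and flag its successor
            continue
        result.append(tok)
    return result


arr = ['hello', ',', 'how', 'is', 'your', 'day', 'going', '?', '#', 'HelloWorld']
print(remove_hashtags(arr))
# ['hello', ',', 'how', 'is', 'your', 'day', 'going', '?']
```

A trailing '#' is simply dropped here, since nothing is ever read or written past the end of the list.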

Answer 2

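When the last element is #, your code tries to access an index outside the array, so you need to check for that.

There is also no need to use separate variables for iterating and indexing; just iterate over the range of indices.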

for i in range(len(arr)):
    if arr[i] == '#':
        arr[i] = ' '
        if i + 1 < len(arr):  # only if a next element exists
            arr[i+1] = ' '
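
One way to sanity-check the boundary condition is to run the loop on lists where '#' sits in the last or second-to-last position. A minimal sketch, where `blank_hashtags` is a hypothetical helper name and the guard is written as `i + 1 < len(tokens)`:

```python
def blank_hashtags(tokens):
    """Blank each '#' and the element after it, in place."""
    for i in range(len(tokens)):
        if tokens[i] == '#':
            tokens[i] = ' '
            if i + 1 < len(tokens):  # a next element exists
                tokens[i + 1] = ' '
    return tokens


print(blank_hashtags(['a', '#', 'tag']))  # '#' is second to last: ['a', ' ', ' ']
print(blank_hashtags(['a', 'b', '#']))    # '#' is last: ['a', 'b', ' ']
```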

Answer 3

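The root cause of the problem in your code is that 'N + 1' goes out of range when the loop reaches the end of the list.

If an element must always exist after a '#', try the following: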

arr = ['hello', ',', 'how', 'is', 'your', 'day', 'going', '?', '#', 'HelloWorld']

for index in range(len(arr)):
    if arr[index] == '#':
        # Replace the '#' and the element after it in one slice assignment
        arr[index:index+2] = ['', '']
print(arr)

Output:

['hello', ',', 'how', 'is', 'your', 'day', 'going', '?', '', '']
[Finished in 0.133s]

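If the array ends with '#', it will still replace the '#' with ['', '']. Slice assignment does not raise an error when the slice runs past the end, so the list grows by one element (I'm not sure whether this result is what you expect).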

arr = ['hello', ',', 'how', 'is', 'your', 'day', 'going', '?', '#', 'HelloWorld', '#']

for index in range(len(arr)):
    if arr[index] == '#':
        # At the trailing '#', the slice covers one element but two are assigned
        arr[index:index+2] = ['', '']
print(arr)

Output:

['hello', ',', 'how', 'is', 'your', 'day', 'going', '?', '', '', '', '']
[Finished in 0.179s]
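
If the extra element is not wanted, one possible variant is to assign exactly as many blanks as the slice actually covers, so the list length never changes. This is a sketch under that assumption; `blank_keep_length` is a hypothetical name, not part of the code above.

```python
# Slice assignment past the end does not raise; it grows the list.
grown = ['a', 'b', '#']
grown[2:4] = ['', '']  # the slice covers 1 element, 2 are assigned
print(grown)           # ['a', 'b', '', '']


def blank_keep_length(tokens):
    """Blank each '#' and its successor in place without changing the length."""
    for index in range(len(tokens)):
        if tokens[index] == '#':
            end = min(index + 2, len(tokens))         # clamp to the list end
            tokens[index:end] = [''] * (end - index)  # same size as the slice
    return tokens


print(blank_keep_length(['a', 'b', '#']))         # ['a', 'b', '']
print(blank_keep_length(['a', '#', 'tag', 'c']))  # ['a', '', '', 'c']
```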

Modifying the (N+1)th element of an array during the Nth iteration?
