Question

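It would be great if you could offer some advice. I get the error below when processing a list of items. I should note that this script works for 99% of the items; I have only hit this problem now that I have extended the list to 84 million rows.

I do this for every row: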

elif len(str(x)) > 3 and str(x[len(x)-2]).rstrip() in cdns:

So if I am actively checking that the length exceeds a certain value before processing, I don't see how the index can be out of range?

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-2-a28be4b396bd> in <module>()
     21     elif len(str(x)) > 4 and str(x[len(x)-2]).rstrip() in cdns:
     22       cleandomain.append(str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+str(x[len(x)-1]))
---> 23     elif len(str(x)) > 5 and str(x[len(x)-3]).rstrip() in cdns:
     24       cleandomain.append(str(x[len(x)-4])+'.'+str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+ str(x[len(x)-1]))
     25     #if its in the TLD list, do this

IndexError: list index out of range

The full loop is below. I would expect that if a list index were out of range, it would simply run the else branch and print the list value?

  for x in index:
    #if it ends with a number, it's an IP
    if str(x)[-1].isnumeric():
      cleandomain.append(str(x[0])+'.'+str(x[1])+'.*.*')
    #if its in the CDN list, take a subdomain as well
    elif len(str(x)) > 3 and str(x[len(x)-2]).rstrip() in cdns:
      cleandomain.append(str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+str(x[len(x)-1]))
    elif len(str(x)) > 4 and str(x[len(x)-3]).rstrip() in cdns:
      cleandomain.append(str(x[len(x)-4])+'.'+str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+ str(x[len(x)-1]))
    #if its in the TLD list, do this
    elif len(str(x)) > 3 and str(x[len(x)-2]).rstrip()+'.'+ str(x[len(x)-1]).rstrip() in tld:
      cleandomain.append(str(x[len(x)-3])+'.'+str(x[len(x)-2])+'.'+ str(x[len(x)-1]))
    elif len(str(x)) > 2 and str(x[len(x)-1]) in tld:
      cleandomain.append(str(x[len(x)-2])+'.'+ str(x[len(x)-1]))
    #if its not in the TLD list, do this
    else:
      cleandomain.append(x)

X is generated as follows:

X is a list of lists; the split parts of each domain look like this: [['news', 'bbc', 'co', 'uk'], ['graph', 'facebook', 'com']]

import pandas as pd
path = "Desktop/domx.csv"
df = pd.read_csv(path, delimiter=',', header='infer', encoding = "ISO-8859-1")
df2 = df[((df['domain'] != '----'))]
df3 = df2[['domain', 'use']]
for row in df2.iterrows():
  index = df3.domain.str.split('.').tolist()

Any help would be great.

Answer 1

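Let me elaborate on what Corentin Limier said in the comments, since you flatly denied it was true without actually checking in a debugger.

Based on the error dump in your original question:

---> 23     elif len(str(x)) > 5 and str(x[len(x)-3]).rstrip() in cdns:
IndexError: list index out of range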

x = ['counterexample']
print ('x =', x)
print ('length of x is', len(x))
print ('length of str(x) is', len(str(x)))

if len(str(x)) > 5:
    print ('You think this is safe')

try:
    x[len(x)-3]
except IndexError:
    print ('but it is not.')

Output:

x = ['counterexample']
length of x is 1
length of str(x) is 18
You think this is safe
but it is not.

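You need to know whether the index is valid relative to the number of items in x. What you are actually checking is the length of the string representation of x, which is something completely different. That string is 18 characters long, but the list contains only one item.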
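One possible way to restructure the loop body so that every guard compares the number of labels, len(x), rather than len(str(x)), is sketched below. The function name clean_domain and its parameters are hypothetical, not from the question; cdns and tld are assumed to be collections of strings, and the branch order simply mirrors the loop in the question. Unlike the original, the fallback returns the re-joined string instead of the list itself, and the IP test looks at the last label rather than at str(x).

```python
def clean_domain(parts, cdns, tld):
    """Reduce a split domain (a list of labels) to its significant suffix.

    Hypothetical helper: every guard checks len(parts), the number of
    labels, so each index used in a branch is known to be valid.
    """
    parts = [str(p).rstrip() for p in parts]
    n = len(parts)
    # Last label ends with a digit: treat the entry as an IP address
    if n >= 2 and parts[-1][-1:].isnumeric():
        return parts[0] + '.' + parts[1] + '.*.*'
    # CDN label in second-to-last position: keep one subdomain as well
    if n >= 3 and parts[-2] in cdns:
        return '.'.join(parts[-3:])
    # CDN label in third-to-last position: keep four labels
    if n >= 4 and parts[-3] in cdns:
        return '.'.join(parts[-4:])
    # Two-label suffix such as co.uk
    if n >= 3 and '.'.join(parts[-2:]) in tld:
        return '.'.join(parts[-3:])
    # Single-label suffix such as com
    if n >= 2 and parts[-1] in tld:
        return '.'.join(parts[-2:])
    # Nothing matched (or too few labels): return the entry unchanged
    return '.'.join(parts)


if __name__ == '__main__':
    tld = {'co.uk', 'com'}
    cdns = {'cloudfront'}
    for x in [['news', 'bbc', 'co', 'uk'], ['graph', 'facebook', 'com'], ['counterexample']]:
        print(x, '->', clean_domain(x, cdns, tld))
```

Negative indices such as parts[-2] replace x[len(x)-2]; they read more easily, but they still need the length guard, because on a one-item list parts[-2] raises IndexError as well.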

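PS: Don't feel bad; we have all done this. By "this" I mean getting tripped up when the code we wrote does something quite different from what we thought it did. That is one of the main reasons for code reviews in professional settings.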
