IndexError: string index out of range [python, scraping]

Time: 2017-07-10 09:14:23

Tags: python beautifulsoup list-comprehension

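I am trying to scrape a website, but I only want to write specific rows to my final csv file. When I try to select those rows, I get:

IndexError: string index out of range.

When I run this code, I do not get the error: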

rows = [
["The Conservation Fund",2014,"","","","Program Services: ","$174,530,077"],
["The Conservation Fund",2014,"","","","Administration: ","$2,810,944"],
["The Conservation Fund",2014,"","","","Fundraising: ","$2,144,456"],
["The Conservation Fund",2013,"$480,674","$55,266","$0","LAWRENCE A SELZER","PRESIDENT & CEO"],
["The Conservation Fund",2013,"$369,848","$54,856","$0","RICHARD L ERDMANN","EXECUTIVE VICE PRESIDENT"],
["The Conservation Fund",2013,"$312,232","$44,386","$0","DAVID K PHILLIPS JR","EXECUTIVE VP AND CFO"],
["The Conservation Fund",2013,"$251,615","$16,125","$0","DEAN H CANNON","SENIOR VP/GENERAL COUNSEL"]]

rows1 = [x for x in rows if x[6][0] != '$']
print(rows1)

I get exactly what I expect:

[['The Conservation Fund', 2013, '$480,674', '$55,266', '$0', 'LAWRENCE A SELZER', 'PRESIDENT & CEO'], ['The Conservation Fund', 2013, '$369,848', '$54,856', '$0', 'RICHARD L ERDMANN', 'EXECUTIVE VICE PRESIDENT'], ['The Conservation Fund', 2013, '$312,232', '$44,386', '$0', 'DAVID K PHILLIPS JR', 'EXECUTIVE VP AND CFO'], ['The Conservation Fund', 2013, '$251,615', '$16,125', '$0', 'DEAN H CANNON', 'SENIOR VP/GENERAL COUNSEL']]

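Now, when I try to run a similar list comprehension in my scraper (I will paste only part of the code here, because I legally cannot post the whole thing):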

for page in eins:
    rows =[]
    driver.get(page)
    print("Getting {}".format(page))
    soup = BeautifulSoup(driver.page_source, "lxml")
    name = soup.find("h1", {"class" : "centered"})
    print(name.text)
    members = soup.findAll("g", { "transform" : "translate(0,0)"})
    time = soup.find("option", {"selected" : "selected"}).text
    time = int(time)
    for year in members[2:]:
        column = year.find_all("g")
        for thing in column:
            row_info = [name.text, time]
            entries = thing.find_all("text")
            if len(entries) != 5:
                row_info.extend((5 - len(entries)) * [""])
            for entry in entries:
                    row_info.append(entry.text)
            rows.append(row_info)
        time = time - 1
        rows1 = [x for x in rows if x[6][0] != "$"]

Now I suddenly get the following error:

Traceback (most recent call last):
  File "Board_members.py", line 53, in <module>
    rows1 = [x for x in rows if x[6][0] != "$"]
  File "Board_members.py", line 53, in <listcomp>
    rows1 = [x for x in rows if x[6][0] != "$"]
IndexError: string index out of range

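Are the lists of rows not formatted the same way in the two cases? What am I doing wrong here? I previously tried a for loop with a continue statement and a simple if statement, but everything came down to the same error.

I am still a beginner, so please forgive my fragile code. I looked around for answers to my question, but if they are out there, I could not understand them. Thank you very much!

Edit: Just for context, the rows in the first case come from a csv file I created with the scraper. In the csv they look like this.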

organization,year,compensation,other,related,name,position
The Conservation Fund,2015,,,,Total Revenue: ,"$215,096,466"
The Conservation Fund,2015,,,,Contributions: ,"$114,351,967"
The Conservation Fund,2015,,,,Gov't Grants: ,"$9,723,802"
The Conservation Fund,2015,,,,Program Services: ,"$90,762,036"
The Conservation Fund,2015,,,,Investments: ,"$220,002"
The Conservation Fund,2015,,,,Special Events: ,$0
The Conservation Fund,2015,,,,Sales: ,$0
The Conservation Fund,2015,,,,Other: ,"$38,659"
The Conservation Fund,2014,,,,Total Expenses: ,"$179,485,477"
The Conservation Fund,2014,,,,Program Services: ,"$174,530,077"
The Conservation Fund,2014,,,,Administration: ,"$2,810,944"
The Conservation Fund,2014,,,,Fundraising: ,"$2,144,456"
The Conservation Fund,2013,"$480,674","$55,266",$0,LAWRENCE A SELZER,PRESIDENT & CEO
The Conservation Fund,2013,"$369,848","$54,856",$0,RICHARD L ERDMANN,EXECUTIVE VICE PRESIDENT
The Conservation Fund,2013,"$312,232","$44,386",$0,DAVID K PHILLIPS JR,EXECUTIVE VP AND CFO

Edit 2: This is the output I get from printing rows just before the rows1 line:

[['The Conservation Fund', 2015, '', '', '', 'Total Revenue: ', '$215,096,466'], ['The Conservation Fund', 2015, '', '', '', 'Contributions: ', '$114,351,967'], ['The Conservation Fund', 2015, '', '', '', "Gov't Grants: ", '$9,723,802'], ['The Conservation Fund', 2015, '', '', '', 'Program Services: ', '$90,762,036'], ['The Conservation Fund', 2015, '', '', '', 'Investments: ', '$220,002'], ['The Conservation Fund', 2015, '', '', '', 'Special Events: ', '$0'], ['The Conservation Fund', 2015, '', '', '', 'Sales: ', '$0'], ['The Conservation Fund', 2015, '', '', '', 'Other: ', '$38,659'], ['The Conservation Fund', 2014, '', '', '', 'Total Expenses: ', '$179,485,477'], ['The Conservation Fund', 2014, '', '', '', 'Program Services: ', '$174,530,077']]

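1 Answer:

Answer 0: (score: 0)

The error you are getting is

IndexError: string index out of range

which means you are trying to access a string index that does not exist.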

See the following examples of what can cause

IndexError: string index out of range

test = 'abc'
test[2] # Output : c
test[3] # Output : IndexError: string index out of range
test1 = ''
test1[0] # Output : IndexError: string index out of range
test1[1] # Output : IndexError: string index out of range

In your case, in the statement rows1 = [x for x in rows if x[6][0] != "$"], x[6] has no value or is an empty string for at least one row, so you are trying to get index 0 of an empty string.
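To confirm which scraped rows trigger the error, it may help to list the rows whose seventh element is missing or empty before filtering. The following is a minimal sketch, not part of the original answer: the sample `rows` data and the helper name `find_unindexable_rows` are assumptions made for illustration.

```python
# Hypothetical sample shaped like the scraper's rows. The last row has an
# empty string in column 6, which is the case x[6][0] cannot index.
rows = [
    ["The Conservation Fund", 2014, "", "", "", "Fundraising: ", "$2,144,456"],
    ["The Conservation Fund", 2013, "$480,674", "$55,266", "$0",
     "LAWRENCE A SELZER", "PRESIDENT & CEO"],
    ["The Conservation Fund", 2013, "", "", "", "", ""],
]


def find_unindexable_rows(rows, column=6):
    """Return (position, row) pairs where row[column] is missing or empty."""
    return [
        (i, row)
        for i, row in enumerate(rows)
        if len(row) <= column or row[column] == ""
    ]


bad_rows = find_unindexable_rows(rows)
for position, row in bad_rows:
    print(position, row)
```

If this prints anything for the real scraped data, those are the rows for which x[6][0] raises the IndexError.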

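Use the code below, which may fix the error, because it first checks x[6] for an empty value and only then checks x[6][0]:

rows1 = [x for x in rows if x[6] and x[6][0] != "$"]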
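As an alternative to indexing, `str.startswith` can be used, since it does not raise on an empty string. This is a hedged sketch on hypothetical sample data, not the original poster's scraper output. It shows two variants, because the choice of whether rows with an empty seventh element should be kept is an assumption you have to make yourself.

```python
# Hypothetical sample shaped like the scraper's rows; the last row has an
# empty string in column 6.
rows = [
    ["The Conservation Fund", 2014, "", "", "", "Fundraising: ", "$2,144,456"],
    ["The Conservation Fund", 2013, "$480,674", "$55,266", "$0",
     "LAWRENCE A SELZER", "PRESIDENT & CEO"],
    ["The Conservation Fund", 2013, "", "", "", "", ""],
]

# Variant A: keep rows with an empty cell. "".startswith("$") is False,
# so no exception is raised and the empty row passes the filter.
rows_keep_empty = [x for x in rows if not x[6].startswith("$")]

# Variant B: also drop rows with an empty cell, which matches the behaviour
# of the `x[6] and x[6][0] != "$"` check.
rows_drop_empty = [x for x in rows if x[6] and not x[6].startswith("$")]

print(len(rows_keep_empty), len(rows_drop_empty))
```

Variant B gives the same result as the emptiness check followed by indexing, while Variant A is only appropriate if blank rows should survive into the csv.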