I am trying to scrape a website, but I only want to write specific rows to my final CSV file. When I try to specify the rows, I get
IndexError: string index out of range.
When I run this code, I do not get the error:
rows = [
["The Conservation Fund",2014,"","","","Program Services: ","$174,530,077"],
["The Conservation Fund",2014,"","","","Administration: ","$2,810,944"],
["The Conservation Fund",2014,"","","","Fundraising: ","$2,144,456"],
["The Conservation Fund",2013,"$480,674","$55,266","$0","LAWRENCE A SELZER","PRESIDENT & CEO"],
["The Conservation Fund",2013,"$369,848","$54,856","$0","RICHARD L ERDMANN","EXECUTIVE VICE PRESIDENT"],
["The Conservation Fund",2013,"$312,232","$44,386","$0","DAVID K PHILLIPS JR","EXECUTIVE VP AND CFO"],
["The Conservation Fund",2013,"$251,615","$16,125","$0","DEAN H CANNON","SENIOR VP/GENERAL COUNSEL"]]
rows1 = [x for x in rows if x[6][0] != '$']
print(rows1)
I get exactly what I expect:
[['The Conservation Fund', 2013, '$480,674', '$55,266', '$0', 'LAWRENCE A SELZER', 'PRESIDENT & CEO'], ['The Conservation Fund', 2013, '$369,848', '$54,856', '$0', 'RICHARD L ERDMANN', 'EXECUTIVE VICE PRESIDENT'], ['The Conservation Fund', 2013, '$312,232', '$44,386', '$0', 'DAVID K PHILLIPS JR', 'EXECUTIVE VP AND CFO'], ['The Conservation Fund', 2013, '$251,615', '$16,125', '$0', 'DEAN H CANNON', 'SENIOR VP/GENERAL COUNSEL']]
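Now, when I try to run a similar list comprehension from my scraper (I will only paste part of the code here, because legally I cannot post the whole thing):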
for page in eins:
    rows = []
    driver.get(page)
    print("Getting {}".format(page))
    soup = BeautifulSoup(driver.page_source, "lxml")
    name = soup.find("h1", {"class" : "centered"})
    print(name.text)
    members = soup.findAll("g", { "transform" : "translate(0,0)"})
    time = soup.find("option", {"selected" : "selected"}).text
    time = int(time)
    for year in members[2:]:
        column = year.find_all("g")
        for thing in column:
            row_info = [name.text, time]
            entries = thing.find_all("text")
            if len(entries) != 5:
                row_info.extend((5 - len(entries)) * [""])
            for entry in entries:
                row_info.append(entry.text)
            rows.append(row_info)
        time = time - 1
    rows1 = [x for x in rows if x[6][0] != "$"]
Now I suddenly get the following error:
Traceback (most recent call last):
File "Board_members.py", line 53, in <module>
rows1 = [x for x in rows if x[6][0] != "$"]
File "Board_members.py", line 53, in <listcomp>
rows1 = [x for x in rows if x[6][0] != "$"]
IndexError: string index out of range
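Are the lists of rows in the two cases not formatted the same way? What am I doing wrong here? I previously tried a for loop with a continue statement and a simple if statement, but everything comes down to the same error.
I am still a beginner, so please forgive my fragile code. I have looked around for answers to this problem, but if they are out there, I could not understand them. Thank you very much!
Edit: just for context, the rows in the first example come from a CSV file I created with the scraper. It looks like this in the CSV: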
organization,year,compensation,other,related,name,position
The Conservation Fund,2015,,,,Total Revenue: ,"$215,096,466"
The Conservation Fund,2015,,,,Contributions: ,"$114,351,967"
The Conservation Fund,2015,,,,Gov't Grants: ,"$9,723,802"
The Conservation Fund,2015,,,,Program Services: ,"$90,762,036"
The Conservation Fund,2015,,,,Investments: ,"$220,002"
The Conservation Fund,2015,,,,Special Events: ,$0
The Conservation Fund,2015,,,,Sales: ,$0
The Conservation Fund,2015,,,,Other: ,"$38,659"
The Conservation Fund,2014,,,,Total Expenses: ,"$179,485,477"
The Conservation Fund,2014,,,,Program Services: ,"$174,530,077"
The Conservation Fund,2014,,,,Administration: ,"$2,810,944"
The Conservation Fund,2014,,,,Fundraising: ,"$2,144,456"
The Conservation Fund,2013,"$480,674","$55,266",$0,LAWRENCE A SELZER,PRESIDENT & CEO
The Conservation Fund,2013,"$369,848","$54,856",$0,RICHARD L ERDMANN,EXECUTIVE VICE PRESIDENT
The Conservation Fund,2013,"$312,232","$44,386",$0,DAVID K PHILLIPS JR,EXECUTIVE VP AND CFO
Edit 2: this is the output I get from printing rows just before rows1:
[['The Conservation Fund', 2015, '', '', '', 'Total Revenue: ', '$215,096,466'], ['The Conservation Fund', 2015, '', '', '', 'Contributions: ', '$114,351,967'], ['The Conservation Fund', 2015, '', '', '', "Gov't Grants: ", '$9,723,802'], ['The Conservation Fund', 2015, '', '', '', 'Program Services: ', '$90,762,036'], ['The Conservation Fund', 2015, '', '', '', 'Investments: ', '$220,002'], ['The Conservation Fund', 2015, '', '', '', 'Special Events: ', '$0'], ['The Conservation Fund', 2015, '', '', '', 'Sales: ', '$0'], ['The Conservation Fund', 2015, '', '', '', 'Other: ', '$38,659'], ['The Conservation Fund', 2014, '', '', '', 'Total Expenses: ', '$179,485,477'], ['The Conservation Fund', 2014, '', '', '', 'Program Services: ', '$174,530,077']]
Answer 0 (score: 0)
The error you are receiving is
IndexError: string index out of range
This means you are trying to access a string index that does not exist.
See the following examples of what can cause
IndexError: string index out of range
test = 'abc'
test[2] # Output : c
test[3] # Output : IndexError: string index out of range
test1 = ''
test1[0] # Output : IndexError: string index out of range
test1[1] # Output : IndexError: string index out of range
In your case, x[6] has no value or is an empty string; in the statement
rows1 = [x for x in rows if x[6][0] != "$"]
you are trying to get index 0 of an empty string.
The error can likely be fixed by first checking x[6] for an empty value and only then checking x[6][0].
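As a minimal sketch of that check (not necessarily the exact code the answer originally showed), the comprehension can test x[6] before indexing into it. The sample rows below are hypothetical: the last one is invented to reproduce an empty string in column 6. The startswith variant is an alternative added here for comparison.

```python
# Hypothetical sample data: the last row has an empty string in column 6,
# which is the situation that raises IndexError on x[6][0].
rows = [
    ["The Conservation Fund", 2014, "", "", "", "Fundraising: ", "$2,144,456"],
    ["The Conservation Fund", 2013, "$480,674", "$55,266", "$0",
     "LAWRENCE A SELZER", "PRESIDENT & CEO"],
    ["The Conservation Fund", 2013, "", "", "", "", ""],
]

# Option 1: test x[6] for truthiness before indexing into it.
# Empty strings are falsy, so rows with an empty column 6 are dropped.
rows1 = [x for x in rows if x[6] and x[6][0] != "$"]

# Option 2: str.startswith never raises on an empty string.
# Note that this variant keeps rows whose column 6 is empty.
rows2 = [x for x in rows if not x[6].startswith("$")]

print(rows1)
print(rows2)
```

The two options differ only in how they treat rows with an empty last column, so which one fits depends on whether those rows should end up in the CSV.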