Question

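I am trying to pull data from an API and, when a request succeeds, concatenate the result into one large data frame. Here is a sample of the code: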

df = pd.DataFrame()
year = 2000
while year < 2018:
    sqft = 1000
    while sqft < 1500:
        #will include buildHttp code if helpful to this problem
        http = buildHttp(sqft,year)
        try:
            tempDf = pd.read_csv(http)
        except:
            print("No properties matching year or sqft")
            sqft = sqft + 11
        else:
            pd.concat([df, pd.read_csv(http)], ignore_index = True)
            sqft = sqft + 11
    year = year + 1

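buildHttp is a function that builds a string I can pass to the API to try to pull data. There is no guarantee that a property was sold at a given square footage or in a given year; when none was, an EmptyDataFrame error is thrown. I have some test cases for year and sqft that do not raise the error, and I can confirm that buildHttp builds the appropriate http, so pd.read_csv(http) pulls the data successfully. When the loop finishes, however, the successfully pulled data frames do not appear in df. Am I combining these data frames correctly?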

Answer 1

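Two things.

First, you are not assigning the result of the concatenation to a variable. pd.concat returns a new data frame and leaves df unchanged, so you want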

df = pd.concat([df, pd.read_csv(http)], ignore_index = True)
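
To make the first point concrete, here is a minimal sketch showing that pd.concat does not modify its inputs. The frames a and b and the column name x are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Two small illustrative frames (hypothetical data).
a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3]})

# pd.concat builds and returns a new frame; the result is discarded here,
# so neither a nor b changes.
pd.concat([a, b], ignore_index=True)
unchanged_len = len(a)  # a still holds its original two rows

# Binding the return value to a name is what keeps the combined frame.
combined = pd.concat([a, b], ignore_index=True)
```

This is the same situation as the else branch in the question: the concatenated frame is created and then immediately thrown away.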

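Second, constructing data frames and concatenating them is expensive, because each concatenation copies all the data accumulated so far. You can speed up the code by building each frame only once, collecting the frames in a list, and doing a single concatenation at the end.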

frames = list()
year = 2000
while year < 2018:
    sqft = 1000
    while sqft < 1500:
        #will include buildHttp code if helpful to this problem
        http = buildHttp(sqft,year)
        try:
            df = pd.read_csv(http)
        except:
            print("No properties matching year or sqft")
        else:
            frames.append(df)
        finally:
            sqft = sqft + 11
    year = year + 1
df = pd.concat(frames, ignore_index=True)
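
A possible variation on the loop above, as a sketch rather than a drop-in replacement: it catches only pandas.errors.EmptyDataError instead of using a bare except, and it guards against the case where every request fails, since pd.concat raises a ValueError on an empty list. The helper read_all and the in-memory sources are hypothetical stand-ins for the URLs built by buildHttp, and it is an assumption that the API signals "no data" with an empty response; an HTTP error would raise a different exception and need its own handling.

```python
import io

import pandas as pd
from pandas.errors import EmptyDataError


def read_all(sources):
    """Read every source that contains CSV data and combine the results.

    `sources` is a hypothetical stand-in for the URLs from buildHttp;
    anything pd.read_csv accepts will work.
    """
    frames = []
    for src in sources:
        try:
            frames.append(pd.read_csv(src))
        except EmptyDataError:
            # Assumption: an empty response means no matching properties.
            continue
    if not frames:
        # pd.concat raises ValueError when given an empty list.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# In-memory buffers simulate two successful responses and one empty one.
sources = [
    io.StringIO("sqft,year\n1000,2000\n"),
    io.StringIO(""),
    io.StringIO("sqft,year\n1011,2000\n"),
]
result = read_all(sources)
```

Catching the specific exception means that unrelated problems, such as a typo in a variable name, still surface instead of being reported as "no properties".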

Concatenating data frames in a try/except block
