For loop to read several JSON files from URLs

Time: 2015-07-28 08:02:48

Tags: python json for-loop pandas

I have to create a dataset from several JSON files that I fetch from URLs.

I managed to import one of them in the format I need:

import json
import urllib2
import pandas as pd

url = "https://cws01.worldstores.co.uk/api/product.php?product_sku=125T:FT0111"
data = urllib2.urlopen(url).read()
data = json.loads(data)
data = pd.DataFrame(data.items())
data = data.transpose()
data.columns = data.iloc[0]
data = data.drop(data.index[[0]])
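The items/transpose/drop sequence above turns one decoded JSON object into a one-row DataFrame whose columns are the object's keys. The sketch below, which assumes Python 3 and uses a hypothetical record (the real API fields are not shown in the question), compares that sequence with passing a one-element list of dicts straight to pd.DataFrame.

```python
import pandas as pd

# Hypothetical record standing in for json.loads(response); field names are made up.
record = {"sku": "125T:FT0111", "name": "Example product", "price": 19.99}


def row_via_transpose(d):
    """The question's approach: key/value pairs, transpose, promote header row."""
    frame = pd.DataFrame(list(d.items()))  # column 0 = keys, column 1 = values
    frame = frame.transpose()              # row 0 = keys, row 1 = values
    frame.columns = frame.iloc[0]          # use the key row as column labels
    return frame.drop(frame.index[[0]])    # drop the now-redundant key row


def row_direct(d):
    """A list holding one dict becomes a DataFrame with one row."""
    return pd.DataFrame([d])


a = row_via_transpose(record)
b = row_direct(record)
print(list(a.columns), list(b.columns))
print(a.iloc[0].tolist() == b.iloc[0].tolist())
```

The two results hold the same columns and values; they differ only in that the transposed version keeps index 1 and object dtypes, while pd.DataFrame([d]) uses index 0 and infers numeric dtypes.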

Since I have a long list of URLs, I need a for loop that repeats this code for each of them. My attempt was:

for i in urls:
    data = urllib2.urlopen(str(i)).read()
    data = json.loads(data)
    data = pd.DataFrame(data.items())
    data = test.transpose()
    data.columns = data.iloc[0]
    data = data.drop(data.index[[0]])
    df.append(data)

where urls is a list of strings containing the addresses, e.g.

"https://cws01.worldstores.co.uk/api/product.php?product_sku=125T:FT0111"

and df is an empty DataFrame with the same columns as the DataFrame generated from each URL in the for loop.
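Separately from the JSON error, note that DataFrame.append returns a new DataFrame rather than modifying df in place, so df.append(data) on its own discards its result (the method was also removed in pandas 2.0). A common alternative is to collect one frame per URL in a list and concatenate once at the end. The sketch below assumes Python 3 and replaces the network calls with hypothetical in-memory response bodies.

```python
import json

import pandas as pd

# Hypothetical response bodies; in the question these would come from
# urllib2.urlopen(url).read() for each URL in urls.
bodies = [
    '{"sku": "A1", "price": 10.0}',
    '{"sku": "B2", "price": 12.5}',
]

frames = []
for body in bodies:
    record = json.loads(body)               # one JSON object per URL
    frames.append(pd.DataFrame([record]))   # one-row frame per URL

# Concatenate once at the end instead of growing df inside the loop.
df = pd.concat(frames, ignore_index=True)
print(df)
```

Concatenating once also avoids copying the growing frame on every iteration.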

When I run it, I keep getting the following error:

 raise ValueError("No JSON object could be decoded")

 ValueError: No JSON object could be decoded

I don't get this error when I run the first piece of code on a single URL. What am I doing wrong?
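In Python 2, json.loads raises ValueError("No JSON object could be decoded") when the text it receives is not JSON at all; in Python 3 the same situation raises json.JSONDecodeError (a ValueError subclass) with a message such as "Expecting value". One plausible cause, assumed here because the failing body is not shown, is that some URL returned an empty body or an HTML error page. A small helper (try_parse is a hypothetical name) makes such responses visible:

```python
import json


def try_parse(body):
    """Return (True, value) if body is valid JSON, else (False, None).

    A tuple is returned because json.loads("null") legitimately yields None.
    """
    try:
        return True, json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False, None


# Hypothetical bodies a server could send back instead of a JSON object.
samples = ["", "<html>Not Found</html>", '{"sku": "125T:FT0111"}']
for body in samples:
    ok, _ = try_parse(body)
    # Show the start of each body so non-JSON responses are easy to spot.
    print(repr(body[:40]), "->", "valid JSON" if ok else "not JSON")
```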

Edit:

My new attempt changes the for loop as follows:

for i in urls:
    data = urllib2.urlopen(str(i)).read()
    try:
        data = json.loads(data)
    except:
        print(data)
        print(i)
        exit(-1)
    data = pd.DataFrame(data.items())
    data = data.transpose()
    data.columns = data.iloc[0]
    data = data.drop(data.index[[0]])
    df.append(data)

Now I get this error:

   data = pd.DataFrame(data.items())

 AttributeError: 'str' object has no attribute 'items'
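One way to reach this AttributeError is for the except branch not to stop the loop, so data is still the raw response string when .items() is called; a body that is valid JSON but not an object (for example a bare JSON string) would fail the same way. The sketch below, assuming Python 3 and a hypothetical URL-to-body mapping in place of real requests, skips such URLs with continue and records them instead of exiting.

```python
import json

import pandas as pd

# Hypothetical URL -> body mapping standing in for urllib2.urlopen(url).read().
responses = {
    "https://example.invalid/a": '{"sku": "A1", "price": 10.0}',
    "https://example.invalid/b": "",                      # empty body
    "https://example.invalid/c": '"just a JSON string"',  # valid JSON, not an object
}

frames, failed = [], []
for url, body in responses.items():
    try:
        record = json.loads(body)
    except ValueError:
        failed.append(url)
        continue  # skip this URL so the raw string never reaches .items()
    if not isinstance(record, dict):
        failed.append(url)
        continue  # only a JSON object maps to one row of columns
    frames.append(pd.DataFrame([record]))

df = pd.concat(frames, ignore_index=True)
print(df)
print("skipped:", failed)
```

If every URL fails, frames stays empty and pd.concat raises ValueError, so a real loop may want to check for that before concatenating.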

2 answers:

Answer 0 (score: 1)

Alternatively, you can use the pandas-native read_json function:

import urllib2
import pandas as pd


url_base = "https://cws01.worldstores.co.uk/api/product.php?product_sku={}"
products = ["125T:FT0111", "125T:FT0111", "125T:FT0111"]

raw_data_list = []

for sku in products:
    url = url_base.format(sku)
    try:
        raw_data = urllib2.urlopen(url).read()
        if raw_data != "":  # skip empty responses
            raw_data_list.append(raw_data)
    except:  # skip URLs that fail to open
        pass

# Join the JSON objects into one JSON array and parse it in a single call
data = "[" + (",".join(raw_data_list)) + "]"
data = pd.read_json(data, orient='records')
data
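For Python 3, a couple of details change: urlopen(...).read() returns bytes, so the bodies need decoding before ",".join works, and recent pandas versions deprecate passing a literal JSON string to read_json, so wrapping it in io.StringIO is safer. The sketch below uses hypothetical, already-decoded bodies and also shows an equivalent that decodes each body with json.loads and skips the string concatenation.

```python
import io
import json

import pandas as pd

# Hypothetical decoded bodies; in the answer these come from urlopen(url).read().
raw_data_list = ['{"sku": "A1", "price": 10.0}', '{"sku": "B2", "price": 12.5}']

# The answer's idea: join the objects into one JSON array and let pandas parse it.
array_text = "[" + ",".join(raw_data_list) + "]"
via_read_json = pd.read_json(io.StringIO(array_text), orient="records")

# Equivalent without string surgery: decode each body, then build one frame.
via_loads = pd.DataFrame([json.loads(body) for body in raw_data_list])

print(via_read_json)
print(via_loads)
```

The json.loads route also lets each body be validated individually before it is added.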

Answer 1 (score: 0)

Because you are missing the json.loads() line in your for loop:

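import json
import urllib2
import pandas as pd

url = "https://cws01.worldstores.co.uk/api/product.php?product_sku=125T:FT0111"
data = urllib2.urlopen(url).read()
data = json.loads(data)
data = pd.DataFrame(data.items())
data = data.transpose()
data.columns = data.iloc[0]
data = data.drop(data.index[[0]])


frames = []  # one-row DataFrame per URL
for i in urls:
    data = urllib2.urlopen(str(i)).read()
    data = json.loads(data) # <- ADDED
    data = pd.DataFrame(data.items())
    data = data.transpose()  # transpose the frame built on the previous line
    data.columns = data.iloc[0]
    data = data.drop(data.index[[0]])
    frames.append(data)  # DataFrame.append would return a new frame, not modify df
df = pd.concat([df] + frames, ignore_index=True)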
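A Python 3 version of the same loop might look like the sketch below. It assumes urllib.request (the Python 3 successor of urllib2), falls back to UTF-8 when the server sends no charset (an assumption), and adds a hypothetical fetch parameter so the loop can be exercised offline.

```python
import json
from urllib.request import urlopen

import pandas as pd


def fetch_text(url):
    """Read a URL and decode the body; urlopen returns bytes in Python 3."""
    with urlopen(url) as response:
        # Fall back to UTF-8 when the server sends no charset (an assumption).
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def load_products(urls, fetch=fetch_text):
    """Build one DataFrame with a row per URL; fetch is swappable for testing."""
    frames = []
    for url in urls:
        record = json.loads(fetch(url))        # decode each body inside the loop
        frames.append(pd.DataFrame([record]))  # one-row frame per product
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    # Offline demo with a fake fetcher; pass real URLs to use the network.
    fake = {"u1": '{"sku": "A1", "price": 10.0}', "u2": '{"sku": "B2", "price": 12.5}'}
    print(load_products(["u1", "u2"], fetch=fake.get))
```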