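I'm new to Python. I'm trying to use this link to build code for my case: I have 200k JSON files that I need to put into a DataFrame.
To get there, I made a small sample case of my problem.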
I have a 'test' folder with JSON files in subdirectories. The directory layout looks like this:
test>test1>test1> 3 json files
test>test2>test2> 3 json files
test>test3>test3> 3 json files
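For a layout like this, every file sits two folders below test. A minimal sketch using the standard-library glob module with recursive=True lists every JSON file at any depth, instead of hard-coding one os.listdir() call per folder level. It assumes all data files end in .json, and find_json_files is a hypothetical helper name.

```python
import glob
import os


def find_json_files(root):
    """Return the sorted paths of all .json files anywhere below root (hypothetical helper)."""
    # '**' combined with recursive=True matches zero or more nested subdirectories
    pattern = os.path.join(root, "**", "*.json")
    return sorted(glob.glob(pattern, recursive=True))
```

For the sample tree, find_json_files(r"C:\Users\Sharath\Desktop\test") should return the paths of all 9 files.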
I'm trying to put all 9 JSON files into a DataFrame. My code is as follows:
import json
import os
import pandas as pd

jpath = 'C:\\Users\\Sharath\\Desktop\\test'
result = []
for i in os.listdir(jpath):
    k = os.path.join(jpath, i)
    for j in os.listdir(k):
        l = os.path.join(k, j)
        result.append(l)
print(result)
# ['C:\\Users\\Sharath\\Desktop\\test\\test1\\test1', 'C:\\Users\\Sharath\\Desktop\\test\\test2\\test2', 'C:\\Users\\Sharath\\Desktop\\test\\test3\\test3']
jsons_data = pd.DataFrame(columns=['homepage_url', 'number_of_employees', 'email_address'])
for i in range(len(result)):
    for j in os.listdir(result[i]):
        with open(os.path.join(result[i], j)) as jfile:
            jtext = json.load(jfile)
            homepage_url = jtext['homepage_url']
            number_of_employees = jtext['number_of_employees']
            email_address = jtext['email_address']
            jsons_data.loc[index] = [homepage_url, number_of_employees, email_address]
            print(jsons_data)
Output (the frame printed after each file):
   homepage_url                        number_of_employees  email_address
2  http://www.01tek.com                1.0                  khouidi.you@gmail.com
   homepage_url                        number_of_employees  email_address
2  http://www.123listo.com             NaN                  info@123listo.com
   homepage_url                        number_of_employees  email_address
2  http://www.immortaloutdoors.com     NaN
   homepage_url                        number_of_employees  email_address
2  http://www.1on1fitnesstraining.com  50.0                 1on1fitnesstraining013@gmail.com
   homepage_url                        number_of_employees  email_address
2  http://1onlybat.bigcartel.com       NaN                  office@1onlybat.com
   homepage_url                        number_of_employees  email_address
2  http://www.1doc3.com                5.0                  contacto@1doc3.com
   homepage_url                        number_of_employees  email_address
2  http://1phoneapp.com                10.0
   homepage_url                        number_of_employees  email_address
2  None                                NaN
   homepage_url                        number_of_employees  email_address
2  http://www.1stalliancelending.com   51.0                 info@placewelovemost.com
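When I try to view the DataFrame by evaluating jsons_data, I get the following result:
I don't understand why I only get a single row, with index 2. Please help me get all 9 files into the DataFrame using this approach.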
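Why only one row appears: DataFrame.loc[label] = values writes the row with that label and appends a new row only if the label is not in the index yet. In the question's code, index is never changed inside the loop (it is not defined in the code shown, so it presumably still held 2 from earlier code in the same session), so every file writes to row 2 and replaces the previous one. A small demonstration with made-up values:

```python
import pandas as pd

df = pd.DataFrame(columns=["homepage_url", "number_of_employees", "email_address"])

index = 2  # a fixed label, like the leftover index in the question
for url in ["http://a.example", "http://b.example", "http://c.example"]:
    # Same label every time: the first assignment adds row 2, the later ones overwrite it
    df.loc[index] = [url, None, None]

print(len(df))                    # 1
print(df.loc[2, "homepage_url"])  # http://c.example
```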
Answer:
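There are several ways to do this. One is to read each file with pd.read_json() and then concatenate the DataFrames. If you want to keep the approach from the link you mentioned, you need to update the variable index inside the loop, so you can change your code so that the main loop becomes: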
index = 0  # running row label, shared across all subfolders
for i in range(len(result)):
    for j in os.listdir(result[i]):
        with open(os.path.join(result[i], j)) as jfile:
            jtext = json.load(jfile)
        homepage_url = jtext['homepage_url']
        number_of_employees = jtext['number_of_employees']
        email_address = jtext['email_address']
        jsons_data.loc[index] = [homepage_url, number_of_employees, email_address]
        index += 1  # the next file goes into a new row
print(jsons_data)
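Note the change: index starts at 0 before the loops and is incremented after every file, so each file gets its own row. Changing only the line
for j in os.listdir(result[i]):
to
for index, j in enumerate(os.listdir(result[i])):
is not enough with this folder layout: enumerate restarts at 0 in every subfolder, so each subfolder's files overwrite rows 0 to 2 of the previous one and only 3 of the 9 files remain.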
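With 200k files, growing the DataFrame one .loc assignment at a time gets slow. A minimal sketch of a related approach (not pd.read_json() itself, but the same idea of building everything first): read each file with json.load, collect plain dicts in a list, and build the DataFrame once at the end. It walks the tree with os.walk, keeps only the three columns from the question, and fills a missing key with None. The helper name load_json_folder and the UTF-8 file encoding are assumptions.

```python
import json
import os

import pandas as pd

# Columns taken from the question; any other keys in the files are ignored
COLUMNS = ["homepage_url", "number_of_employees", "email_address"]


def load_json_folder(root, columns=COLUMNS):
    """Read every .json file below root into one DataFrame, one row per file (hypothetical helper)."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".json"):
                continue
            # Assumption: the files are UTF-8 encoded
            with open(os.path.join(dirpath, name), encoding="utf-8") as jfile:
                jtext = json.load(jfile)
            # .get() gives None for a missing key instead of raising KeyError
            rows.append({col: jtext.get(col) for col in columns})
    # A single constructor call at the end; the index is simply 0..n-1
    return pd.DataFrame(rows, columns=columns)
```

Usage: jsons_data = load_json_folder(r"C:\Users\Sharath\Desktop\test"). Because the row labels come from the list order, there is no index variable to keep track of.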