Iterating over csv files using information from json files

Time: 2018-08-06 02:09:33

Tags: python

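I have a collection of json and csv files that all share the same names; only the extensions differ. I am trying to:

i) Repeatedly extract information from each json file to populate the matching csv file.

ii) Save the newly created dataframe as a new .csv file.

My code below only loops over one file. How can I make it iterate over all the .csv and .json files in the current working directory?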

import pandas as pd
import glob2
import json as jamison


#glob returns a list of files with csv & .json file extensions 
csv_filenames = glob2.glob("*.csv")
csv_files = len(csv_filenames)

json_filenames = glob2.glob("*.json")
json_files = len(json_filenames)


#Json function to determine the variables within the file:
def json_variables(json_full_file, csv_file_iteration):

    json_data=open(json_full_file).read()

    data = jamison.loads(json_data) # json.load() is for loading a file. json.loads() works with strings.

    #define the variables from the json file
    username = data["target_username"]
    analysis_start_date = data["options"]["start"]
    analysis_end_date = data["options"]["end"]

    #Open the csv in pandas - then write the new columns (username, start, end)
    csv_df=pd.read_csv(csv_file_iteration, index_col=None, encoding='utf-8') #filename (sample.csv) defined in the function

    #Add columns 'username, analysis start date, analysis end date, analysis days' to csv (referencing json file)
    csv_df['username'] = username #defined in the function bracket
    csv_df['analysis_start_date'] = analysis_start_date #defined in the function bracket
    csv_df['analysis_end_date'] = analysis_end_date #defined in the function bracket

    #Export final dataframe to individual .csv files:
    csv_df.to_csv(username + '.csv', index=False, header=True)

    print("Complete: %s\n" % (csv_file_iteration))  # use the function argument rather than the global loop variable



#Compare the json filename with the csv filename

for json_full, csv in zip(json_filenames, csv_filenames):
    json_variables(json_full, csv)
        #print("Complete: %s\n" % (csv))

        #Change the csv filename      

print("Iteration complete.")   

2 Answers:

Answer 0 (score: 0)

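It is hard to answer without seeing the directory the filenames are pulled from, but one possible cause is that the number of csv files differs from the number of json files. If so, zip will not behave the way you expect: it stops at the end of the shorter list, so the extra elements of the longer list are silently dropped. Could that be the case?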

a = [1,2,3]
b = [1,2]
for i in zip(a,b):
    print(i)
#(1, 1)
#(2, 2)

Answer 1 (score: 0)

My mistake: I had a variable named 'username', but in one place I referred to it as 'user_name'.