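I have a collection of json and csv files that share the same names and differ only in their extensions. I am trying to:
i) repeatedly extract information from each json file to populate the matching csv file;
ii) save the newly created dataframe as a new .csv file.
My code below only loops over one file. How can I make it iterate over all the .csv and .json files in the current working directory?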
import pandas as pd
import glob2
import json as jamison
#glob returns a list of files with csv & .json file extensions
csv_filenames = glob2.glob("*.csv")
csv_files = len(csv_filenames)
json_filenames = glob2.glob("*.json")
json_files = len(json_filenames)
#Json function to determine the variables within the file:
def json_variables(json_full_file, csv_file_iteration):
    json_data=open(json_full_file).read()
    data = jamison.loads(json_data) # json.load() is for loading a file. json.loads() works with strings.
    #define the variables from the json file
    username = data["target_username"]
    analysis_start_date = data["options"]["start"]
    analysis_end_date = data["options"]["end"]
    #Open the csv in pandas - then write the new columns (username, start, end)
    csv_df=pd.read_csv(csv_file_iteration, index_col=None, encoding='utf-8') #filename (sample.csv) defined in the function
    #Add columns 'username, analysis start date, analysis end date, analysis days' to csv (referencing json file)
    csv_df['username'] = username #defined in the function bracket
    csv_df['analysis_start_date'] = analysis_start_date #defined in the function bracket
    csv_df['analysis_end_date'] = analysis_end_date #defined in the function bracket
    #Export final dataframe to individual .csv files:
    csv_df.to_csv(username + '.csv', index=False, header=True)
    print("Complete: %s\n" % (csv))
#Compare the json filename with the csv filename
for json_full, csv in zip(json_filenames, csv_filenames):
    json_variables(json_full, csv)
    #print("Complete: %s\n" % (csv))
#Change the csv filename
print("Iteration complete.")
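Answer 0 (score: 0)
It is hard to answer without seeing the directory the file names are pulled from, but one possible cause is that the number of csv files differs from the number of json files. In that case zip will not do what you expect: it stops at the end of the shorter list, so the extra elements of the longer list are silently dropped. Could that be the case here?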
a = [1,2,3]
b = [1,2]
for i in zip(a,b):
    print(i)
#(1, 1)
#(2, 2)
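If the two lists can differ in length or order, one way to avoid relying on zip is to pair the files by base name instead. The following is only a sketch under a few assumptions: it uses the standard-library pathlib rather than glob2, the helper name pair_by_stem is hypothetical, and it assumes each json file should be matched with the csv file whose name is identical apart from the extension, as the question describes.

```python
from pathlib import Path


def pair_by_stem(directory="."):
    """Return (json_path, csv_path) pairs whose names differ only by extension.

    Hypothetical helper, not part of the original code.
    """
    folder = Path(directory)
    # Index the csv files by name without extension, e.g. "sample" -> sample.csv
    csv_by_stem = {p.stem: p for p in folder.glob("*.csv")}
    pairs = []
    # Sort so the processing order is the same on every run
    for json_path in sorted(folder.glob("*.json")):
        csv_path = csv_by_stem.get(json_path.stem)
        if csv_path is None:
            # Report the gap instead of silently dropping the file, as zip would
            print("No matching csv for %s, skipping" % json_path.name)
            continue
        pairs.append((json_path, csv_path))
    return pairs


for json_path, csv_path in pair_by_stem():
    # In the question's script, json_variables(json_path, csv_path) would go here
    print("Paired: %s <-> %s" % (json_path.name, csv_path.name))
```

Because every pair is looked up by name, a missing or extra file no longer shifts the remaining pairs, and each unmatched json file is reported rather than ignored.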
Answer 1 (score: 0)
My mistake: I had a variable named 'username', but in one place I referred to it as 'user_name'.