I have a tree with the following structure:
my_hash_pop = {
    "Europe" : {
        "France" : {
            "Paris" : 2220445,
            "Lille" : 225789,
            "Lyon" : 506615 },
        "Germany" : {
            "Berlin" : 3520031,
            "Munchen" : 1544041,
            "Dresden" : 540000 },
    },
    "South America" : {
        "Brasil" : {
            "Sao Paulo" : 11895893,
            "Rio de Janeiro" : 6093472 },
        "Argentina" : {
            "Salta" : 535303,
            "Buenos Aires" : 3090900 },
    },
}
I want to use Python to convert this structure to CSV:
Europe;Germany;Berlin;3520031
Europe;Germany;Munchen;1544041
Europe;Germany;Dresden;540000
Europe;France;Paris;2220445
Europe;France;Lyon;506615
Europe;France;Lille;225789
South America;Argentina;Buenos Aires;3090900
South America;Argentina;Salta;535303
South America;Brasil;Sao Paulo;11895893
South America;Brasil;Rio de Janeiro;6093472
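Since my real-life tree contains a very large number of leaves (obviously not the case in this example), the conversion script I use takes a long time. I am trying to find a more efficient way to do the conversion. Here is what I tried: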
### METHOD 1 ###
import time

start_1 = time.time()
data_to_write = ""
for region in my_hash_pop:
    for country in my_hash_pop[region]:
        for city in my_hash_pop[region][country]:
            data_to_write += region+";"+country+";"+city+";"+str(my_hash_pop[region][country][city])+"\n"
filename = "my_test_1.csv"
with open(filename, 'w+') as outfile:
    outfile.write(data_to_write)  # the with block closes the file on exit
end_1 = time.time()
print("---> METHOD 1 : Write all took " + str(end_1 - start_1) + "s")
### METHOD 2 ###
start_2 = time.time()
data_to_write = ""
for region in my_hash_pop:
    region_to_write = ""
    for country in my_hash_pop[region]:
        country_to_write = ""
        for city in my_hash_pop[region][country]:
            city_to_write = region+";"+country+";"+city+";"+str(my_hash_pop[region][country][city])+"\n"
            country_to_write += city_to_write
        region_to_write += country_to_write
    data_to_write += region_to_write
filename = "my_test_2.csv"
with open(filename, 'w+') as outfile:
    outfile.write(data_to_write)  # the with block closes the file on exit
end_2 = time.time()
print("---> METHOD 2 : Write all took " + str(end_2 - start_2) + "s")
### METHOD 3 ###
import csv
start_3 = time.time()
# newline='' lets the csv module control line endings (Python 3)
with open("my_test_3.csv", 'w+', newline='') as outfile:
    del_char = ";"  # must be a str, not bytes, in Python 3
    w = csv.writer(outfile, delimiter=del_char)
    for region in my_hash_pop:
        for country in my_hash_pop[region]:
            for city in my_hash_pop[region][country]:
                w.writerow([region, country, city, str(my_hash_pop[region][country][city])])
end_3 = time.time()
print("---> METHOD 3 : Write all took " + str(end_3 - start_3) + "s")
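Comparing the time each method takes as I grow my example tree, I noticed that method 1 is quite inefficient. Between methods 2 and 3, however, the results vary and the difference is less clear-cut (usually method 3 seems to be more efficient).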
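One way to make such a comparison reproducible is to generate a larger synthetic tree and time each variant on it. The following is only a minimal sketch: make_tree, concat_rows, csv_rows and time_method are hypothetical helper names, the tree sizes are arbitrary, and the output goes to in-memory buffers rather than files so that disk speed does not dominate the measurement.

```python
import csv
import io
import time


def make_tree(n_regions, n_countries, n_cities):
    """Hypothetical helper: build a synthetic region/country/city tree."""
    return {
        f"region{r}": {
            f"country{c}": {f"city{k}": r + c + k for k in range(n_cities)}
            for c in range(n_countries)
        }
        for r in range(n_regions)
    }


def concat_rows(tree, out):
    """Method 1 style: grow one big string, then write it once."""
    data = ""
    for region in tree:
        for country in tree[region]:
            for city in tree[region][country]:
                data += f"{region};{country};{city};{tree[region][country][city]}\n"
    out.write(data)


def csv_rows(tree, out):
    """Method 3 style: let csv.writer format each row."""
    w = csv.writer(out, delimiter=";")
    for region in tree:
        for country in tree[region]:
            for city in tree[region][country]:
                w.writerow([region, country, city, tree[region][country][city]])


def time_method(func, tree):
    """Return (elapsed seconds, number of lines written) for one run."""
    buf = io.StringIO()
    start = time.perf_counter()
    func(tree, buf)
    elapsed = time.perf_counter() - start
    return elapsed, buf.getvalue().count("\n")


tree = make_tree(5, 10, 200)  # 10,000 leaves; the sizes are arbitrary
for func in (concat_rows, csv_rows):
    elapsed, n_lines = time_method(func, tree)
    print(f"{func.__name__}: {n_lines} lines in {elapsed:.4f}s")
```

Timings from a single run are noisy, so repeating each measurement several times and keeping the best result usually gives a fairer picture.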
So I have two questions:
A bonus one:
Thanks for your contributions!
Answer 0 (score: 1)
The third method is the most promising.
You can use items() at each level to avoid many of the dictionary lookups:
with open("my_test_3.csv", 'w+') as outfile:
    del_char = ";"
    w = csv.writer(outfile, delimiter=del_char)
    for region,countries in my_hash_pop.items():
        for country,cities in countries.items():
            for city,value in cities.items():
                w.writerow([region, country, city, value])
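The size difference between examples 2 and 3 comes from the newlines: "\n" in 'my_test_2.csv' and "\r\n" in 'my_test_3.csv'. As a result, each line in 'my_test_3.csv' is 1 byte larger than the corresponding line in 'my_test_2.csv'.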
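To illustrate the point about newlines: csv.writer uses "\r\n" as its default lineterminator, and this can be overridden. A minimal sketch, using in-memory buffers and a two-row sample taken from the question's data rather than the real files:

```python
import csv
import io

rows = [["Europe", "France", "Paris", 2220445],
        ["Europe", "France", "Lyon", 506615]]

# Default dialect: each row ends with "\r\n".
default_buf = io.StringIO()
csv.writer(default_buf, delimiter=";").writerows(rows)

# Explicit lineterminator: each row ends with "\n", like the hand-built strings.
unix_buf = io.StringIO()
csv.writer(unix_buf, delimiter=";", lineterminator="\n").writerows(rows)

# One extra character per row in the default output.
size_diff = len(default_buf.getvalue()) - len(unix_buf.getvalue())
print(size_diff)
```

Passing lineterminator="\n" to the writer in method 3 should therefore make the two files the same size, assuming the rows themselves are identical.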
Answer 1 (score: 1)
import time

start_4 = time.time()
filename = "my_test_4.csv"
with open(filename, 'w+') as outfile:
    # Stream the formatted lines instead of building a throwaway list
    outfile.writelines(
        "%s;%s;%s;%s\n" % (k, kk, kkk, vvv)
        for (k, v) in my_hash_pop.items()
        for (kk, vv) in v.items()
        for (kkk, vvv) in vv.items())
end_4 = time.time()
print("---> METHOD 4 : Write all took " + str(end_4 - start_4) + "s")
Answer 2 (score: 0)
I suggest using pandas, as follows:
import pandas as pd
df = pd.DataFrame([(i,j,k,my_hash_pop[i][j][k])
                   for i in my_hash_pop.keys()
                   for j in my_hash_pop[i].keys()
                   for k in my_hash_pop[i][j].keys()])
with open("my_test_4.csv", 'w') as outfile:
    outfile.write(df.to_csv(sep=';', header=False, index=False))
I have not compared execution times, and pandas may not be an option for you, so this is just a suggestion.
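Answer 3 (score: 0)
pandas is very efficient. Below is one way to load the dict into pandas and flatten it with json_normalize, after which you can manipulate it, for example write it to CSV.
Let me know how this option works out for you.
Source code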
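import csv
import pandas as pd

# pd.json_normalize is available in pandas 1.0+; it yields one row whose
# column names look like "Europe.France.Paris"
df = pd.json_normalize(my_hash_pop)
filename = "temp.csv"
del_char = ";"
# Text mode with newline='' is what the csv module expects in Python 3
with open(filename, 'w', newline='') as outfile:
    w = csv.writer(outfile, delimiter=del_char, quoting=csv.QUOTE_MINIMAL)
    for col in df.columns:
        # Split the dotted path into region, country, city and append the value
        w.writerow(col.split('.') + [df[col][0]])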