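I've stumbled upon a very simple situation that I can't seem to find a solution for.
What I want to do is simple: write some data (a list of dicts whose keys differ from item to item) into a .csv file.
The way I'm doing it right now seems to be the only solution I could come up with: loop over the data, add each item's keys() to a set() (this becomes the header), then call writer.writerows(data).
Basically, a simple MCVE could look like this: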
from csv import DictWriter

RESULT_FILE = 'test_result.csv'


def get_fieldnames(data):
    fieldnames = set()
    for item in data:
        fieldnames.update(item.keys())
    return fieldnames


def main(data):
    fieldnames = get_fieldnames(data)
    with open(RESULT_FILE, 'a', newline='', encoding='utf-8') as f:
        writer = DictWriter(f, fieldnames=fieldnames, delimiter=',')
        writer.writeheader()
        writer.writerows(data)


if __name__ == '__main__':
    data_ = [
        {
            'a': '1',
            'b': '2',
            'c': '3',
        },
        {
            'a': '6',
            'd': '1',
            'b': '3',
        },
        {
            'c': '2',
            'e': '1',
            'f': '9',
        }
    ]
    main(data_)
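Now, what I don't like about this is that every item has to be collected before anything can be written.
If the headers are dynamic, how can I avoid exporting all the data to the CSV at once?
As requested, the real data looks like this: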
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': 'Exclusive single-piece hub design reduces pad vibration and '
'ensures smooth performance.',
'Each': '$ 24.70',
'Info': '',
'Line art': '',
'Name': '(5") Non-Vacuum Disc Pad Vinyl-Face',
'Product number': '91456106T',
'Technical specifications': '',
'image_1': 'https://www.richelieu.com/documents/docsGr/120/107/6/1201076/1419675_700.jpg'}
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': '',
'Each': '$ 8.19',
'Info': '<p><strong>material: </strong>Cork</p>',
'Line art': '',
'Name': 'Replacement Plate for MKT9924DB Belt Sander',
'Product number': 'MKT4230358',
'Technical specifications': '<p><strong>brand: </strong>Makita</p>',
'image_1': 'https://www.richelieu.com/documents/docsGr/116/631/4/1166314/1281513_700.jpg',
'\xa0': '$ 257.80'}
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': '',
'Each': '$ 8.19',
'Info': '<p><strong>material: </strong>Graphite</p>',
'Line art': '',
'Name': 'Replacement Plate for MKT9924DB Belt Sander',
'Product number': 'MKT4230366',
'Technical specifications': '<p><strong>brand: </strong>Makita</p>',
'image_1': 'https://www.richelieu.com/documents/docsPr/MK/T4/23/03/66/MKT4230366/1281514_700.jpg',
'\xa0': '$ 257.80'}
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': '- Exclusive single-piece hub design reduces pad vibration and '
'ensures smooth performance.',
'Each': '$ 38.47',
'Info': '',
'Line art': '',
'Name': 'Non-Grip Vacuum Pads',
'Product number': '9154325',
'Technical specifications': '<p><strong>thickness: </strong>3/8 '
'in</p><p><strong>density: '
'</strong>Medium</p><p><strong>nap: '
'</strong>Short</p>',
'image_1': 'https://www.richelieu.com/documents/docsPr/91/54/32/5/9154325/1213330_700.jpg',
'image_2': 'https://www.richelieu.com/documents/docsPr/91/54/32/5/9154325/1213331_700.jpg'}
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': '- Exclusive single-piece hub design reduces pad vibration and '
'ensures smooth performance.',
'Each': '$ 52.92',
'Info': '',
'Line art': '',
'Name': 'Non-Grip Vacuum Pads',
'Product number': '9154327',
'Technical specifications': '<p><strong>thickness: </strong>3/8 '
'in</p><p><strong>density: '
'</strong>Medium</p><p><strong>nap: '
'</strong>Short</p>',
'image_1': 'https://www.richelieu.com/documents/docsGr/105/122/1/1051221/1213328_700.jpg',
'image_2': 'https://www.richelieu.com/documents/docsPr/91/54/32/7/9154327/1213332_700.jpg'}
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': '- Unique one-piece hub design reduces pad vibration and '
'ensures smooth performance.',
'Each': '$ 26.84',
'Info': '',
'Line art': '',
'Name': 'Stick-on Non-Vacuum Pads',
'Product number': '9156106',
'Technical specifications': '<p><strong>thickness: </strong>3/8 '
'in</p><p><strong>density: </strong>Medium</p>',
'image_1': 'https://www.richelieu.com/documents/docsGr/105/122/4/1051224/1213343_700.jpg',
'image_2': 'https://www.richelieu.com/documents/docsPr/91/56/10/6/9156106/1213345_700.jpg'}
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': '- Unique one-piece hub design reduces pad vibration and '
'ensures smooth performance.',
'Each': '$ 51.70',
'Info': '',
'Line art': '',
'Name': 'Stick-on Non-Vacuum Pads',
'Product number': '9156107',
'Technical specifications': '<p><strong>thickness: </strong>3/8 '
'in</p><p><strong>density: </strong>Medium</p>',
'image_1': 'https://www.richelieu.com/documents/docsPr/91/56/10/7/9156107/1213344_700.jpg',
'image_2': 'https://www.richelieu.com/documents/docsPr/91/56/10/7/9156107/1213346_700.jpg'}
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': 'Size: 2-1/2" x 14".',
'Each': '$ 12.36',
'Info': '',
'Line art': '',
'Name': 'Sandpaper Belt 2½ " x 14" for Compact Belt Sander PC371 or PC371K',
'Product number': 'PC371K060',
'Technical specifications': '',
'image_1': 'https://www.richelieu.com/documents/docsPr/PC/37/1K/06/0/PC371K060/1263523_700.jpg',
'\xa0': '$ 148.18'}
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': 'Size: 2-1/2" x 14".',
'Each': '$ 12.36',
'Info': '',
'Line art': '',
'Name': 'Sandpaper Belt 2½ " x 14" for Compact Belt Sander PC371 or PC371K',
'Product number': 'PC371K080',
'Technical specifications': '',
'image_1': 'https://www.richelieu.com/documents/docsPr/PC/37/1K/08/0/PC371K080/1263524_700.jpg',
'\xa0': '$ 148.18'}
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': 'Size: 2-1/2" x 14".',
'Each': '$ 12.36',
'Info': '',
'Line art': '',
'Name': 'Sandpaper Belt 2½ " x 14" for Compact Belt Sander PC371 or PC371K',
'Product number': 'PC371K120',
'Technical specifications': '',
'image_1': 'https://www.richelieu.com/documents/docsPr/PC/37/1K/12/0/PC371K120/1263526_700.jpg',
'\xa0': '$ 148.18'}
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': 'Size: 2-1/2" x 14".',
'Each': '$ 12.36',
'Info': '',
'Line art': '',
'Name': 'Sandpaper Belt 2½ " x 14" for Compact Belt Sander PC371 or PC371K',
'Product number': 'PC371K100',
'Technical specifications': '',
'image_1': 'https://www.richelieu.com/documents/docsPr/PC/37/1K/10/0/PC371K100/1263525_700.jpg',
'\xa0': '$ 148.18'}
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': 'Exclusive single-piece hub design reduces pad vibration and '
'ensures smooth performance.',
'Each': '$ 25.22',
'Info': '',
'Line art': '',
'Name': '5" Non-Vacuum Disc Pad Hook-Face',
'Product number': '91454325T',
'Technical specifications': '',
'image_1': 'https://www.richelieu.com/documents/docsGr/120/107/7/1201077/1419678_700.jpg'}
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': '- Pads mount with screws.',
'Each': '$ 31.80',
'Info': '',
'Line art': '',
'Name': 'Plates for Non-Vacuum (Grip-On) Dynabug II Disc Pads - 7.62 cm x '
'10.79 cm (3" x 4-1/4")',
'Product number': '9156315',
'Technical specifications': '<p><strong>thickness: </strong>3/8 '
'in</p><p><strong>density: </strong>Medium</p>',
'image_1': 'https://www.richelieu.com/documents/docsGr/116/625/4/1166254/1280825_700.jpg',
'\xa0': '$ 179.95'}
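Answer 0 (score: 4)
Edit-1 26-Dec: updated the code to work with the data you posted.
Based on your requirements, I suggest the following.
Below is a quick and dirty POC that works for me: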
import csv

# Reuse the header file if it exists, otherwise create it.
try:
    f = open("headers.csv", mode="r+", newline="", encoding="utf-8")
except FileNotFoundError:
    f = open("headers.csv", mode="w+", newline="", encoding="utf-8")
f2 = open("data.csv", mode="a+", newline="", encoding="utf-8")

f.seek(0)
# Parse the header with csv.reader so quoted or whitespace-only names
# (such as '\xa0') survive; an empty file yields an empty header.
headers = next(csv.reader(f), [])

# Map every known field name to its column index.
headers_map = {}
for index, field in enumerate(headers):
    headers_map[field] = index


def update_header_dict(data):
    updated_headers = False
    for key in data.keys():
        if key not in headers_map:
            new_index = len(headers_map)
            headers_map[key] = new_index
            updated_headers = True
    if updated_headers:
        # The header only ever grows, so rewriting it from the start is safe.
        f.seek(0)
        csv.DictWriter(f, list(headers_map)).writeheader()
        f.flush()


def get_row_data_dict(data):
    row_data = [""] * len(headers_map)
    for k, v in data.items():
        # if v and v[0] in ('=', '-'):
        #     # Mark the value as text, only needed if you want to display data in excel
        #     # else should be commented out
        #     v = "'" + v
        row_data[headers_map[k]] = v
    return row_data


def main(data):
    data_writer = csv.writer(f2)
    for row in data:
        update_header_dict(row)
        data_writer.writerow(get_row_data_dict(row))


data_ = [
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': 'Exclusive single-piece hub design reduces pad vibration and '
'ensures smooth performance.',
'Each': '$ 24.70',
'Info': '',
'Line art': '',
'Name': '(5") Non-Vacuum Disc Pad Vinyl-Face',
'Product number': '91456106T',
'Technical specifications': '',
'image_1': 'https://www.richelieu.com/documents/docsGr/120/107/6/1201076/1419675_700.jpg'},
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': '',
'Each': '$ 8.19',
'Info': '<p><strong>material: </strong>Cork</p>',
'Line art': '',
'Name': 'Replacement Plate for MKT9924DB Belt Sander',
'Product number': 'MKT4230358',
'Technical specifications': '<p><strong>brand: </strong>Makita</p>',
'image_1': 'https://www.richelieu.com/documents/docsGr/116/631/4/1166314/1281513_700.jpg',
'\xa0': '$ 257.80'},
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': '',
'Each': '$ 8.19',
'Info': '<p><strong>material: </strong>Graphite</p>',
'Line art': '',
'Name': 'Replacement Plate for MKT9924DB Belt Sander',
'Product number': 'MKT4230366',
'Technical specifications': '<p><strong>brand: </strong>Makita</p>',
'image_1': 'https://www.richelieu.com/documents/docsPr/MK/T4/23/03/66/MKT4230366/1281514_700.jpg',
'\xa0': '$ 257.80'},
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': '- Exclusive single-piece hub design reduces pad vibration and '
'ensures smooth performance.',
'Each': '$ 38.47',
'Info': '',
'Line art': '',
'Name': 'Non-Grip Vacuum Pads',
'Product number': '9154325',
'Technical specifications': '<p><strong>thickness: </strong>3/8 '
'in</p><p><strong>density: '
'</strong>Medium</p><p><strong>nap: '
'</strong>Short</p>',
'image_1': 'https://www.richelieu.com/documents/docsPr/91/54/32/5/9154325/1213330_700.jpg',
'image_2': 'https://www.richelieu.com/documents/docsPr/91/54/32/5/9154325/1213331_700.jpg'},
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': '- Exclusive single-piece hub design reduces pad vibration and '
'ensures smooth performance.',
'Each': '$ 52.92',
'Info': '',
'Line art': '',
'Name': 'Non-Grip Vacuum Pads',
'Product number': '9154327',
'Technical specifications': '<p><strong>thickness: </strong>3/8 '
'in</p><p><strong>density: '
'</strong>Medium</p><p><strong>nap: '
'</strong>Short</p>',
'image_1': 'https://www.richelieu.com/documents/docsGr/105/122/1/1051221/1213328_700.jpg',
'image_2': 'https://www.richelieu.com/documents/docsPr/91/54/32/7/9154327/1213332_700.jpg'},
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': '- Unique one-piece hub design reduces pad vibration and '
'ensures smooth performance.',
'Each': '$ 26.84',
'Info': '',
'Line art': '',
'Name': 'Stick-on Non-Vacuum Pads',
'Product number': '9156106',
'Technical specifications': '<p><strong>thickness: </strong>3/8 '
'in</p><p><strong>density: </strong>Medium</p>',
'image_1': 'https://www.richelieu.com/documents/docsGr/105/122/4/1051224/1213343_700.jpg',
'image_2': 'https://www.richelieu.com/documents/docsPr/91/56/10/6/9156106/1213345_700.jpg'},
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': '- Unique one-piece hub design reduces pad vibration and '
'ensures smooth performance.',
'Each': '$ 51.70',
'Info': '',
'Line art': '',
'Name': 'Stick-on Non-Vacuum Pads',
'Product number': '9156107',
'Technical specifications': '<p><strong>thickness: </strong>3/8 '
'in</p><p><strong>density: </strong>Medium</p>',
'image_1': 'https://www.richelieu.com/documents/docsPr/91/56/10/7/9156107/1213344_700.jpg',
'image_2': 'https://www.richelieu.com/documents/docsPr/91/56/10/7/9156107/1213346_700.jpg'},
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': 'Size: 2-1/2" x 14".',
'Each': '$ 12.36',
'Info': '',
'Line art': '',
'Name': 'Sandpaper Belt 2½ " x 14" for Compact Belt Sander PC371 or PC371K',
'Product number': 'PC371K060',
'Technical specifications': '',
'image_1': 'https://www.richelieu.com/documents/docsPr/PC/37/1K/06/0/PC371K060/1263523_700.jpg',
'\xa0': '$ 148.18'},
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': 'Size: 2-1/2" x 14".',
'Each': '$ 12.36',
'Info': '',
'Line art': '',
'Name': 'Sandpaper Belt 2½ " x 14" for Compact Belt Sander PC371 or PC371K',
'Product number': 'PC371K080',
'Technical specifications': '',
'image_1': 'https://www.richelieu.com/documents/docsPr/PC/37/1K/08/0/PC371K080/1263524_700.jpg',
'\xa0': '$ 148.18'},
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': 'Size: 2-1/2" x 14".',
'Each': '$ 12.36',
'Info': '',
'Line art': '',
'Name': 'Sandpaper Belt 2½ " x 14" for Compact Belt Sander PC371 or PC371K',
'Product number': 'PC371K120',
'Technical specifications': '',
'image_1': 'https://www.richelieu.com/documents/docsPr/PC/37/1K/12/0/PC371K120/1263526_700.jpg',
'\xa0': '$ 148.18'},
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': 'Size: 2-1/2" x 14".',
'Each': '$ 12.36',
'Info': '',
'Line art': '',
'Name': 'Sandpaper Belt 2½ " x 14" for Compact Belt Sander PC371 or PC371K',
'Product number': 'PC371K100',
'Technical specifications': '',
'image_1': 'https://www.richelieu.com/documents/docsPr/PC/37/1K/10/0/PC371K100/1263525_700.jpg',
'\xa0': '$ 148.18'},
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': 'Exclusive single-piece hub design reduces pad vibration and '
'ensures smooth performance.',
'Each': '$ 25.22',
'Info': '',
'Line art': '',
'Name': '5" Non-Vacuum Disc Pad Hook-Face',
'Product number': '91454325T',
'Technical specifications': '',
'image_1': 'https://www.richelieu.com/documents/docsGr/120/107/7/1201077/1419678_700.jpg'},
{'Catalog link': '',
'Category': 'Tools and Shop Supplies / Workshop Accessories / Tool '
'Accessories / Sander Accessories',
'Description': '- Pads mount with screws.',
'Each': '$ 31.80',
'Info': '',
'Line art': '',
'Name': 'Plates for Non-Vacuum (Grip-On) Dynabug II Disc Pads - 7.62 cm x '
'10.79 cm (3" x 4-1/4")',
'Product number': '9156315',
'Technical specifications': '<p><strong>thickness: </strong>3/8 '
'in</p><p><strong>density: </strong>Medium</p>',
'image_1': 'https://www.richelieu.com/documents/docsGr/116/625/4/1166254/1280825_700.jpg',
'\xa0': '$ 179.95'}
]
data2_ = [
{
'a': '2',
'f': '1',
'z': '9',
},
]
main(data_)
# main(data2_)
f.close()
f2.close()
Running the above generates two files. I then run this in a terminal:
    cat headers.csv data.csv > output.csv
and open output.csv in Excel.
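Where `cat` is not available (a stock Windows shell, for example), the same concatenation can be done from Python. This is only a minimal sketch: `concat_files` is a hypothetical helper name, not part of the answer, and it assumes the header and data files already exist.

```python
import shutil


def concat_files(sources, destination):
    """Copy the raw bytes of each source file, in order, into destination."""
    with open(destination, "wb") as out:
        for name in sources:
            with open(name, "rb") as src:
                # Stream the file in chunks instead of loading it into memory
                shutil.copyfileobj(src, out)

# Intended usage with the file names from the answer:
# concat_files(["headers.csv", "data.csv"], "output.csv")
```

Copying bytes rather than text leaves the line endings written by the csv module untouched.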
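The only issue you may see is #NAME? cells. These appear because Excel tries to evaluate text that starts with - as a formula. If you need to handle such text, uncomment the following part of the code:
    # if v and v[0] in ('=', '-'):
    #     # Mark the value as text, only needed if you want to display data in excel
    #     # else should be commented out
    #     v = "'" + v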
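For readers who prefer not to toggle comments inside get_row_data_dict, the same check could live in a small function. A sketch only: `mark_as_text` is a hypothetical name, and it covers just the two leading characters the answer mentions.

```python
def mark_as_text(value):
    """Prefix values that Excel would evaluate as formulas with an apostrophe."""
    # Same test as the commented-out lines: non-empty and starting with = or -
    if value and value[0] in ("=", "-"):
        return "'" + value
    return value
```

It would be applied to each value just before it is stored in the row, and only when the file is meant to be opened in Excel.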
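Answer 1 (score: 2)
Since your data comes from scraping, it can be treated as a stream.
To mimic a stream, I use data_.pop() to fetch one item at a time.
The following solution appends each item as it arrives from the stream.
The header and the body of the CSV are stored in separate files.
The header may grow longer over time.
Rows saved before such a growth step cannot know about it, so they may lack the trailing commas that stand for the missing items.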
import csv
import os


class StreamCSV:  # Python 3
    def __init__(self, header_file_name, body_file_name):
        self.header_file_name = header_file_name
        self.fbody = open(body_file_name, 'a', newline='', encoding='utf-8')
        self.csv_body = csv.writer(self.fbody)

    def add_item(self, item):
        # Re-read the header on every call so a restarted program picks it up.
        if os.path.exists(self.header_file_name):
            with open(self.header_file_name, 'r', newline='', encoding='utf-8') as fobj:
                reader = csv.reader(fobj)
                try:
                    current_header = next(reader)
                except StopIteration:
                    current_header = []
        else:
            current_header = []
        header_set = set(current_header)
        for key in item:
            if key not in header_set:
                current_header.append(key)
        # Rewrite the header file only if new keys were appended.
        if len(header_set) < len(current_header):
            with open(self.header_file_name, 'w', newline='', encoding='utf-8') as fobj:
                writer = csv.writer(fobj)
                writer.writerow(current_header)
        item_data = [item.get(head, '') for head in current_header]
        self.csv_body.writerow(item_data)
        self.fbody.flush()  # allows peeking into the file


if __name__ == '__main__':
    data_ = [
        {
            'a': '1',
            'b': '2',
            'c': '3',
        },
        {
            'a': '6',
            'd': '1',
            'b': '3',
        },
        {
            'c': '2',
            'e': '1',
            'f': '9',
        }
    ]

    def show_saved(file_names):
        for name in file_names:
            with open(name) as fobj:
                print(name)
                print(fobj.read())

    header_file_name, body_file_name = 'header.csv', 'body.csv'
    stream_writer = StreamCSV(header_file_name, body_file_name)
    for x in range(1, 4):
        print('step:', x)
        stream_writer.add_item(data_.pop())
        show_saved([header_file_name, body_file_name])
The output shows the files growing over time:
step: 1
header.csv
c,e,f
body.csv
2,1,9
step: 2
header.csv
c,e,f,a,d,b
body.csv
2,1,9
,,,6,1,3
step: 3
header.csv
c,e,f,a,d,b
body.csv
2,1,9
,,,6,1,3
3,,,1,,2
You may want to merge header and body in an additional step, which also adds the missing trailing commas.
def merge_header_body(header_file_name, body_file_name, out_file_name):
    with open(header_file_name, 'r', newline='', encoding='utf-8') as fobj:
        reader = csv.reader(fobj)
        header = next(reader)
    with open(out_file_name, 'w', newline='', encoding='utf-8') as fobj_out, \
            open(body_file_name, 'r', newline='', encoding='utf-8') as fobj_in:
        reader = csv.reader(fobj_in)
        writer = csv.writer(fobj_out)
        writer.writerow(header)
        target_length = len(header)
        for row in reader:
            # Pad rows written before the header grew
            diff = target_length - len(row)
            row.extend([''] * diff)
            writer.writerow(row)


out_file_name = 'merged.csv'
merge_header_body(header_file_name, body_file_name, out_file_name)
Content of merged.csv:
c,e,f,a,d,b
2,1,9,,,
,,,6,1,3
3,,,1,,2
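If the goal is only to read the rows back rather than to produce a merged file, the padding can be left to csv.DictReader, whose restval argument fills in fields that a short row lacks. A minimal sketch, assuming the header and body files written above; `read_body_with_header` is a hypothetical helper name.

```python
import csv


def read_body_with_header(header_file_name, body_file_name):
    """Yield each body row as a dict keyed by the final header."""
    with open(header_file_name, "r", newline="", encoding="utf-8") as fobj:
        header = next(csv.reader(fobj))
    with open(body_file_name, "r", newline="", encoding="utf-8") as fobj:
        # restval supplies the value for columns missing from short rows
        reader = csv.DictReader(fobj, fieldnames=header, restval="")
        for row in reader:
            yield row
```

Each yielded row then has every header key, with empty strings where the original line had no trailing commas.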
If the program crashes in between, it picks up where it left off. Let's take the same data as before and add more rows:
for x in range(1, 4):
    print('step:', x)
    stream_writer.add_item(data_.pop())
    show_saved([header_file_name, body_file_name])
Output:
step: 1
header.csv
c,e,f,a,d,b
body.csv
2,1,9
,,,6,1,3
3,,,1,,2
2,1,9,,,
step: 2
header.csv
c,e,f,a,d,b
body.csv
2,1,9
,,,6,1,3
3,,,1,,2
2,1,9,,,
,,,6,1,3
step: 3
header.csv
c,e,f,a,d,b
body.csv
2,1,9
,,,6,1,3
3,,,1,,2
2,1,9,,,
,,,6,1,3
3,,,1,,2
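Answer 2 (score: 2)
As was said, I would use a "write to file" system with a flush for every row (or every few rows). If you don't mind using pandas, the simplest way is to change your main function as follows: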
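import pandas as pd


def main(data):
    df = pd.DataFrame()
    for item in data:
        df_current = pd.DataFrame.from_dict(item, orient='index').T
        # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
        df = pd.concat([df, df_current], ignore_index=True)
        # Rewrite the file after every item so it always has the current header
        df.to_csv(RESULT_FILE, index=False)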
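This way RESULT_FILE is rewritten from the freshly updated DataFrame each time, without the full header having to be known in advance.
To improve performance, you can add a condition so that the file is written only every n items: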
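def main(data):
    df = pd.DataFrame()
    chunksize = 5
    for i, item in enumerate(data):
        df_c = pd.DataFrame.from_dict(item, orient='index').T
        # pd.concat replaces DataFrame.append, which pandas 2.0 removed
        df = pd.concat([df, df_c], ignore_index=True)
        # Write every chunksize items, and once more for the last item
        if i % chunksize == 0 or i == len(data) - 1:
            df.to_csv(RESULT_FILE, index=False)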
As for memory issues, I suggest passing the initially scraped data to this function as an iterator rather than a list, to reduce memory consumption.
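One caveat with that suggestion: the chunked main above calls len(data), which a generator does not support. A possible way around it, sketched with the standard library only, is to pull fixed-size chunks from the iterator. Both `iter_chunks` and `scraped_items` are hypothetical names, and the generator merely stands in for a real scraper.

```python
from itertools import islice


def iter_chunks(items, chunksize):
    """Yield lists of up to chunksize items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


def scraped_items():
    """Hypothetical stand-in for a scraper that yields one dict at a time."""
    yield {'a': '1', 'b': '2', 'c': '3'}
    yield {'a': '6', 'd': '1', 'b': '3'}
    yield {'c': '2', 'e': '1', 'f': '9'}
```

Looping over iter_chunks(scraped_items(), 5) and writing the file once per chunk needs no length, and the last, shorter chunk is delivered without a special case.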
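Answer 3 (score: 1)
Maybe it's a bit over the top, but to me this is the simplest way to solve the problem. It uses sqlite, which can add columns to a table at any time. Note that I haven't tested it exhaustively.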
#!/usr/bin/env python
from os import path
import sqlite3
import atexit

how_many = 0


class DB(object):
    db_file = "data.db"

    def __init__(self):
        self._fieldnames = set(["ignore_field"])
        self._cursor = None
        self._db_conn = None
        create = False
        if not path.isfile(self.db_file):
            create = True
        self._db_conn = sqlite3.connect(self.db_file)
        self._cursor = self._db_conn.cursor()
        if create:
            self._cursor.execute("""CREATE TABLE data (ignore_field integer)""")
        else:
            # retrieve already existing fieldnames so we can continue
            pragma = self._db_conn.execute("pragma table_info('data')").fetchall()
            self._fieldnames = set([x[1] for x in pragma])

    def _add_fields(self, field_list):
        for field in field_list:
            if field not in self._fieldnames:
                self._cursor.execute('alter table data add column "%s" TEXT' % field)
                self._fieldnames.add(field)

    def _insert_data(self, data):
        fields = []
        values = []
        for f, v in data.items():  # iteritems() exists only in Python 2
            # quote column names so that names containing spaces work
            fields.append('"{}"'.format(f))
            values.append(v)
        # let sqlite3 bind the values instead of formatting them into the SQL
        placeholders = ", ".join("?" for _ in values)
        sql = """insert into data ({}) values ({})""".format(", ".join(fields), placeholders)
        self._db_conn.execute(sql, values)

    def consume(self, one_dict):
        self._add_fields(one_dict.keys())
        self._insert_data(one_dict)
        self._db_conn.commit()

    def csv_out(self):
        self._cursor.execute("select * from data")
        header = [x[0] for x in self._cursor.description]
        print(",".join(header))
        for row in self._cursor:
            out = []
            for field in row:
                out.append(field if field else "")
            print(",".join(out))


def cleanup(total):
    print("Ended after record {}/{}".format(how_many, total))


def main(data):
    global how_many
    atexit.register(cleanup, len(data))
    db = DB()
    skip = False
    if how_many:
        skip = how_many
    for each in data:
        if not skip:
            db.consume(each)
        else:
            skip -= 1
            if not skip:
                print("Finished skipping {} records.".format(how_many))
        how_many += 1
    print("Completed loading available data.")
    db.csv_out()


if __name__ == "__main__":
    data_ = [
        {
            'a': '1',
            'b': '2',
            'c': '3',
        },
        {
            'a': '6',
            'd': '1',
            'b': '3',
        },
        {
            'c': '2',
            'e': '1',
            'f': '9',
        }
    ]
    main(data_)
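If you modify how_many, the main loop skips that many records. This lets you recover from a crash, since the atexit hook should tell you how far the program got.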
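There is also a dummy column/field name, because you can't create an empty table, and I got lazy and didn't tie the table creation to the first iteration of DB.consume(). You can always replace "ignore_field" with one of your existing fields.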
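One possible way to drop the dummy column would be to create the table from the keys of the first record. The following is only a sketch under assumptions: `ensure_table` is a hypothetical helper, the first record must have at least one key, and key names must not contain double quotes.

```python
import sqlite3


def ensure_table(db_conn, first_record, table="data"):
    """Create the table from the keys of the first record if it is missing."""
    # Names are interpolated into the SQL, so they must come from trusted data
    columns = ", ".join('"{}" TEXT'.format(name) for name in first_record)
    db_conn.execute('create table if not exists "{}" ({})'.format(table, columns))
```

Calling it at the start of every consume would be harmless, because the statement does nothing once the table exists.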
Being even lazier, I didn't do any file IO; I just print the CSV.
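Writing the table to a file instead of printing it could look roughly like this. It is a sketch rather than part of the answer: `export_table_to_csv` is a hypothetical name, and it goes through csv.writer so that values containing commas or quotes are escaped, which joining with "," does not do.

```python
import csv
import sqlite3


def export_table_to_csv(db_conn, out_file_name, table="data"):
    """Write every row of the table to a CSV file, header first."""
    # The table name is interpolated, so it must come from trusted code
    cursor = db_conn.execute('select * from "{}"'.format(table))
    header = [column[0] for column in cursor.description]
    with open(out_file_name, "w", newline="", encoding="utf-8") as fobj:
        writer = csv.writer(fobj)
        writer.writerow(header)
        for row in cursor:
            # NULL (None) marks a column that the record did not have
            writer.writerow(["" if field is None else field for field in row])
```

With the DB class above it would be called on the open sqlite3 connection once all records have been consumed.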