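I'm trying to export a DataFrame with 19 columns and roughly 150k rows to CSV using pandas to_csv(). One of the columns contains strings that are sometimes very long (around 1,000 characters). The export is extremely slow. I have never let it run to completion, but exporting just the first 1,000 rows takes nearly 200 seconds (and the resulting file is only 185 KB!).
I'm working on a fairly powerful EC2 machine, so hardware performance shouldn't be the problem. The file grows by only a few kilobytes per second, so I don't think I'm hitting an I/O limit either. When I profiled the export of the first 1,000 rows, pandas._libs.lib.write_csv_rows turned out to account for almost the entire execution time (profiling results attached). Is there any way to export a frame like this to CSV quickly in Python?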
>>> profile.print_stats()
709 function calls (707 primitive calls) in 196.859 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:103(release)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:143(__init__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:147(__enter__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:151(__exit__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:157(_get_module_lock)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:176(cb)
20 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:222(_verbose_message)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:369(__init__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:58(__init__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:707(find_spec)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:78(acquire)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:780(find_spec)
5 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:843(__enter__)
5 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:847(__exit__)
2 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:861(_find_spec_legacy)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:870(_find_spec)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:936(_find_and_load_unlocked)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:966(_find_and_load)
2 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:997(_handle_fromlist)
5 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:1080(_path_importer_cache)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:1117(_get_spec)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:1149(find_spec)
4 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:1233(find_spec)
4 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:41(_relax_case)
20 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:57(_path_join)
20 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:59(<listcomp>)
4 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:75(_path_stat)
1 0.000 0.000 196.859 196.859 <stdin>:1(<module>)
2 0.000 0.000 0.000 0.000 __init__.py:200(iteritems)
1 0.000 0.000 0.000 0.000 _methods.py:37(_any)
4 0.000 0.000 0.000 0.000 apipkg.py:133(__makeattr)
1 0.000 0.000 0.000 0.000 base.py:1108(nlevels)
2 0.000 0.000 0.000 0.000 base.py:1336(is_object)
1 0.000 0.000 0.000 0.000 base.py:1408(_convert_slice_indexer)
2 0.000 0.000 0.001 0.000 base.py:1984(to_native_types)
2 0.000 0.000 0.001 0.000 base.py:2010(_format_native_types)
3 0.000 0.000 0.000 0.000 base.py:3482(_validate_indexer)
2 0.000 0.000 0.000 0.000 base.py:4155(_ensure_index)
2 0.000 0.000 0.000 0.000 base.py:551(_reset_identity)
2 0.000 0.000 0.000 0.000 base.py:557(__len__)
2 0.000 0.000 0.000 0.000 base.py:563(__array__)
2 0.000 0.000 0.000 0.000 base.py:588(values)
1 0.000 0.000 0.000 0.000 codecs.py:185(__init__)
1 0.000 0.000 0.000 0.000 common.py:102(_expand_user)
3 0.000 0.000 0.000 0.000 common.py:1136(is_datetime_or_timedelta_dtype)
2 0.000 0.000 0.000 0.000 common.py:128(_stringify_path)
3 0.000 0.000 0.000 0.000 common.py:1371(needs_i8_conversion)
2 0.000 0.000 0.000 0.000 common.py:1456(is_string_like_dtype)
7 0.000 0.000 0.000 0.000 common.py:1722(_get_dtype)
5 0.000 0.000 0.000 0.000 common.py:1773(_get_dtype_type)
1 0.000 0.000 0.000 0.000 common.py:184(is_bool_indexer)
1 0.000 0.000 0.001 0.001 common.py:291(_get_handle)
3 0.000 0.000 0.000 0.000 common.py:334(is_datetime64tz_dtype)
5 0.000 0.000 0.000 0.000 common.py:409(is_period_dtype)
2 0.000 0.000 0.000 0.000 common.py:442(is_interval_dtype)
1 0.000 0.000 0.000 0.000 common.py:464(_apply_if_callable)
2 0.000 0.000 0.000 0.000 common.py:478(is_categorical_dtype)
1 0.000 0.000 0.000 0.000 common.py:488(UnicodeWriter)
5 0.000 0.000 0.000 0.000 common.py:511(is_string_dtype)
2 0.000 0.000 0.000 0.000 common.py:85(is_object_dtype)
5 0.000 0.000 0.000 0.000 dtypes.py:556(is_dtype)
2 0.000 0.000 0.000 0.000 dtypes.py:678(is_dtype)
12 0.000 0.000 0.000 0.000 dtypes.py:85(is_dtype)
1 0.000 0.000 0.000 0.000 format.py:1526(__init__)
4 0.000 0.000 0.000 0.000 format.py:1604(<genexpr>)
1 0.000 0.000 196.858 196.858 format.py:1621(save)
1 0.000 0.000 0.000 0.000 format.py:1658(_save_header)
1 0.000 0.000 196.858 196.858 format.py:1738(_save)
1 0.000 0.000 196.858 196.858 format.py:1756(_save_chunk)
1 0.000 0.000 196.859 196.859 frame.py:1433(to_csv)
1 0.000 0.000 0.000 0.000 frame.py:303(_constructor)
1 0.000 0.000 0.000 0.000 frame.py:316(__init__)
1 0.000 0.000 0.000 0.000 generic.py:120(__init__)
1 0.000 0.000 0.000 0.000 generic.py:162(_init_mgr)
1 0.000 0.000 0.000 0.000 generic.py:1804(_indexer)
1 0.000 0.000 0.000 0.000 generic.py:1937(_slice)
1 0.000 0.000 0.000 0.000 generic.py:1957(_set_is_copy)
1 0.000 0.000 0.001 0.001 generic.py:3250(head)
1 0.000 0.000 0.000 0.000 generic.py:346(_get_axis_number)
1 0.000 0.000 0.000 0.000 generic.py:3583(__finalize__)
1 0.000 0.000 0.000 0.000 generic.py:359(_get_axis_name)
1 0.000 0.000 0.000 0.000 generic.py:3616(__setattr__)
1 0.000 0.000 0.000 0.000 generic.py:372(_get_axis)
1 0.000 0.000 0.000 0.000 generic.py:376(_get_block_manager_axis)
19 0.000 0.000 0.000 0.000 generic.py:7(_check)
1 0.000 0.000 0.001 0.001 indexing.py:1358(__getitem__)
1 0.000 0.000 0.000 0.000 indexing.py:152(_slice)
1 0.000 0.000 0.000 0.000 indexing.py:1658(_has_valid_type)
1 0.000 0.000 0.001 0.001 indexing.py:1764(_get_slice_axis)
1 0.000 0.000 0.001 0.001 indexing.py:1799(_getitem_axis)
1 0.000 0.000 0.000 0.000 indexing.py:2148(need_slice)
1 0.000 0.000 0.000 0.000 indexing.py:258(_convert_slice_indexer)
3 0.000 0.000 0.000 0.000 internals.py:107(__init__)
1 0.000 0.000 0.001 0.001 internals.py:1846(to_native_types)
12 0.000 0.000 0.000 0.000 internals.py:189(mgr_locs)
1 0.000 0.000 0.000 0.000 internals.py:2076(__init__)
3 0.000 0.000 0.000 0.000 internals.py:218(make_block_same_class)
3 0.000 0.000 0.000 0.000 internals.py:226(mgr_locs)
3 0.000 0.000 0.000 0.000 internals.py:260(_slice)
3 0.000 0.000 0.000 0.000 internals.py:279(getitem_block)
3 0.000 0.000 0.000 0.000 internals.py:2921(make_block)
3 0.000 0.000 0.000 0.000 internals.py:299(shape)
1 0.000 0.000 0.000 0.000 internals.py:3017(__init__)
1 0.000 0.000 0.000 0.000 internals.py:3018(<listcomp>)
2 0.000 0.000 0.000 0.000 internals.py:3058(shape)
6 0.000 0.000 0.000 0.000 internals.py:3060(<genexpr>)
4 0.000 0.000 0.000 0.000 internals.py:3062(ndim)
3 0.000 0.000 0.000 0.000 internals.py:307(dtype)
3 0.000 0.000 0.000 0.000 internals.py:311(ftype)
1 0.000 0.000 0.000 0.000 internals.py:3114(_rebuild_blknos_and_blklocs)
1 0.000 0.000 0.000 0.000 internals.py:3524(is_consolidated)
1 0.000 0.000 0.000 0.000 internals.py:3532(_consolidate_check)
1 0.000 0.000 0.000 0.000 internals.py:3533(<listcomp>)
1 0.000 0.000 0.000 0.000 internals.py:3612(get_slice)
1 0.000 0.000 0.000 0.000 internals.py:3622(<listcomp>)
1 0.000 0.000 0.000 0.000 internals.py:3829(_consolidate_inplace)
2 0.000 0.000 0.006 0.003 internals.py:714(to_native_types)
5 0.000 0.000 0.001 0.000 missing.py:123(_isna_ndarraylike)
5 0.000 0.000 0.001 0.000 missing.py:26(isna)
5 0.000 0.000 0.001 0.000 missing.py:51(_isna_new)
1 0.000 0.000 0.000 0.000 numeric.py:424(asarray)
3 0.000 0.000 0.000 0.000 numeric.py:621(require)
6 0.000 0.000 0.000 0.000 numeric.py:692(<genexpr>)
1 0.000 0.000 0.000 0.000 posixpath.py:230(expanduser)
2 0.000 0.000 0.000 0.000 range.py:119(_simple_new)
1 0.000 0.000 0.000 0.000 range.py:157(_data)
6 0.000 0.000 0.000 0.000 range.py:224(dtype)
5 0.000 0.000 0.000 0.000 range.py:469(__len__)
2 0.000 0.000 0.000 0.000 range.py:479(__getitem__)
2 0.000 0.000 0.000 0.000 range.py:56(__new__)
2 0.000 0.000 0.000 0.000 six.py:184(find_module)
2 0.000 0.000 0.000 0.000 {built-in method __new__ of type object at 0x5581996fad40}
1 0.000 0.000 0.000 0.000 {built-in method _csv.writer}
7 0.000 0.000 0.000 0.000 {built-in method _imp.acquire_lock}
1 0.000 0.000 0.000 0.000 {built-in method _imp.is_builtin}
1 0.000 0.000 0.000 0.000 {built-in method _imp.is_frozen}
7 0.000 0.000 0.000 0.000 {built-in method _imp.release_lock}
2 0.000 0.000 0.000 0.000 {built-in method _thread.allocate_lock}
2 0.000 0.000 0.000 0.000 {built-in method _thread.get_ident}
1 0.000 0.000 0.000 0.000 {built-in method builtins.callable}
27 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}
21 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
102 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
6 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass}
2 0.000 0.000 0.000 0.000 {built-in method builtins.iter}
26/24 0.000 0.000 0.000 0.000 {built-in method builtins.len}
5 0.000 0.000 0.000 0.000 {built-in method builtins.max}
2 0.000 0.000 0.000 0.000 {built-in method builtins.min}
1 0.000 0.000 0.000 0.000 {built-in method builtins.sum}
1 0.000 0.000 0.000 0.000 {built-in method io.open}
4 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.arange}
6 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.array}
4 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.empty}
4 0.000 0.000 0.000 0.000 {built-in method pandas._libs.lib.is_integer}
7 0.000 0.000 0.000 0.000 {built-in method pandas._libs.lib.isscalar}
1 0.000 0.000 0.000 0.000 {built-in method posix.fspath}
1 0.000 0.000 0.000 0.000 {built-in method posix.getcwd}
4 0.000 0.000 0.000 0.000 {built-in method posix.stat}
1 0.000 0.000 0.000 0.000 {method 'any' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
3 0.007 0.002 0.007 0.002 {method 'astype' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {method 'close' of '_io.TextIOWrapper' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.000 0.000 0.000 0.000 {method 'fill' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {method 'format' of 'str' objects}
4 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
3 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}
20 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
2 0.000 0.000 0.000 0.000 {method 'ravel' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {method 'reduce' of 'numpy.ufunc' objects}
2 0.000 0.000 0.000 0.000 {method 'reshape' of 'numpy.ndarray' objects}
5 0.000 0.000 0.000 0.000 {method 'rpartition' of 'str' objects}
40 0.000 0.000 0.000 0.000 {method 'rstrip' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'startswith' of 'str' objects}
3 0.000 0.000 0.000 0.000 {method 'upper' of 'str' objects}
4 0.000 0.000 0.000 0.000 {method 'view' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {method 'writerow' of '_csv.writer' objects}
2 0.000 0.000 0.000 0.000 {pandas._libs.lib.isnaobj}
1 196.850 196.850 196.850 196.850 {pandas._libs.lib.write_csv_rows}
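For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of how a profile like the one above could be collected, plus a per-column timing pass that may help narrow down which column is slow. The frame here is a hypothetical stand-in (random floats plus one column of 1,000-character strings), not my real data, and the helper names profile_export and time_columns are made up for illustration. Writing to an in-memory buffer instead of a file is also an assumption, chosen to keep disk I/O out of the timing.

```python
import cProfile
import io
import pstats
import time

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real frame: 18 numeric columns plus one
# column of long (1,000-character) strings, 19 columns in total.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5000, 18)), columns=[f"col_{i}" for i in range(18)])
df["long_text"] = ["x" * 1000] * len(df)


def profile_export(frame, n_rows=1000):
    """Profile to_csv on the first n_rows and return the pstats report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    frame.head(n_rows).to_csv(io.StringIO())  # in-memory target, no disk I/O
    profiler.disable()
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(10)  # ten most expensive entries
    return report.getvalue()


def time_columns(frame, n_rows=1000):
    """Export each column on its own and record how long each export takes."""
    sample = frame.head(n_rows)
    timings = {}
    for name in sample.columns:
        start = time.perf_counter()
        sample[[name]].to_csv(io.StringIO())
        timings[name] = time.perf_counter() - start
    return timings


print(profile_export(df))
slowest_name, slowest_seconds = max(time_columns(df).items(), key=lambda kv: kv[1])
print(f"slowest column: {slowest_name} ({slowest_seconds:.4f} s)")
```

If one column dominates the per-column timings on the real data, that would point at the contents of that column rather than at to_csv() as a whole.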