I am pulling in a Parquet file from S3 using pyarrow.parquet.
The underlying data has 10,000 rows with the following data types:
Columns:
- BIGINT
- Arbitrary String
- INT
- BIGINT
- DATETIME
- DATETIME
- BOOLEAN
- BOOLEAN
- FACTOR (5 LEVELS)
Here is the code I am using to pull in the data:
import pandas
import pyarrow.parquet as pq
# `s3` is a filesystem object created earlier; its construction is not shown here
pandas_dataframe = pq.ParquetDataset('[path]', filesystem=s3).read_pandas().to_pandas()
Here is the error I get when I then write the DataFrame out with pandas_dataframe.to_csv("myfilename.csv"):
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-19-f38ae835e530> in <module>()
----> 1 pandas_dataframe.to_csv("myfilename.csv")
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/core/frame.pyc in to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, tupleize_cols, date_format, doublequote, escapechar, decimal)
1522 doublequote=doublequote,
1523 escapechar=escapechar, decimal=decimal)
-> 1524 formatter.save()
1525
1526 if path_or_buf is None:
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/io/formats/format.pyc in save(self)
1650 self.writer = UnicodeWriter(f, **writer_kwargs)
1651
-> 1652 self._save()
1653
1654 finally:
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/io/formats/format.pyc in _save(self)
1752 break
1753
-> 1754 self._save_chunk(start_i, end_i)
1755
1756 def _save_chunk(self, start_i, end_i):
/opt/conda/envs/python2/lib/python2.7/site-packages/pandas/io/formats/format.pyc in _save_chunk(self, start_i, end_i)
1778 quoting=self.quoting)
1779
-> 1780 lib.write_csv_rows(self.data, ix, self.nlevels, self.cols, self.writer)
1781
1782
pandas/_libs/lib.pyx in pandas._libs.lib.write_csv_rows()
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
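
For what it's worth, the last line of the traceback suggests the string column holds non-ASCII characters and the CSV writer is falling back to the ASCII codec. Below is a minimal, standard-library-only sketch of that failure mode and of writing the same text with an explicit UTF-8 encoding. It assumes Python 3 (not the Python 2.7 environment in the traceback), and the sample rows and the temporary file location are made up for illustration. In pandas, the corresponding thing to try would presumably be the `encoding` argument that appears in the `to_csv` signature above, e.g. `pandas_dataframe.to_csv("myfilename.csv", encoding="utf-8")`, though I have not verified that against this exact data.

```python
import csv
import io
import os
import tempfile

# Hypothetical rows standing in for the BIGINT and "Arbitrary String" columns.
rows = [[1, u"caf\u00e9"], [2, u"\u65e5\u672c\u8a9e\u30c6"]]


def try_ascii(text):
    """Return the error message from encoding `text` as ASCII, or None on success."""
    try:
        text.encode("ascii")
        return None
    except UnicodeEncodeError as exc:
        return str(exc)


# Encoding non-ASCII text with the ASCII codec fails, as in the traceback.
ascii_error = try_ascii(rows[1][1])

# Writing with an explicit UTF-8 encoding avoids the ASCII default.
path = os.path.join(tempfile.mkdtemp(), "myfilename.csv")
with io.open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)

# Read the file back to confirm the text survived the round trip.
with io.open(path, "r", encoding="utf-8", newline="") as f:
    round_trip = list(csv.reader(f))
```

Here `ascii_error` holds the codec's complaint for the second row, while `round_trip` contains the same strings that were written (the integers come back as text, as they would from any CSV file).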