Question

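I am struggling to convert several Berkeley Earth netCDF files to CSV or another tabular format. I realize that similar questions have been asked before, but I could not apply any of the solutions I came across.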

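ncdump does not seem to produce an actual CSV file, and I cannot find any instructions for making it do so.
I tried loading the data into a pandas DataFrame with xarray's to_dataframe(), but my laptop cannot allocate the required memory.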

In [1]: import xarray as xr

In [2]: import pandas as pd

In [3]: nc = xr.open_dataset('Complete_TAVG_Daily_EqualArea.nc')

In [4]: nc
Out[4]:
<xarray.Dataset>
Dimensions:      (map_points: 5498, time: 50769)
Dimensions without coordinates: map_points, time
Data variables:
    longitude    (map_points) float32 ...
    latitude     (map_points) float32 ...
    date_number  (time) float64 ...
    year         (time) float64 ...
    month        (time) float64 ...
    day          (time) float64 ...
    day_of_year  (time) float64 ...
    land_mask    (map_points) float64 ...

In [5]: df = nc.to_dataframe()
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
(...)

MemoryError: Unable to allocate 532. MiB for an array with shape (279127962,) and data type int16

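I also tried converting with Panoply. Its CSV export only seems able to write a single variable (which I would want as one column) into a file consisting of a single row.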

I must be missing something. Can anyone help?

Answer 1

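What you are missing is that netCDF is a more complex format than CSV. A netCDF file can contain several arrays of arbitrary shape and size. A CSV file can only hold a single array of at most two dimensions (or a set of one-dimensional arrays, provided they all have the same length). So you cannot simply convert an arbitrary netCDF file to CSV.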
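To make the "same length" condition concrete, here is a minimal sketch that groups variables by the dimensions they are defined on. The variable names and dimensions are copied by hand from the dataset listing in the question; the dictionary itself is an illustrative stand-in, not something xarray produces. Each group could in principle share one CSV table.

```python
from collections import defaultdict

# Variable name -> dimensions, transcribed from the listing in the question.
variable_dims = {
    "longitude": ("map_points",),
    "latitude": ("map_points",),
    "date_number": ("time",),
    "year": ("time",),
    "month": ("time",),
    "day": ("time",),
    "day_of_year": ("time",),
    "land_mask": ("map_points",),
}

# Variables that share the same dimensions have the same length,
# so they can sit side by side as columns of one table.
tables = defaultdict(list)
for name, dims in variable_dims.items():
    tables[dims].append(name)

for dims, names in tables.items():
    print(dims, "->", names)
```

Under this grouping the per-point variables and the per-time variables end up in two separate tables, because their lengths differ.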

Let's look at the example file you provided. I am repeating the information here using my version of xarray, which seems to be more verbose...

In [16]: ds = xr.open_dataset('Complete_TAVG_EqualArea.nc')

In [17]: ds
Out[17]:
<xarray.Dataset>
Dimensions:      (map_points: 5498, month_number: 12, time: 3240)
Coordinates:
    longitude    (map_points) float32 ...
    latitude     (map_points) float32 ...
  * time         (time) float64 1.75e+03 1.75e+03 1.75e+03 ... 2.02e+03 2.02e+03
Dimensions without coordinates: map_points, month_number
Data variables:
    land_mask    (map_points) float64 ...
    temperature  (time, map_points) float32 ...
    climatology  (month_number, map_points) float32 ...
Attributes:
    Conventions:          Berkeley Earth Internal Convention (based on CF-1.5)
    title:                Native Format Berkeley Earth Surface Temperature An...
    history:              16-Jan-2020 06:51:38
    institution:          Berkeley Earth Surface Temperature Project
    source_file:          Complete_TAVG.50985s.20200116T064041.mat
    source_history:       13-Jan-2020 17:22:52
    source_data_version:  ca6f26341938dae0ea7dd619bce6f15e
    comment:              This file contains Berkeley Earth surface temperatu...

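There are three data variables (land_mask, temperature, climatology) and three coordinate vectors (longitude, latitude, time). You could perhaps put the coordinate vectors in the first row and first column of a CSV file, but even then you would need at least three separate CSV files per netCDF file.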
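As a rough sketch of the "one CSV per variable" idea, the example below builds a tiny synthetic dataset with the same variable names and dimension layout as the file above and writes each data variable to its own file. The sizes, values, and output directory are made up for illustration; with the real file you would use the dataset returned by xr.open_dataset instead.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
import xarray as xr

# Tiny synthetic stand-in for the real file (sizes and values are invented).
n_points, n_months, n_times = 4, 12, 3
ds = xr.Dataset(
    data_vars={
        "land_mask": ("map_points", np.ones(n_points)),
        "temperature": (
            ("time", "map_points"),
            np.zeros((n_times, n_points), dtype="float32"),
        ),
        "climatology": (
            ("month_number", "map_points"),
            np.zeros((n_months, n_points), dtype="float32"),
        ),
    },
    coords={
        "longitude": ("map_points", np.linspace(-180, 180, n_points, dtype="float32")),
        "latitude": ("map_points", np.linspace(-90, 90, n_points, dtype="float32")),
        "time": ("time", np.array([1750.0, 1750.1, 1750.2])),
    },
)

# Write every data variable to a separate CSV file.
# to_pandas() yields a Series for 1-D and a DataFrame for 2-D variables.
out_dir = Path(tempfile.mkdtemp())
written = {}
for name, var in ds.data_vars.items():
    path = out_dir / f"{name}.csv"
    var.to_pandas().to_csv(path)
    written[name] = path

# Read one file back to check its layout: rows are time steps, columns are map points.
temperature_table = pd.read_csv(written["temperature"], index_col=0)
print(temperature_table.shape)
```

Note that to_pandas() only handles variables with at most two dimensions, which is the case for all three variables here.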

For example, you could write the climatology data to CSV as follows:

In [31]: clim = ds['climatology']  

In [32]: clim.to_pandas().to_csv('clim.csv')

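Here clim is an xarray.DataArray, which can in principle be written to a CSV file. Unfortunately, the xarray.DataArray class has no to_csv method. The pandas.DataFrame class does, so we first convert clim to a pandas DataFrame. See the pandas documentation for the parameters of to_csv to adjust the resulting output file.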
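For the much larger temperature variable, converting everything at once may run into the MemoryError from the question. One possible workaround, sketched here on synthetic data, is to convert a few time steps at a time and append each block to the same CSV file in long format (one row per time step and map point). The array sizes, the chunk size, and the output path are assumptions chosen for illustration, and the chunk size would need tuning for a real file.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for ds['temperature'] (sizes and values are invented).
n_times, n_points = 5, 4
temp = xr.DataArray(
    np.arange(n_times * n_points, dtype="float32").reshape(n_times, n_points),
    dims=("time", "map_points"),
    coords={
        "time": ("time", 1750.0 + np.arange(n_times, dtype="float64")),
        "longitude": ("map_points", np.linspace(-180, 180, n_points, dtype="float32")),
        "latitude": ("map_points", np.linspace(-90, 90, n_points, dtype="float32")),
    },
    name="temperature",
)

out_path = Path(tempfile.mkdtemp()) / "temperature_long.csv"
chunk = 2  # number of time steps converted per iteration (assumed value)

with open(out_path, "w", newline="") as fh:
    for start in range(0, temp.sizes["time"], chunk):
        # Only this slice is turned into a DataFrame, so peak memory stays small.
        block = temp.isel(time=slice(start, start + chunk))
        df = block.to_dataframe().reset_index()
        # Write the header once, then append the remaining blocks without it.
        df.to_csv(fh, index=False, header=(start == 0))

long_table = pd.read_csv(out_path)
print(len(long_table), list(long_table.columns))
```

The resulting file is tall rather than wide, which may or may not be what you want, but it avoids building the full table in memory.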

Answer 2

You can use the CDO suite to convert .nc to .csv.

Sample code (you will need to edit some of the outputtab parameters):

cdo -outputtab,date,lon,lat,value infile.nc | awk 'FNR==1{ row=$2","$3","$4","$5;print row  } FNR!=1{ row=$1","$2","$3","$4; print row}' > outfile.csv

Converting netCDF files to CSV
