Question

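To use the COPY command in PostgreSQL (in my case, to load a CSV file), I first need to create the destination table.

Now, if my table has, say, 60 columns, writing this out by hand feels awkward and inefficient: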

CREATE TABLE table_name(
   column1 datatype,
   column2 datatype,
   column3 datatype,
   .....
   column60 datatype
);

PostgreSQL users, how do you deal with this?

Answer 1

I usually use the file_fdw extension to read data from CSV files.

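Unfortunately, file_fdw is not that convenient or flexible for tasks such as reading a CSV file with many columns. CREATE FOREIGN TABLE succeeds with any number of columns, but if the definition does not match the CSV file, the error only shows up later, when a SELECT is executed. So the problem of having to create the table explicitly remains. It can be worked around, though.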
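For reference, the manual setup that the rest of this answer tries to automate looks roughly like the sketch below. The file path is the one used in the usage example further down; the server name csv_manual_srv, the table name readcsv_manual and the three-column layout are assumptions made for illustration.

```sql
-- file_fdw is one of PostgreSQL's contrib modules
create extension if not exists file_fdw;

-- hypothetical server name, chosen so it does not clash with the later example
create server csv_manual_srv foreign data wrapper file_fdw;

-- the column list has to be written by hand and must match the file
create foreign table readcsv_manual (
  col1 text,
  col2 text,
  col3 text
) server csv_manual_srv
  options ( filename '/home/nikolay/tmp/sample.csv', format 'csv' );

-- a column-count mismatch is reported here, not by the CREATE above
select * from readcsv_manual limit 1;
```

With 60 columns, the hand-written column list is exactly the part that becomes tedious.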

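Here is a brute-force approach that requires nothing except Postgres. The function, written in PL/pgSQL, creates a foreign table with one column and tries to SELECT from it. If that fails, the attempt is undone (the exception block rolls back the CREATE) and it tries again with 2 columns, and so on until the SELECT succeeds. All columns are of type text. That is quite a limitation, but it still achieves the goal of getting a ready-to-SELECT table without manual work.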

create or replace function autocreate_table_to_read_csv(
  fdw_server text,
  csv text,
  table_name text,
  max_columns_num int default 100
) returns void as $$
declare
  i int;
  sql text;
  rec record;
begin
  execute format('drop foreign table if exists %I', table_name);
  for i in 1..max_columns_num loop
    begin
      select into sql
       format('create foreign table %I (', table_name)
          || string_agg('col' || n::text || ' text', ', ')
          || format(
            ') server %I options ( filename %L, format ''csv'' );',
            fdw_server,
            csv
          )
      from generate_series(1, i) as g(n);
      raise debug 'SQL: %', sql;
      execute sql;
      execute format('select * from %I limit 1;', table_name) into rec;
      -- looks OK, so the number of columns corresponds to the first row of CSV file
      raise info 'Table % created with % column(s). SQL: %', table_name, i, sql;
      exit;
    exception when others then
      raise debug 'CSV has more than % column(s), making another attempt...', i;
    end;
  end loop;
end;
$$ language plpgsql;

Once the right number of columns is found, the function reports it (see raise info).

To see more detail, run set client_min_messages to debug; before calling the function.

Usage example:

test=# create server csv_import foreign data wrapper file_fdw;
CREATE SERVER

test=# set client_min_messages to debug;
SET

test=# select autocreate_table_to_read_csv('csv_import', '/home/nikolay/tmp/sample.csv', 'readcsv');
NOTICE:  foreign table "readcsv" does not exist, skipping
DEBUG:  SQL: create foreign table readcsv (col1 text) server csv_import options ( filename '/home/nikolay/tmp/sample.csv', format 'csv' );
DEBUG:  CSV has more than 1 column(s), making another attempt...
DEBUG:  SQL: create foreign table readcsv (col1 text, col2 text) server csv_import options ( filename '/home/nikolay/tmp/sample.csv', format 'csv' );
DEBUG:  CSV has more than 2 column(s), making another attempt...
DEBUG:  SQL: create foreign table readcsv (col1 text, col2 text, col3 text) server csv_import options ( filename '/home/nikolay/tmp/sample.csv', format 'csv' );
INFO:  Table readcsv created with 3 column(s). SQL: create foreign table readcsv (col1 text, col2 text, col3 text) server csv_import options ( filename '/home/nikolay/tmp/sample.csv', format 'csv' );
 autocreate_table_to_read_csv
------------------------------

(1 row)

test=# select * from readcsv limit 2;
 col1  | col2  | col3
-------+-------+-------
 1313  | xvcv  | 22
 fvbvb | 2434  | 4344
(2 rows)
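Since every column comes out as text, a natural follow-up is to cast the values into a regular, typed table. This is a minimal sketch that assumes the foreign table readcsv from the session above. The target table name, the new column names and the chosen types are hypothetical, and the integer cast only works if every value in col3 is numeric, which the two rows shown do not guarantee.

```sql
-- materialize the foreign table into an ordinary table with real types
create table sample_typed as
select
  col1      as code,    -- left as text: the sample mixes digits and letters
  col2      as label,   -- left as text for the same reason
  col3::int as amount   -- assumption: col3 holds only integers
from readcsv;
```

If a cast fails, the error message names the offending value, which helps when picking a more suitable type.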

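Update: I found an implementation of a very similar approach for COPY .. FROM (without the brute force: it requires the number of columns in the CSV file to be specified explicitly): file_fdw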
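The idea behind the COPY .. FROM variant can be sketched as follows. This is not the linked implementation, only an illustration of the same idea; the function name create_text_table_and_copy and its parameters are made up for this example.

```sql
create or replace function create_text_table_and_copy(
  target_table text,
  csv_path text,
  col_count int
) returns void as $$
declare
  cols text;
begin
  -- build "col1 text, col2 text, ..." for the requested number of columns
  select string_agg(format('%I text', 'col' || n), ', ' order by n)
    into cols
  from generate_series(1, col_count) as g(n);

  execute format('create table %I (%s)', target_table, cols);

  -- server-side COPY: the path is read by the database server process
  execute format('copy %I from %L with (format csv)', target_table, csv_path);
end;
$$ language plpgsql;

-- usage, assuming the sample file really has 3 columns
select create_text_table_and_copy('readcsv_copy', '/home/nikolay/tmp/sample.csv', 3);
```

Unlike the file_fdw version, this loads the data into an ordinary table, so the CSV file is no longer needed afterwards.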

P.S. In fact, making Postgres's COPY .. FROM and file_fdw more flexible would be a very worthwhile improvement. For example, postgres_fdw supports the very handy IMPORT FOREIGN SCHEMA command, which defines remote ("foreign") objects very quickly, in a single line, and saves a lot of effort. Something similar for CSV data would be great.
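For comparison, this is roughly what the postgres_fdw workflow looks like. The sketch assumes a reachable remote PostgreSQL server; the host, database, credentials, server name remote_pg and schema name remote_public are all placeholders.

```sql
create extension if not exists postgres_fdw;

-- placeholder connection details
create server remote_pg
  foreign data wrapper postgres_fdw
  options (host 'remote.example.com', dbname 'appdb', port '5432');

create user mapping for current_user
  server remote_pg
  options (user 'app_reader', password 'secret');

create schema if not exists remote_public;

-- one statement creates local foreign tables for the tables and views
-- found in the remote schema, with column definitions taken from the remote side
import foreign schema public
  from server remote_pg
  into remote_public;
```

No column list is written by hand here, which is the convenience the P.S. wishes existed for CSV files.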

Creating a table with many columns in PostgreSQL