I'm trying to use an INSERT INTO ... SELECT statement to copy columns from one table to another, but I get an error message:
gis=> INSERT INTO places (SELECT 0 AS osm_id, 0 AS code, 'country' AS fclass, pop_est::numeric(10,0) AS population, name, geom FROM countries);
ERROR: invalid input syntax for integer: "country"
LINE 1: ...NSERT INTO places (SELECT 0 AS osm_id, 0 AS code, 'country' ...
The SELECT statement on its own gives exactly the results I expect:
gis=> SELECT 0 AS osm_id, 0 AS code, 'country' AS fclass, pop_est::numeric(10,0) AS population, name, geom FROM countries LIMIT 1;
osm_id | code | fclass | population | name | geom
--------+------+---------+------------+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0 | 0 | country | 103065 | Aruba | 0106000000010000000103000000010000000A000000333333338B7951C0C8CCCCCC6CE7284033333333537951C03033333393D82840CCCCCCCC4C7C51C06066666686E0284000000000448051C00000000040002940333333333B8451C0C8CCCCCC0C18294099999999418351C030333333B3312940333333333F8251C0C8CCCCCC6C3A294000000000487E51C000000000A0222940333333335B7A51C00000000000F62840333333338B7951C0C8CCCCCC6CE72840
(1 row)
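Yet somehow the INSERT seems to get confused and thinks the fclass column should be an integer, when it is actually character varying(20).
I have tried casting every column to the exact type required by the target table, but that had no effect.
Every other instance of this error message I can find online involves someone trying to use an empty string as an integer, which is not relevant here because I am selecting a constant string. For reference, this is the definition of the target table:
gis=> \d+ places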
Unlogged table "public.places"
Column | Type | Modifiers | Storage | Stats target | Description
------------+------------------------+------------------------------------------------------+----------+--------------+-------------
gid | integer | not null default nextval('places_gid_seq'::regclass) | plain | |
osm_id | bigint | | plain | |
code | smallint | | plain | |
fclass | character varying(20) | | extended | |
population | numeric(10,0) | | main | |
name | character varying(100) | | extended | |
geom | geometry | | main | |
Indexes:
"places_pkey" PRIMARY KEY, btree (gid)
"places_geom" gist (geom)
Answer 0 (score: 1)
You need to specify the names of the columns you are inserting into:
INSERT INTO places (osm_id, code, fclass, population, name, geom) SELECT ...
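If you do not list them explicitly, PostgreSQL assumes you are inserting into all columns, including gid, which you want to be populated automatically. The SELECT values are then matched to the table's columns by position, starting with gid, so 'country' is actually being inserted into code by your current INSERT statement, and that is the column the integer error refers to.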
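For illustration, the complete statement might look like the sketch below. It only combines the column list from this answer with the SELECT from the question, and it assumes the places table is defined exactly as in the \d+ output shown there; it has not been run against the asker's database.

```sql
-- Without a column list, the six SELECT values are matched by position
-- to the first six columns of places:
--   gid <- 0, osm_id <- 0, code <- 'country', fclass <- pop_est,
--   population <- name, name <- geom
-- Naming the target columns leaves gid to its default
-- (nextval('places_gid_seq')) and puts each value where it belongs.
INSERT INTO places (osm_id, code, fclass, population, name, geom)
SELECT 0                      AS osm_id,
       0                      AS code,
       'country'              AS fclass,
       pop_est::numeric(10,0) AS population,
       name,
       geom
FROM countries;
```

The aliases in the SELECT list are kept only for readability. The mapping is driven by the order of the column list in the INSERT, not by the alias names.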