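I've been given a Postgres dataset of about 2000 rows in which one column holds a string of years, each followed by the maintenance activity carried out in that year, like this: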
id maintenance
1 2012- Filled holes, Painted, 2017- Nailed
3 2018- Booger removal
2 2012- Painted, 2017- Filled holes, 2018- Wallpaper
I'm trying to find a way to split this data into columns like the following:
id  year_1  year_1_maint           year_2  year_2_maint  ...
1   2012    Filled holes, Painted  2017    Nailed
2   2018    Booger removal
I'm considering a possible solution along the following lines (except that, because it uses the year as the delimiter, it drops the years):
select regexp_split_to_array(maintenance, '\d{4}')
from maintenance_database
where maintenance is not null;
I can find out how many columns are needed with the following:
select max(array_length(regexp_split_to_array(maintenance, '\d{4}'),1))
from maintenance_database
where maintenance is not null;
But this is where I get stuck: I can't work out a series of update queries, or a single query, that puts the data into the format I need.
Answer 0 (score: 0)
The first thing that comes to mind is appending a "special" character after each year and using it as the delimiter:
db=# with maintenance_database(maintenance) as (values('2012- Painted, 2017- Filled holes, 2018- Wallpaper'))
select regexp_split_to_array(regexp_replace(maintenance, '(\d{4})', '\1'||chr(1),'g'),chr(1)) from maintenance_database;
regexp_split_to_array
---------------------------------------------------------------
{2012,"- Painted, 2017","- Filled holes, 2018","- Wallpaper"}
(1 row)
Here I first prepare maintenance with regexp_replace(maintenance, '(\d{4})', '\1'||chr(1),'g'), which gives:
db=# with maintenance_database(maintenance) as (values('2012- Painted, 2017- Filled holes, 2018- Wallpaper'))
select regexp_replace(maintenance, '(\d{4})', '\1'||chr(1),'g') from maintenance_database;
regexp_replace
----------------------------------------------------------------
2012\x01- Painted, 2017\x01- Filled holes, 2018\x01- Wallpaper
(1 row)
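This only works if chr(1) never occurs in the real data, which is an assumption rather than something the question states. A quick check along these lines, run against the real table instead of the inlined sample row, should return 0:

```sql
-- Sketch: count rows that already contain the artificial delimiter chr(1).
-- The CTE only inlines one sample row; drop it to query the real table.
with maintenance_database(maintenance) as (values('2012- Painted, 2017- Filled holes, 2018- Wallpaper'))
select count(*) as rows_containing_delimiter
from maintenance_database
where position(chr(1) in maintenance) > 0;
```

If the count is not 0, some other character that is absent from the data would have to be used instead.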
The prepared string is then split with your regexp_split_to_array call.
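Naturally you want the year as a value of its own, so the artificial delimiter has to be added in front of the year as well: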
db=# with maintenance_database(maintenance) as (values('2012- Painted, 2017- Filled holes, 2018- Wallpaper'))
select regexp_split_to_array(regexp_replace(maintenance, '(\d{4})', chr(1)||'\1'||chr(1),'g'),chr(1)) from maintenance_database;
regexp_split_to_array
--------------------------------------------------------------------
{"",2012,"- Painted, ",2017,"- Filled holes, ",2018,"- Wallpaper"}
(1 row)
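From here the years and their texts still have to be paired up and spread into columns. One possible way to do that, offered as a sketch rather than a tested solution for the full dataset: match each year with regexp_matches, take the text that follows it from regexp_split_to_array, and pivot with filtered aggregates. The CTE name pairs, the column n, and the fixed limit of three year/maintenance pairs are assumptions made for this example. It also assumes every entry is written as four digits followed by a hyphen, and it needs PostgreSQL 9.4 or later for with ordinality and filter.

```sql
-- Sketch only: the sample rows from the question are inlined here;
-- remove the first CTE to run against the real maintenance_database table.
with maintenance_database(id, maintenance) as (
  values (1, '2012- Filled holes, Painted, 2017- Nailed'),
         (3, '2018- Booger removal'),
         (2, '2012- Painted, 2017- Filled holes, 2018- Wallpaper')
),
pairs as (
  -- One row per (id, n): the n-th year and the text that follows it.
  select d.id,
         y.ord  as n,
         y.m[1] as year,
         -- The split array starts with an empty string (the text before the
         -- first year), so the text for the n-th year sits at index n + 1.
         rtrim((regexp_split_to_array(d.maintenance, '\d{4}-\s*'))[(y.ord + 1)::int], ', ') as maint
  from maintenance_database d
  cross join lateral regexp_matches(d.maintenance, '(\d{4})-', 'g')
       with ordinality as y(m, ord)
)
-- Pivot to wide form; the number of year_N pairs is fixed at 3 (assumption).
select id,
       max(year)  filter (where n = 1) as year_1,
       max(maint) filter (where n = 1) as year_1_maint,
       max(year)  filter (where n = 2) as year_2,
       max(maint) filter (where n = 2) as year_2_maint,
       max(year)  filter (where n = 3) as year_3,
       max(maint) filter (where n = 3) as year_3_maint
from pairs
group by id
order by id;
```

The column list has to be written out by hand, so the largest n produced by pairs tells you how many year_N pairs to add. Rows with a null maintenance value yield no matches and drop out of the result.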