I'm trying to migrate some poorly formed data into a database. The data comes from a CSV, and is first loaded into a staging table of all varchar columns (as I cannot enforce type safety at this stage).
The data might look like
COL1 | COL2 | COL3
Name 1 | |
2/11/16 | $350 | $230
2/12/16 | $420 | $387
2/13/16 | $435 | $727
Name 2 | |
2/11/16 | $121 | $144
2/12/16 | $243 | $658
2/13/16 | $453 | $214
The first colum is a mixture of company names as pseudo-headers, and dates for which colum 2 and 3 data is relevant. I'd like to start transforming the data by creating a 'Brand' column - where 'StoreBrand' is the value of Col1 if Col2 is NULL, or the previous row's StoreBrand otherwise. Comething like:
COL1 | COL2 | COL3 | StoreBrand
Name 1 | | | Name 1
2/11/16 | $350 | $230 | Name 1
2/12/16 | $420 | $387 | Name 1
2/13/16 | $435 | $727 | Name 1
Name 2 | | | Name 2
2/11/16 | $121 | $144 | Name 2
2/12/16 | $243 | $658 | Name 2
2/13/16 | $453 | $214 | Name 2
I wrote this:
SELECT
t.*,
CASE
WHEN t.COL2 IS NULL THEN COL1
ELSE LAG(StoreBrand) OVER ()
END AS StoreBrand
FROM
(
SELECT
ROW_NUMBER() OVER () AS i,
*
FROM
Staging_Data
) t;
But the database (postgres in this case, but we're considering alternatives so the most diverse answer is preferred) chokes on LAG(StoreBrand) because that's the derived column I'm creating. Invoking LAG(Col1) only populates the first row's real data:
COL1 | COL2 | COL3 | StoreBrand
Name 1 | | | Name 1
2/11/16 | $350 | $230 | Name 1
2/12/16 | $420 | $387 | 2/11/16
2/13/16 | $435 | $727 | 2/12/16
Name 2 | | | Name 2
2/11/16 | $121 | $144 | Name 2
2/12/16 | $243 | $658 | 2/11/16
2/13/16 | $453 | $214 | 2/12/16
My goal would be a StoreBrand column which is the first value of COL1 for all date values before the next brand name:
COL1 | COL2 | COL3 | StoreBrand
Name 1 | | | Name 1
2/11/16 | $350 | $230 | Name 1
2/12/16 | $420 | $387 | Name 1
2/13/16 | $435 | $727 | Name 1
Name 2 | | | Name 2
2/11/16 | $121 | $144 | Name 2
2/12/16 | $243 | $658 | Name 2
2/13/16 | $453 | $214 | Name 2
The value of StoreBrand when Col2 and Col3 are null is inconsequential - that row will be dropped as part of the conversion process. The important thing is associating the data rows (i.e. those with dates) with their brand.
Is there a way to reference the previous value for the column that I'm missing?
答案 0 :(得分:1)
编辑通过搜索引擎找到此问题的人:
诀窍是使用WITH
,允许在多个地方使用临时结果(link)。
我认为这样做你想要的并同时丢弃空行(如果你愿意)。我们基本上在我们目前正在查看的行之前选择所有品牌,如果它与当前行之间没有“品牌行”,那么我们接受它。
WITH t AS
(SELECT
ROW_NUMBER() OVER () AS i,
*
FROM
Staging_Data
)
SELECT
a.COL1,
a.COL2,
a.COL3,
(SELECT b.COL1 FROM t b WHERE b.COL2 IS NULL AND b.i <= a.i AND NOT EXISTS(
SELECT * FROM t c WHERE c.COL2 IS NULL AND c.i <= a.i AND c.i > b.i)
) StoreBrand
FROM
t a
WHERE -- I don't think you need those rows? Otherwise remove it.
a.COL2 IS NOT NULL
这可能有点令人困惑。 t
是我们定义with
您的查询的临时表。 a
,b
和c
是t
的别名。FROM t AS a
,select * from user where user.id = (select id from user_matching where id = user_matching_id)
和var styles = StyleSheet.create({
description: {
marginBottom: 20,
fontSize: 18,
textAlign: 'center',
color: '#656565'
}
image: {
width: 217,
height: 138
},
input: {
// your style
}
});
是public function questions()
{
return $this->hasMany('App\UserQuestion');
}
的别名。我们也可以写$users = User::all();
$users->each(function ($user) {
$questions = User::find($user->id)->questions;
});
来使其更加明显。
答案 1 :(得分:0)
我想我明白你想要什么。从技术上讲,您需要ignore nulls
上的lag()
选项,因此它看起来像这样:
select lag(case when col1 not like '%/%/%' then col1 end ignore nulls) over (order by linenumber) as brandname
唯一的问题是什么? Postgres不支持ignore nulls
。
但是,你可以用子查询做同样的事情。我们的想法是为每个组分配一个分组标识符。这是有效品牌名称的累积计数。然后一个简单的max()
聚合工作:
select t.*,
max(case when col1 not like '%/%/%' then col1 end) over (partition by grp) as brand
from (select t.*,
sum(case when col1 not like '%/%/%' then 1 end) over
(order by linenumber) as grp
from t
);