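I'm currently working with employee benefits data, but the spreadsheet is a complete mess. I'd like to reshape it into a format that is easy to read. The current format is: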
Relationship  EmployeeName  BenefitCode  BenefitOption  Name
              Alice         DEN          EEC
CHL           Alice         DEN          EEC            John
SPS           Alice         MED                         Lee
              Lily          VIS
SPS           Lily          VIS                         Tom
I'd like to transform it like this:
Relationship  Name   MED  DEN  VIS
Employee      Alice       EEC
CHL           John        EEC
SPS           Lee    MED
Employee      Lily             VIS
SPS           Tom              VIS
I tried grouping the data by name and BenefitCode, but I got thoroughly confused.
My code is as follows:
SELECT RelationshipCode, EmployeeName,
MAX(IF(BenefitCode = "DEN", BenefitOptionCode , NULL)) AS DEN,
MAX(IF(BenefitCode = "MED", BenefitOptionCode , NULL)) AS MEDICAL,
MAX(IF(BenefitCode = "VIS", BenefitOptionCode , NULL)) AS VISION
FROM `TableXXX`
WHERE RelationshipCode = 'Employee'
GROUP BY EmployeeName, RelationshipCode
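However, this drops the dependents and their link to the employee, which doesn't seem like a good idea. Can anyone tell me how to convert vertical data to horizontal data? Or do you have a better idea for solving this?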
Answer 0 (score: 2)
The following is for BigQuery Standard SQL:
#standardSQL
SELECT
EmployeeName,
IF(Relationship IS NULL, 'Self', Relationship) Relationship,
IFNULL(Name, EmployeeName) Name,
MAX(IF(BenefitCode = 'DEN', IFNULL(BenefitOption, BenefitCode), NULL)) AS DEN,
MAX(IF(BenefitCode = 'MED', IFNULL(BenefitOption, BenefitCode), NULL)) AS MEDICAL,
MAX(IF(BenefitCode = 'VIS', IFNULL(BenefitOption, BenefitCode), NULL)) AS VISION
FROM `project.dataset.table`
GROUP BY Name, EmployeeName, Relationship
-- ORDER BY Name, Relationship
Applied to the sample data from your question, the result is:
Row  EmployeeName  Relationship  Name   DEN   MEDICAL  VISION
1    Alice         Self          Alice  EEC   null     null
2    Alice         CHL           John   EEC   null     null
3    Alice         SPS           Lee    null  MED      null
4    Lily          Self          Lily   null  null     VIS
5    Lily          SPS           Tom    null  null     VIS
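To try the conditional-aggregation pivot above without a BigQuery project, here is a minimal Python sketch that runs an equivalent query against an in-memory SQLite database. It is an approximation rather than the answer's exact query: the table name `benefits` and the helper `pivot_benefits` are made up for illustration, `CASE WHEN` and `COALESCE` stand in for BigQuery's `IF` and `IFNULL`, and the sample rows assume that the blank cells in the question are NULLs.

```python
import sqlite3

# Sample rows from the question, with blank cells assumed to be NULL:
# (Relationship, EmployeeName, BenefitCode, BenefitOption, Name)
SAMPLE_ROWS = [
    (None, "Alice", "DEN", "EEC", None),
    ("CHL", "Alice", "DEN", "EEC", "John"),
    ("SPS", "Alice", "MED", None, "Lee"),
    (None, "Lily", "VIS", None, None),
    ("SPS", "Lily", "VIS", None, "Tom"),
]

# SQLite rendering of the pivot: one MAX(CASE ...) column per benefit code.
PIVOT_SQL = """
SELECT
  EmployeeName,
  COALESCE(Relationship, 'Self') AS Relationship,
  COALESCE(Name, EmployeeName) AS Name,
  MAX(CASE WHEN BenefitCode = 'DEN'
           THEN COALESCE(BenefitOption, BenefitCode) END) AS DEN,
  MAX(CASE WHEN BenefitCode = 'MED'
           THEN COALESCE(BenefitOption, BenefitCode) END) AS MEDICAL,
  MAX(CASE WHEN BenefitCode = 'VIS'
           THEN COALESCE(BenefitOption, BenefitCode) END) AS VISION
FROM benefits
GROUP BY EmployeeName,
         COALESCE(Relationship, 'Self'),
         COALESCE(Name, EmployeeName)
ORDER BY 1, 3
"""


def pivot_benefits(rows):
    """Load the rows into a throwaway SQLite table and return the pivoted rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE benefits (Relationship TEXT, EmployeeName TEXT, "
            "BenefitCode TEXT, BenefitOption TEXT, Name TEXT)"
        )
        conn.executemany("INSERT INTO benefits VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(PIVOT_SQL).fetchall()
    finally:
        conn.close()


pivoted = pivot_benefits(SAMPLE_ROWS)
for row in pivoted:
    print(row)
```

Each dependent keeps its `EmployeeName`, so the link back to the employee that the question was worried about losing is preserved.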
Another option is to extend the query above into a "hierarchical" version:
#standardSQL
SELECT EmployeeName,
ARRAY_AGG(STRUCT(Name, Relationship, DEN, MEDICAL, VISION)) benefits
FROM (
SELECT
EmployeeName,
IF(Relationship IS NULL, 'Self', Relationship) Relationship,
IFNULL(Name, EmployeeName) Name,
MAX(IF(BenefitCode = 'DEN', IFNULL(BenefitOption, BenefitCode), NULL)) AS DEN,
MAX(IF(BenefitCode = 'MED', IFNULL(BenefitOption, BenefitCode), NULL)) AS MEDICAL,
MAX(IF(BenefitCode = 'VIS', IFNULL(BenefitOption, BenefitCode), NULL)) AS VISION
FROM `project.dataset.table`
GROUP BY Name, EmployeeName, Relationship
)
GROUP BY EmployeeName
-- ORDER BY EmployeeName
In this case, the result will be:
Row  EmployeeName  benefits.Name  benefits.Relationship  benefits.DEN  benefits.MEDICAL  benefits.VISION
1    Alice         Alice          Self                   EEC           null              null
                   John           CHL                    EEC           null              null
                   Lee            SPS                    null          MED               null
2    Lily          Lily           Self                   null          null              VIS
                   Tom            SPS                    null          null              VIS
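For readers who fetch the flat pivot into application code, the nesting that `ARRAY_AGG(STRUCT(...))` performs can be approximated client-side. The sketch below is plain Python, not BigQuery: the function name `nest_by_employee` is hypothetical, and the input rows are copied by hand from the flat result shown earlier in this answer.

```python
# Flat pivot rows, copied from the first result table in this answer.
FLAT_ROWS = [
    {"EmployeeName": "Alice", "Name": "Alice", "Relationship": "Self",
     "DEN": "EEC", "MEDICAL": None, "VISION": None},
    {"EmployeeName": "Alice", "Name": "John", "Relationship": "CHL",
     "DEN": "EEC", "MEDICAL": None, "VISION": None},
    {"EmployeeName": "Alice", "Name": "Lee", "Relationship": "SPS",
     "DEN": None, "MEDICAL": "MED", "VISION": None},
    {"EmployeeName": "Lily", "Name": "Lily", "Relationship": "Self",
     "DEN": None, "MEDICAL": None, "VISION": "VIS"},
    {"EmployeeName": "Lily", "Name": "Tom", "Relationship": "SPS",
     "DEN": None, "MEDICAL": None, "VISION": "VIS"},
]


def nest_by_employee(rows):
    """Group flat rows into {EmployeeName: [benefit records]}, keeping input order."""
    nested = {}
    for row in rows:
        # Everything except the grouping key goes into the nested record.
        record = {key: value for key, value in row.items() if key != "EmployeeName"}
        nested.setdefault(row["EmployeeName"], []).append(record)
    return nested


nested = nest_by_employee(FLAT_ROWS)
for employee, benefits in nested.items():
    print(employee, [b["Name"] for b in benefits])
```

Doing the nesting in SQL keeps one row per employee in the result set; doing it in the client, as here, may be simpler when the flat rows are needed anyway.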
Answer 1 (score: 0)
I would probably organize this into CTEs, making each column (or concept) its own logical CTE.
with people as (
  select distinct EmployeeName as person from <dataset>.<table> union distinct
  select distinct Name as person from <dataset>.<table>
),
med as (
  -- select people with MED columns
),
den as (
  -- select people with DEN columns
),
-- ... (etc.)
joined as (
  select * from people
  left join med using(person)
  left join den using(person)
)
select * from joined
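My general advice for cases like this is to start with what you understand (the way I started with MED and DEN). Once those simple pieces are done, move on to the pieces that are more complex or that require assumptions. Breaking them into CTE blocks helps encapsulate each idea.
We obviously don't know your data, or even whether this is a real task, but there are some caveats you may need to watch for that call for more detailed logic (people with the same name, multi-generational relationships, and so on).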
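As a rough illustration of the CTE outline in this answer, the sketch below fills in the `med`, `den`, and `vis` blocks and runs the query against an in-memory SQLite database from Python. Several things here are assumptions rather than part of the answer: the table name `benefits`, the rule that a row with no `Name` describes the employee, the `Name IS NOT NULL` filter in `people`, and falling back to `BenefitCode` when `BenefitOption` is empty (borrowed from the other answer).

```python
import sqlite3

# Sample rows from the question, with blank cells assumed to be NULL:
# (Relationship, EmployeeName, BenefitCode, BenefitOption, Name)
SAMPLE_ROWS = [
    (None, "Alice", "DEN", "EEC", None),
    ("CHL", "Alice", "DEN", "EEC", "John"),
    ("SPS", "Alice", "MED", None, "Lee"),
    (None, "Lily", "VIS", None, None),
    ("SPS", "Lily", "VIS", None, "Tom"),
]

# One CTE per concept, then a chain of left joins on the person's name.
CTE_SQL = """
WITH people AS (
  SELECT EmployeeName AS person FROM benefits
  UNION
  SELECT Name AS person FROM benefits WHERE Name IS NOT NULL
),
med AS (
  SELECT COALESCE(Name, EmployeeName) AS person,
         MAX(COALESCE(BenefitOption, BenefitCode)) AS MED
  FROM benefits WHERE BenefitCode = 'MED'
  GROUP BY COALESCE(Name, EmployeeName)
),
den AS (
  SELECT COALESCE(Name, EmployeeName) AS person,
         MAX(COALESCE(BenefitOption, BenefitCode)) AS DEN
  FROM benefits WHERE BenefitCode = 'DEN'
  GROUP BY COALESCE(Name, EmployeeName)
),
vis AS (
  SELECT COALESCE(Name, EmployeeName) AS person,
         MAX(COALESCE(BenefitOption, BenefitCode)) AS VIS
  FROM benefits WHERE BenefitCode = 'VIS'
  GROUP BY COALESCE(Name, EmployeeName)
),
joined AS (
  SELECT * FROM people
  LEFT JOIN med USING (person)
  LEFT JOIN den USING (person)
  LEFT JOIN vis USING (person)
)
SELECT person, MED, DEN, VIS FROM joined ORDER BY person
"""


def pivot_with_ctes(rows):
    """Load the rows into a throwaway SQLite table and run the CTE pivot."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE benefits (Relationship TEXT, EmployeeName TEXT, "
            "BenefitCode TEXT, BenefitOption TEXT, Name TEXT)"
        )
        conn.executemany("INSERT INTO benefits VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(CTE_SQL).fetchall()
    finally:
        conn.close()


by_person = pivot_with_ctes(SAMPLE_ROWS)
for row in by_person:
    print(row)
```

Because the joins match on the name alone, two different people with the same name would be merged into one row, which is exactly the kind of caveat mentioned above; a real table would need a proper person or employee key.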