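I'm having trouble breaking a DataFrame out into separate rows, rather than a comma-separated list, once a column has been split into a fixed number of columns. I'm trying to do this in Pandas, but if it is possible in raw SQL (which I tried and gave up on), that would be the ideal solution.
Sample data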
Reference Surname Forename CurrentPostCode PreviousPostCodes
1 Smith John WA1 2LA WA2 HG5, LN4 6XS
2 Jones Jack NA1 2NE None
3 Potter Harry LI8 0NX None
4 Wane Bruce HE27 4PR HE5 9PR
5 Finn Grahame B26 7UP B15 6UR, B22 9JK, B13 3YT
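I want to split the PreviousPostCodes column into two columns, PPC1 and PPC2. If the comma-separated list contains more than 2 items (as for Reference 5), the first two should be split out and a new row added below, with PPC1 filled with B13 3YT.
Desired output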
Reference Surname Forename CurrentPostCode PPC1 PPC2
1 Smith John WA1 2LA WA2 HG5 LN4 6XS
2 Jones Jack NA1 2NE None None
3 Potter Harry LI8 0NX None None
4 Wane Bruce HE27 4PR HE5 9PR None
5 Finn Grahame B26 7UP B15 6UR B22 9JK
5 Finn Grahame B26 7UP B13 3YT None
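I hope this makes sense. I can split the list, but I get n columns; I want to cap it at a maximum of 2 and overflow into a new row when there are more than 2. There is no limit on the number of previous postcodes in the data; if the comma-separated list contains 5, that row needs to be broken into 3 rows.
Thanks
Answer 0: (score: 0)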
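# Split off the first postcode into PPC1; everything after the first comma goes to PPC2
df[['PPC1','PPC2']] = df.pop('PreviousPostCodes').str.split(r',\s*', n=1, expand=True)
# Turn the remainder into a list of postcodes (an empty string where there is none)
df['PPC2'] = df['PPC2'].fillna('').str.split(r',\s*', expand=False)
Yields: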
In [173]: df
Out[173]:
Reference Surname Forename CurrentPostCode PPC1 PPC2
0 1 Smith John WA1 2LA WA2 HG5 [LN4 6XS]
1 2 Jones Jack NA1 2NE NaN []
2 3 Potter Harry LI8 0NX NaN []
3 4 Wane Bruce HE27 4PR HE5 9PR []
4 5 Finn Grahame B26 7UP B15 6UR [B22 9JK, B13 3YT]
Now we can use an explode() helper function (user-defined; its definition is not included here):
In [174]: explode(df, lst_cols='PPC2')
Out[174]:
Reference Surname Forename CurrentPostCode PPC1 PPC2
0 1 Smith John WA1 2LA WA2 HG5 LN4 6XS
1 2 Jones Jack NA1 2NE NaN
2 3 Potter Harry LI8 0NX NaN
3 4 Wane Bruce HE27 4PR HE5 9PR
4 5 Finn Grahame B26 7UP B15 6UR B22 9JK
5 5 Finn Grahame B26 7UP B15 6UR B13 3YT
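Note that the output above repeats PPC1 (B15 6UR) on the overflow row, whereas the question asks for B13 3YT to move into PPC1 on the new row. The following is a minimal sketch of one way to get that layout, not the answerer's method. It assumes pandas 1.1 or later (for the built-in DataFrame.explode with ignore_index) and that missing values are real None/NaN rather than the string 'None'. The helper name to_pairs is hypothetical.

```python
import pandas as pd

# Sample data from the question; missing previous postcodes are assumed to be None.
df = pd.DataFrame({
    "Reference": [1, 2, 3, 4, 5],
    "Surname": ["Smith", "Jones", "Potter", "Wane", "Finn"],
    "Forename": ["John", "Jack", "Harry", "Bruce", "Grahame"],
    "CurrentPostCode": ["WA1 2LA", "NA1 2NE", "LI8 0NX", "HE27 4PR", "B26 7UP"],
    "PreviousPostCodes": ["WA2 HG5, LN4 6XS", None, None, "HE5 9PR",
                          "B15 6UR, B22 9JK, B13 3YT"],
})


def to_pairs(value, size=2):
    """Hypothetical helper: split a comma-separated string into chunks of
    `size` items, padding the last chunk with None."""
    codes = []
    if isinstance(value, str):
        codes = [c.strip() for c in value.split(",") if c.strip()]
    if not codes:
        # Keep one all-empty row so the record is not dropped.
        return [[None] * size]
    chunks = [codes[i:i + size] for i in range(0, len(codes), size)]
    return [chunk + [None] * (size - len(chunk)) for chunk in chunks]


out = df.drop(columns="PreviousPostCodes")
out["pairs"] = df["PreviousPostCodes"].map(to_pairs)

# One row per chunk: explode flattens only the outer list, so each cell
# now holds a single [PPC1, PPC2] pair.
out = out.explode("pairs", ignore_index=True)

# Expand each pair into the two target columns.
pairs = pd.DataFrame(out.pop("pairs").tolist(),
                     columns=["PPC1", "PPC2"], index=out.index)
out = pd.concat([out, pairs], axis=1)
print(out)
```

With the sample data, Reference 5 ends up on two rows: the first holds B15 6UR and B22 9JK, the second holds B13 3YT with an empty PPC2. A list of 5 postcodes would produce 3 rows in the same way.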
Answer 1: (score: 0)
Try this SQL script. Below is the sample data.
IF OBJECT_ID('tempdb..#temp') IS NOT NULL
DROP TABLE #temp
;With cte(Reference , Surname, Forename , CurrentPostCode, PreviousPostCodes)
AS
(
SELECT 1,'Smith' ,'John' , 'WA1 2LA' ,'WA2 HG5, LN4 6XS,B13 3YT,AA18 3YT,YT783 3YT' UNION ALL
SELECT 2,'Jones' ,'Jack' , 'NA1 2NE' ,'None' UNION ALL
SELECT 3,'Potter','Harry' , 'LI8 0NX' ,'None' UNION ALL
SELECT 4,'Wane' ,'Bruce' , 'HE27 4PR' ,'HE5 9PR,B13 3YT,RT4 YT5' UNION ALL
SELECT 5,'Finn' ,'Grahame', 'B26 7UP' ,'B15 6UR, B22 9JK, B13 3YT'
)
SELECT * INTO #temp FROM cte
SELECT * FROM #temp
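Using dynamic SQL we get n columns, depending on the PreviousPostCodes column: its data is comma-separated, so the number of columns created is determined by the number of commas in the old postcodes.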
--To get the number of columns to be divided dynamically
DECLARE @ColumnsDivideCnt INT
,@Dyncol nvarchar(max)
,@Sql nvarchar(max)
;WITH cte
AS
(
SELECT 0 As Rn, CHARINDEX(',',PreviousPostCodes+',') AS Pos ,PreviousPostCodes FROM #temp
UNION ALL
SELECT Pos+1,CHARINDEX(',',PreviousPostCodes+',',Pos+1) ,PreviousPostCodes
FROM cte
WHERE Pos >0
)
SELECT @ColumnsDivideCnt=MAX(ColumnToGet) FROM
(
SELECT PreviousPostCodes, Pos,ROW_NUMBER()OVER(Partition by PreviousPostCodes Order by PreviousPostCodes) AS ColumnToGet FROM cte
WHERE Pos >0
GROUP BY PreviousPostCodes,Pos
)dt
--Get the column names dynamically
;WITH cte2
AS
(
SELECT 1 AS Rn
UNION ALL
SELECT Rn+1
From cte2
WHERE Rn<@ColumnsDivideCnt
)
SELECT @Dyncol=STUFF((SELECT ', ' + ReqCol FROM
(
SELECT 'ISNULL(Split.a.value('+'''/S['+CAST(Rn AS VARCHAR(2))+']'+''''+','+'''NVARCHAR(1000)'''+'),''None'') As [PPC'+CAST(Rn AS VARCHAR(2))+']' AS ReqCol FROM cte2
)Dt
FOR XML PATH ('')),1,1,'')
SET @Sql='SELECT DISTINCT
Reference
,Surname
,Forename
,CurrentPostCode
,'+@Dyncol+'
FROM (
SELECT Reference,Surname,Forename,CurrentPostCode,
CAST(''<S>''+REPLACE(PreviousPostCodes,'','',''</S><S>'')+''</S>'' AS XML)AS PreviousPostCodes
FROM #temp
) AS A
CROSS APPLY PreviousPostCodes.nodes(''S'') AS Split(a)
'
PRINT @Sql
EXEC (@Sql)
Result before running the dynamic SQL script
Reference Surname Forename CurrentPostCode PreviousPostCodes
-----------------------------------------------------------------------------------------------
1 Smith John WA1 2LA WA2 HG5, LN4 6XS,B13 3YT,AA18 3YT,YT783 3YT
2 Jones Jack NA1 2NE None
3 Potter Harry LI8 0NX None
4 Wane Bruce HE27 4PR HE5 9PR,B13 3YT,RT4 YT5
5 Finn Grahame B26 7UP B15 6UR, B22 9JK, B13 3YT
Result after running the dynamic SQL script
Reference Surname Forename CurrentPostCode PPC1 PPC2 PPC3 PPC4
--------------------------------------------------------------------------------------------------
1 Smith John WA1 2LA WA2 HG5 LN4 6XS B13 3YT AA18 3YT
2 Jones Jack NA1 2NE None None None None
3 Potter Harry LI8 0NX None None None None
4 Wane Bruce HE27 4PR HE5 9PR B13 3YT RT4 YT5 None
5 Finn Grahame B26 7UP B15 6UR B22 9JK B13 3YT None
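For comparison, since the question is primarily about Pandas, the same "one column per postcode" layout can be sketched there without dynamic SQL. This is only an illustration under stated assumptions: it reuses this answer's sample data (with the literal string 'None' for missing values) and names the columns PPC1..PPCn to mirror the SQL. Unlike the result table above, which shows four columns, it creates one column for every postcode in the longest list, so Reference 1 gets five.

```python
import pandas as pd

# Same sample data as the #temp table; 'None' is a literal string here.
df = pd.DataFrame({
    "Reference": [1, 2, 3, 4, 5],
    "Surname": ["Smith", "Jones", "Potter", "Wane", "Finn"],
    "Forename": ["John", "Jack", "Harry", "Bruce", "Grahame"],
    "CurrentPostCode": ["WA1 2LA", "NA1 2NE", "LI8 0NX", "HE27 4PR", "B26 7UP"],
    "PreviousPostCodes": [
        "WA2 HG5, LN4 6XS,B13 3YT,AA18 3YT,YT783 3YT",
        "None",
        "None",
        "HE5 9PR,B13 3YT,RT4 YT5",
        "B15 6UR, B22 9JK, B13 3YT",
    ],
})

# Split on commas (ignoring surrounding whitespace); expand=True yields as
# many columns as the longest list needs.
parts = df["PreviousPostCodes"].str.split(r"\s*,\s*", expand=True)

# Name the columns PPC1..PPCn, where n is determined by the data.
parts.columns = [f"PPC{i + 1}" for i in range(parts.shape[1])]

# Fill the unused cells with 'None' to match the SQL output, then attach
# the new columns to the remaining original columns.
wide = df.drop(columns="PreviousPostCodes").join(parts.fillna("None"))
print(wide)
```

This gives the wide n-column shape, not the two-column layout with overflow rows that the question asks for.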