I have this data:
data = [
    (11, 'Shirt', 2),
    (11, 'Pants', 3),
    (24, 'Top', 2),
    (24, 'Shirt', 1),
    (24, 'T-Shirt', 4),
    (36, 'Shirt', 3),
    (36, 'T-Shirt', 2),
    (48, 'Top', 3),
    (48, 'Pants', 3),
    (48, 'T-Shirt', 3),
]
I convert it with pandas:
import pandas as pd
df = pd.DataFrame(data, columns=['unique_id', 'category_product', 'count'])
The resulting df is:
   unique_id category_product  count
0         11            Shirt      2
1         11            Pants      3
2         24              Top      2
3         24            Shirt      1
4         24          T-Shirt      4
5         36            Shirt      3
6         36          T-Shirt      2
7         48              Top      3
8         48            Pants      3
9         48          T-Shirt      3
But I need to renumber unique_id so that it starts from 0 and increases in the order the values are seen.
How can I do that?
Answer 0 (score: 1)
There may be a simpler way, but here is one:
df.unique_id = (df.unique_id.diff() != 0).cumsum() - 1
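Basically, it compares each row with the previous one and increments the output value by 1 whenever the difference is != 0. The final -1 compensates for the leading NaN: the first row has nothing to diff against, and NaN != 0 evaluates to True, so the running count would otherwise start at 1. Note that this relies on equal unique_id values sitting in adjacent rows.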
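To make the steps of the one-liner visible, here is a minimal sketch that splits it into its intermediate results. The sample ids (11, 24, 36, 48) are assumed from the DataFrame shown in the question, and the names step, changed and by_diff are only illustrative. It also shows pd.factorize as one possible alternative, which is not part of the answer above; it numbers values in order of first appearance and does not need equal ids to be adjacent.

```python
import pandas as pd

# Sample data assumed from the DataFrame displayed in the question.
data = [
    (11, 'Shirt', 2), (11, 'Pants', 3),
    (24, 'Top', 2), (24, 'Shirt', 1), (24, 'T-Shirt', 4),
    (36, 'Shirt', 3), (36, 'T-Shirt', 2),
    (48, 'Top', 3), (48, 'Pants', 3), (48, 'T-Shirt', 3),
]
df = pd.DataFrame(data, columns=['unique_id', 'category_product', 'count'])

# Step 1: difference to the previous row; the first row has no predecessor, so it is NaN.
step = df['unique_id'].diff()

# Step 2: True wherever the id changed. NaN != 0 is also True, so row 0 is flagged.
changed = step != 0

# Step 3: running count of the flags, shifted by one so numbering starts at 0.
by_diff = changed.cumsum() - 1

# Alternative (not from the answer): factorize labels values in order of
# first appearance, even if equal ids are not in adjacent rows.
codes, _ = pd.factorize(df['unique_id'])

df['unique_id'] = by_diff
print(df)
```

If the same id can reappear later in the frame after other ids, the diff-based version would give it a new number, while the factorize version would reuse the original one; which behaviour is wanted depends on the data.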