Question

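I know the question title is a bit vague.

My goal is to assign a global key column based on 2 columns plus a unique value in a data frame.

For example: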

https://domain.com/app1/

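Let Car = 01, Bike = 02, Plane = 03

The global key format I want is [Accident][CountryCode][UniqueValue]

The unique value is a running count of rows with the same [Accident][CountryCode]

So if Accident = Car and CountryCode = AFG and it is the first occurrence, the global key would be 01AFG01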

Starting from this data frame:

CountryCode | Accident
   AFG          Car
   AFG          Bike
   AFG          Car
   AFG          Plane
   USA          Car
   USA          Bike
   UK           Car

the desired data frame looks like this:

CountryCode | Accident | GlobalKey
   AFG          Car        01AFG01
   AFG          Bike       02AFG01
   AFG          Car        01AFG02
   AFG          Plane      01AFG03
   USA          Car        01USA01
   USA          Bike       01USA02
   UK           Car        01UK01

I tried running a for loop to concatenate the accident number and the CountryCode, for example:

globalKey = []
for x in range(0,6):
    string = df.iloc[x, 1]
    string2 = df.iloc[x, 2]
    if string2 == 'Car':
        number = '01'
    elif string2 == 'Bike':
        number = '02'
    elif string2 == 'Plane':
        number = '03'
    #Concat the number of accident and Country Code
    subKey = number + string
    #Append to the list
    globalKey.append(subKey)

This code gives me something like 01AFG or 02AFG, based on the values I specified. But I want to assign the unique value by counting how many times the same Accident and CountryCode combination has occurred.

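I'm stuck with the code above. I think there should be a better way to do this with the map function in pandas.

Thanks for your help! Much appreciated!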

Answer 1

You can try to achieve this in several steps using cumcount, for example:

In [1]: df = pd.DataFrame({'Country':['AFG','AFG','AFG','AFG','USA','USA','UK'], 'Accident':['Car','Bike','Car','Plane','Car','Bike','Car']})

In [2]: df
Out[2]: 
  Accident Country
0      Car     AFG
1     Bike     AFG
2      Car     AFG
3    Plane     AFG
4      Car     USA
5     Bike     USA
6      Car      UK

## Create a column to keep incremental values for `Country`
In [3]: df['cumcount'] = df.groupby('Country').cumcount()

In [4]: df
Out[4]: 
  Accident Country  cumcount
0      Car     AFG         0
1     Bike     AFG         1
2      Car     AFG         2
3    Plane     AFG         3
4      Car     USA         0
5     Bike     USA         1
6      Car      UK         0

## Create a column to keep incremental values for combination of `Country`,`Accident`
In [5]: df['cumcount_type'] = df.groupby(['Country','Accident']).cumcount()

In [6]: df
Out[6]: 
  Accident Country  cumcount  cumcount_type
0      Car     AFG         0              0
1     Bike     AFG         1              0
2      Car     AFG         2              1
3    Plane     AFG         3              0
4      Car     USA         0              0
5     Bike     USA         1              0
6      Car      UK         0              0

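From there, you can concatenate the values of cumcount, cumcount_type and Country to get what you are after.

Depending on whether you want to start counting from 0 or 1, you may want to add 1 to each value in the count columns.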
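As a rough sketch of that last step (it is not part of the session above): the `accident_code` mapping and the `GlobalKey` column name are assumptions taken from the question, and 1 is added to `cumcount_type` so that counting starts at 01.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['AFG', 'AFG', 'AFG', 'AFG', 'USA', 'USA', 'UK'],
    'Accident': ['Car', 'Bike', 'Car', 'Plane', 'Car', 'Bike', 'Car'],
})

# Assumed mapping, taken from the question (Car = 01, Bike = 02, Plane = 03)
accident_code = {'Car': '01', 'Bike': '02', 'Plane': '03'}

# Running count per (Country, Accident) pair, shifted so it starts at 1
df['cumcount_type'] = df.groupby(['Country', 'Accident']).cumcount() + 1

# Concatenate accident code + country + zero-padded count
df['GlobalKey'] = (
    df['Accident'].map(accident_code)
    + df['Country']
    + df['cumcount_type'].astype(str).str.zfill(2)
)

print(df['GlobalKey'].tolist())
# ['01AFG01', '02AFG01', '01AFG02', '03AFG01', '01USA01', '02USA01', '01UK01']
```

Only `cumcount_type` is needed for this particular key format; the per-country `cumcount` would be the one to use if the counter should run across all accident types within a country.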

I hope this helps.

Answer 2

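Once subKey has been created, we can sort the dataframe and count the occurrences of each pair. First, let's reset the index (to keep the original order):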

df = df.reset_index()

Then sort by subKey and count:

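# Stable sort keeps equal subKeys in their original order; renumbering the
# rows 0..n-1 makes ind - 1 refer to the previous row in sorted order
df = df.sort_values(by='subKey', kind='mergesort').reset_index(drop=True)
df['newnumber'] = 1

for ind in range(1, len(df)): #start by 1 because first row is always 1
    if df.loc[ind, 'subKey'] == df.loc[ind - 1, 'subKey']:
        df.loc[ind, 'newnumber'] = df.loc[ind - 1, 'newnumber'] + 1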

Finally, create GlobalKey with the help of the zfill function, and re-sort by index:

df['GlobalKey'] = df.apply(lambda x: x['subKey'] + str(x['newnumber']).zfill(2), axis=1)
df = df.sort_values(by='index').drop(columns='index').reset_index(drop=True)
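Put together on the sample data from the question, the steps might look like the sketch below. The `codes` dictionary and the way `subKey` is built are assumptions based on the question, and the sort is made stable so that repeated pairs are numbered in their original order.

```python
import pandas as pd

df = pd.DataFrame({
    'CountryCode': ['AFG', 'AFG', 'AFG', 'AFG', 'USA', 'USA', 'UK'],
    'Accident': ['Car', 'Bike', 'Car', 'Plane', 'Car', 'Bike', 'Car'],
})

# Assumed: subKey is the accident number followed by the country code
codes = {'Car': '01', 'Bike': '02', 'Plane': '03'}
df['subKey'] = df['Accident'].map(codes) + df['CountryCode']

# Keep the original row order in an 'index' column
df = df.reset_index()

# Stable sort by subKey, then renumber rows so ind - 1 is the previous row
df = df.sort_values(by='subKey', kind='mergesort').reset_index(drop=True)

df['newnumber'] = 1
for ind in range(1, len(df)):
    if df.loc[ind, 'subKey'] == df.loc[ind - 1, 'subKey']:
        df.loc[ind, 'newnumber'] = df.loc[ind - 1, 'newnumber'] + 1

# Build the key, then restore the original order and drop the helper column
df['GlobalKey'] = df['subKey'] + df['newnumber'].astype(str).str.zfill(2)
df = df.sort_values(by='index').drop(columns='index').reset_index(drop=True)

print(df[['CountryCode', 'Accident', 'GlobalKey']])
```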

Answer 3

First, don't use a for loop if you can help it. For example, you can do the accident-code mapping with:

df['AccidentCode'] = df['Accident'].map({'Car': '01', 'Bike': '02', 'Plane': '03'})

To get the unique code, use GroupBy.cumcount, as Thanos has shown:

df['CA_ID'] = df.groupby(['CountryCode', 'Accident']).cumcount() + 1

Then combine them into a single unique key:

df['NewKey'] = df['AccidentCode'] + df['CountryCode'] + df['CA_ID'].map('{:0>2}'.format)

which gives:

  CountryCode Accident GlobalKey AccidentCode  CA_ID   NewKey
0         AFG      Car   01AFG01           01      1  01AFG01
1         AFG     Bike   02AFG01           02      1  02AFG01
2         AFG      Car   01AFG02           01      2  01AFG02
3         AFG    Plane   01AFG03           03      1  03AFG01
4         USA      Car   01USA01           01      1  01USA01
5         USA     Bike   01USA02           02      1  02USA01
6          UK      Car    01UK01           01      1   01UK01

Answer 4

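I don't have any experience with pandas, so this answer may not be what you're looking for. That said, if your data really is that simple (a few countries, a few accident types), have you considered keeping a separate counter for each country/accident combination?

Then, as you iterate over the input, just increment the counter for that country/accident combination, and read those counters at the end to generate the GlobalKeys.

If you need to store other data besides the global key, store each country/accident combination as a list, and read the entries one at a time to generate the GlobalKeys.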
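One way to sketch this idea in plain Python, without pandas, is shown below. The function name `global_keys` and the `ACCIDENT_CODES` dictionary are made up for illustration, with the codes taken from the question. Unlike the description above, this sketch emits each key as soon as its counter is incremented rather than reading the counters at the end.

```python
from collections import defaultdict

# Assumed mapping, taken from the question (Car = 01, Bike = 02, Plane = 03)
ACCIDENT_CODES = {'Car': '01', 'Bike': '02', 'Plane': '03'}


def global_keys(rows):
    """Return one key per (country, accident) row, in input order."""
    counters = defaultdict(int)  # one counter per (country, accident) pair
    keys = []
    for country, accident in rows:
        counters[(country, accident)] += 1
        count = counters[(country, accident)]
        keys.append(f"{ACCIDENT_CODES[accident]}{country}{count:02d}")
    return keys


rows = [
    ('AFG', 'Car'), ('AFG', 'Bike'), ('AFG', 'Car'), ('AFG', 'Plane'),
    ('USA', 'Car'), ('USA', 'Bike'), ('UK', 'Car'),
]
print(global_keys(rows))
# ['01AFG01', '02AFG01', '01AFG02', '03AFG01', '01USA01', '02USA01', '01UK01']
```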

Assign unique values based on distinct column values

4 answers: