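I want to assign a sequential ID to each entry in df2 and then replace every occurrence of that entry in df with its ID. The code I wrote takes a very long time to run. Is there a better way?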
# Compares every row of df2 with every row of df (about 1.9 billion comparisons)
for i in range(0, 35261):
    for j in range(0, 54793):
        if df2.V_ID[i] == df.V_ID[j]:
            df.loc[j, 'V_ID'] = i
Sample data from df:
time IP1 IP2 GETVIDEO V_ID IP3
0 2008-03-11 17:28:17 63.22.65.77 205.181.173.92 GETVIDEO ORDhCi6JQaY&signature 254.212.25.169
1 2008-03-11 17:28:20 63.22.65.94 35.139.184.95 GETVIDEO xEcFchOvj4Y&signature 254.212.19.255
2 2008-03-11 17:28:22 63.22.65.73 35.139.176.183 GETVIDEO z-oBoCMSfbw&signature 254.212.19.196
3 2008-03-11 17:28:23 63.22.65.73 102.15.230.123 GETVIDEO pSo-_TavE1U&signature 254.212.25.206
4 2008-03-11 17:28:23 63.22.65.77 102.15.134.225 GETVIDEO kHtaORb0LUk&signature 254.212.22.122
5 2008-03-11 17:28:23 63.22.65.77 102.15.111.222 GETVIDEO t7qjlPPmeJE&origin 105.136.78.115
6 2008-03-11 17:28:27 63.22.65.73 35.139.31.8 GETVIDEO 2UPaRi0WY7c&origin 105.136.78.115
7 2008-03-11 17:28:28 63.22.65.73 102.15.143.68 GETVIDEO lAzrUxpybs0&signature 254.212.21.130
8 2008-03-11 17:28:30 63.22.65.73 205.181.139.118 GETVIDEO J_KKyw8V-l0&origin 105.136.78.115
9 2008-03-11 17:28:31 63.22.65.73 102.15.143.20 GETVIDEO xnsPfRdSU0Q&origin 105.136.78.115
10 2008-03-11 17:28:34 63.22.65.94 102.15.141.151 GETVIDEO qDKx6CkQM04&origin 105.136.78.115
Sample data from df2:
V_ID count
0 2UPaRi0WY7c&origin 768
1 t7qjlPPmeJE&origin 142
2 CKrTlXN9-iE&origin 107
3 IZtPejST9IQ&origin 103
4 FKb3qRljGBc&origin 93
5 LcM0OT6mnqA&origin 67
6 7sei-eEjy4g&origin 62
7 qDKx6CkQM04&origin 53
8 4rb8aOzy9t4&origin 46
9 wjv4Fp7GiGk&origin 46
10 SKDXBvPIepI&sign 44
Expected output:
time IP1 IP2 GETVIDEO V_ID IP3
0 2008-03-11 17:28:17 63.22.65.77 205.181.173.92 GETVIDEO 42 254.212.25.169
1 2008-03-11 17:28:20 63.22.65.94 35.139.184.95 GETVIDEO 13 254.212.19.255
2 2008-03-11 17:28:22 63.22.65.73 35.139.176.183 GETVIDEO 21 254.212.19.196
3 2008-03-11 17:28:23 63.22.65.73 102.15.230.123 GETVIDEO 14 254.212.25.206
4 2008-03-11 17:28:23 63.22.65.77 102.15.134.225 GETVIDEO 23 254.212.22.122
5 2008-03-11 17:28:23 63.22.65.77 102.15.111.222 GETVIDEO 1 105.136.78.115
6 2008-03-11 17:28:27 63.22.65.73 35.139.31.8 GETVIDEO 0 105.136.78.115
7 2008-03-11 17:28:28 63.22.65.73 102.15.143.68 GETVIDEO 33 254.212.21.130
8 2008-03-11 17:28:30 63.22.65.73 205.181.139.118 GETVIDEO 42 105.136.78.115
9 2008-03-11 17:28:31 63.22.65.73 102.15.143.20 GETVIDEO 19 105.136.78.115
10 2008-03-11 17:28:34 63.22.65.94 102.15.141.151 GETVIDEO 7 105.136.78.115
Answer 0 (score: 1)
import pandas as pd
df2 = pd.DataFrame({'V_ID': ['a','b','c','d'], 'count':[12,5,7,9]})
df = pd.DataFrame({'time': ['2008-03-11', '2008-03-11', '2008-03-11', '2008-03-11',
                            '2008-03-11', '2008-03-11', '2008-03-11'],
                   'V_ID': ['a', 'sdf', 'c', 'rge', 'gfg', 'a', 'a']})
# Create an index column for df2
df2 = df2.reset_index()
# Key-value pairs of index and V_ID
mapping = df2['V_ID'].to_dict()
# Invert key-value pairs
mapping = {v: k for k, v in mapping.items()}
# Replace each value in df['V_ID'] that matches a key in mapping with the mapped ID
df['V_ID'] = df['V_ID'].replace(mapping)
print(df)
time V_ID
0 2008-03-11 0
1 2008-03-11 sdf
2 2008-03-11 2
3 2008-03-11 rge
4 2008-03-11 gfg
5 2008-03-11 0
6 2008-03-11 0
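
With tens of thousands of keys, `replace` with a large dictionary may itself be slow. A possible alternative, sketched here on the same toy data, is to do one dictionary lookup per row with `Series.map`. The `positions` variant using `Index.get_indexer` is an extra I am adding for illustration; it assumes the V_ID values in df2 are unique and marks unmatched rows with -1 instead of keeping the original string.

```python
import pandas as pd

df2 = pd.DataFrame({'V_ID': ['a', 'b', 'c', 'd'], 'count': [12, 5, 7, 9]})
df = pd.DataFrame({'time': ['2008-03-11'] * 7,
                   'V_ID': ['a', 'sdf', 'c', 'rge', 'gfg', 'a', 'a']})

# Variant (assumes df2['V_ID'] is unique): position of each value in df2, -1 if absent
positions = pd.Index(df2['V_ID']).get_indexer(df['V_ID'])

# Dictionary from V_ID string to its row label in df2
mapping = dict(zip(df2['V_ID'], df2.index))

# One dict lookup per row; values without a match are left unchanged
df['V_ID'] = df['V_ID'].map(lambda v: mapping.get(v, v))

print(df)
print(positions)
```

Both approaches touch each row of df once instead of comparing it against every row of df2, so the work grows with len(df) + len(df2) rather than their product. If every V_ID in df is guaranteed to appear in df2, `positions` can be assigned to the column directly and keeps an integer dtype.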