Question

I have a dataset like this:

user_id     location
2222           23
2222           23
2222           24
2222           23
3333           24
3333           24
3333           24

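For each distinct "user_id", I want to find the frequency of the values in "location". However, I don't want the frequency over the whole column; I only want to show the count at each row's position.

The result should be a new column like this: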

user_id     location    frequency
2222           23          1
2222           23          2
2222           24          1
2222           23          1
3333           24          1
3333           24          2
3333           24          3

Answer 1

If you have an array of tuples like this

tupleArray = [(2222,23),(2222,23)...]

you can build a second array of frequencies, checking whether the previous row is the same in order to increment the count:

tupleArray = [(2222,23),(2222,23),(2222,24),(2222,23),(3333,24),(3333,24),(3333,24)]

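frecuencyArray = []

tempValue = 1

for x in range(len(tupleArray)):
    # Compare the whole (user_id, location) tuple with the previous row,
    # so that a change of user_id also resets the count
    if x > 0 and tupleArray[x] == tupleArray[x-1]:
        tempValue = tempValue + 1
    else:
        tempValue = 1

    frecuencyArray.append(tempValue)

print(frecuencyArray)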

Test the code:

https://repl.it/repls/WavyLostDatamining
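
The same "count within a run of identical consecutive rows" idea can also be sketched with itertools.groupby from the standard library. This is only an alternative sketch, not part of the answer above; the helper name run_lengths is made up for illustration.

```python
from itertools import groupby


def run_lengths(rows):
    """For each row, return its 1-based position within its run of
    identical consecutive rows (hypothetical helper name)."""
    counts = []
    # groupby with no key groups consecutive equal items together
    for _, run in groupby(rows):
        counts.extend(position for position, _ in enumerate(run, start=1))
    return counts


tupleArray = [(2222, 23), (2222, 23), (2222, 24), (2222, 23),
              (3333, 24), (3333, 24), (3333, 24)]

print(run_lengths(tupleArray))  # [1, 2, 1, 1, 1, 2, 3]
```

Because groupby compares whole tuples here, a change in either user_id or location starts a new run.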

Answer 2

I'm not sure I fully understand your question, but here is an attempt.

dataset = [
  (2222, 23),
  (2222, 23),
  (2222, 24),
  (2222, 23),
  (3333, 24),
  (3333, 24),
  (3333, 24),
]

feq = {}

ret = []
for usr_id, loc_id in dataset:
  key = '{}.{}'.format(usr_id, loc_id)
  feq.setdefault(key, 0)
  feq[key] += 1
  ret.append((usr_id, loc_id, feq[key]))

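The contents of ret are as follows. Note that the fourth row gets 3 rather than the 1 shown in the question, because this counts every earlier occurrence of the pair, not only consecutive repeats: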

[(2222, 23, 1),
 (2222, 23, 2),
 (2222, 24, 1),
 (2222, 23, 3),
 (3333, 24, 1),
 (3333, 24, 2),
 (3333, 24, 3)]
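
A possible variation on this cumulative count, assuming the rows are hashable (user_id, location) tuples: use the tuple itself as the dictionary key instead of formatting a string, and let collections.defaultdict supply the initial zero. The function name cumulative_counts is hypothetical.

```python
from collections import defaultdict


def cumulative_counts(rows):
    """Append to each (user_id, location) row the number of times that
    pair has been seen so far (hypothetical helper name)."""
    seen = defaultdict(int)  # missing keys start at 0
    result = []
    for user_id, location in rows:
        seen[(user_id, location)] += 1
        result.append((user_id, location, seen[(user_id, location)]))
    return result


dataset = [(2222, 23), (2222, 23), (2222, 24), (2222, 23),
           (3333, 24), (3333, 24), (3333, 24)]

print(cumulative_counts(dataset))
```

A tuple key also avoids any ambiguity that a formatted string key could have if the ids were strings containing a ".".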

Answer 3

Please paste an actual Python data object definition rather than printed output.

d = """2222           23
2222           23
2222           24
2222           23
3333           24
3333           24
3333           24"""

d = [*map(lambda x: [*map(int, x.split())], d.split('\n'))]

d

Out[89]: 
[[2222, 23],
 [2222, 23],
 [2222, 24],
 [2222, 23],
 [3333, 24],
 [3333, 24],
 [3333, 24]]

Then:

df, c = [], 0
for a, b in zip(d, [d[0]] + d):
    c = c*(a == b) + 1  # count with reset if lagged b value != a
    df.append(a + [c])
df
Out[90]: 
[[2222, 23, 1],
 [2222, 23, 2],
 [2222, 24, 1],
 [2222, 23, 1],
 [3333, 24, 1],
 [3333, 24, 2],
 [3333, 24, 3]]
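
The expression c*(a == b) + 1 relies on Python treating True as 1 and False as 0 in arithmetic: when the row equals the previous one the count is kept and incremented, otherwise it is multiplied by 0 and restarts at 1. A more explicit sketch of the same logic, with a hypothetical function name, might look like this:

```python
def count_with_reset(rows):
    """Append to each row its position within the current run of
    identical consecutive rows (hypothetical helper name)."""
    out = []
    count = 0
    previous = None  # no previous row before the first iteration
    for row in rows:
        # Continue the run if the row repeats, otherwise restart at 1
        count = count + 1 if row == previous else 1
        out.append(list(row) + [count])
        previous = row
    return out


d = [[2222, 23], [2222, 23], [2222, 24], [2222, 23],
     [3333, 24], [3333, 24], [3333, 24]]

print(count_with_reset(d))
```

This version also handles an empty input, whereas indexing d[0] would raise an IndexError on an empty list.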

Answer 4

The standard library provides the Counter class for this situation.

from collections import Counter
frequencies = Counter((user_id, location) for user_id, location in data)

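This gives a mapping from each user_id / location pair to the number of times they appear together.

To convert this back into a matrix format, try the following snippet: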

new_table = []
for (user_id, location), frequency in frequencies.items():
    new_table.append([user_id, location, frequency])

Here is a demo so you can see it in action.
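
For reference, a self-contained sketch of the Counter approach, assuming data is a list of (user_id, location) tuples (the answer does not define data, so that shape is an assumption). It attaches the total count of each pair to every row, which is a whole-dataset frequency rather than the per-position count shown in the question.

```python
from collections import Counter

# Assumed shape of `data`: a list of (user_id, location) tuples
data = [(2222, 23), (2222, 23), (2222, 24), (2222, 23),
        (3333, 24), (3333, 24), (3333, 24)]

# Total number of occurrences of each (user_id, location) pair
frequencies = Counter(data)

# One output row per input row, each carrying its pair's total count
per_row = [(user_id, location, frequencies[(user_id, location)])
           for user_id, location in data]

print(frequencies[(2222, 23)])  # 3
print(per_row)
```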

Python: finding repeated values

4 answers: