Consecutive, unique numbering within groups

Time: 2018-02-15 02:43:13

Tags: r dataframe grouping

I have a data frame that looks like this:

DF_A <- data.frame(
  Group_1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C"),
  Group_2 = c("A", "B", "C", "A", "B", "A", "B", "A", "C", "A")
)

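I want to assign consecutive numbers within each Group_1 ID, such that the number is the same whenever the Group_2 ID is the same. For example, A + A starts at 1, A + B is 2 (same Group_1 ID, but a new Group_2 ID), ..., and A + A is 1 again (obviously a repeat). B + A is 1 (new Group_1 ID), B + B is 2 (same Group_1 ID, but a new Group_2 ID), and so on.

The result should look like this.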

DF_B <- data.frame(
  Group_1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C"),
  Group_2 = c("A", "B", "C", "A", "B", "A", "B", "A", "C", "A"),
  ID      = c(1, 2, 3, 1, 2, 1, 2, 1, 3, 1)
)

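I looked through various posts on related approaches, e.g. numbering within single groups, groups within groups, and combinations, without any success. This case is not covered in the earlier posts.

Thanks in advance.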

4 Answers:

Answer 0: (score: 2)

One way to do this is with ave:

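# as.character() makes this work for factor and character columns alike;
# as.integer() turns the result back into numbers.
DF_A$ID <- as.integer(ave(as.character(DF_A$Group_2), DF_A$Group_1, FUN = function(x) match(x, unique(x))))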

DF_A
#   Group_1 Group_2 ID
#1        A       A  1
#2        A       B  2
#3        A       C  3
#4        A       A  1
#5        A       B  2
#6        B       A  1
#7        B       B  2
#8        B       A  1
#9        B       C  3
#10       C       A  1

The equivalent dplyr way is:

library(dplyr)
DF_A %>%
  group_by(Group_1) %>%
  mutate(ID = match(Group_2, unique(Group_2)))

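Answer 1: (score: 1)

You can split the data into groups by Group_1, then create a factor from the Group_2 values within each group, and convert that factor to integer.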
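A minimal base R sketch of the approach described above, assuming the DF_A from the question. The use of split() and unsplit() is an assumption, not necessarily what the answerer wrote:

```r
DF_A <- data.frame(
  Group_1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C"),
  Group_2 = c("A", "B", "C", "A", "B", "A", "B", "A", "C", "A")
)

# Split Group_2 by Group_1, number the values inside each piece through a
# per-group factor, then put the pieces back in the original row order.
pieces <- split(as.character(DF_A$Group_2), DF_A$Group_1)
ids    <- lapply(pieces, function(x) as.integer(factor(x, levels = unique(x))))
DF_A$ID <- unsplit(ids, DF_A$Group_1)

DF_A
```

Setting levels = unique(x) numbers the values in order of first appearance within each group, which is what the question asks for.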


Answer 2: (score: 1)

You can use the integer values of the factor levels. We can simply wrap Group_2 in c() to drop the factor attributes.
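A hedged sketch of this idea, assuming the DF_A from the question. It uses as.integer() rather than c(), because since R 4.1.0 calling c() on a factor returns a factor instead of the bare integer codes. The pairing with ave() and seq_along() is an assumption, not the answerer's code:

```r
DF_A <- data.frame(
  Group_1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C"),
  Group_2 = c("A", "B", "C", "A", "B", "A", "B", "A", "C", "A")
)

# Pass row indices to ave() so the result stays an integer vector;
# inside each Group_1 block, take the integer codes of a fresh factor.
DF_A$ID <- ave(
  seq_along(DF_A$Group_2), DF_A$Group_1,
  FUN = function(i) as.integer(factor(DF_A$Group_2[i]))
)

DF_A
```

Factor codes follow the sorted level order, not the order of first appearance. The two coincide for this data, but they can differ in general.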

Answer 3: (score: 0)

We can use dense_rank from dplyr:

library(dplyr)

DF_A2 <- DF_A %>%
  group_by(Group_1) %>%
  mutate(ID = dense_rank(Group_2)) %>%
  ungroup()
DF_A2
# # A tibble: 10 x 3
#    Group_1 Group_2    ID
#    <fct>   <fct>   <int>
#  1 A       A           1
#  2 A       B           2
#  3 A       C           3
#  4 A       A           1
#  5 A       B           2
#  6 B       A           1
#  7 B       B           2
#  8 B       A           1
#  9 B       C           3
# 10 C       A           1
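One caveat worth checking: dense_rank numbers values by their sorted order, while match(x, unique(x)) numbers them by first appearance. The small base R sketch below illustrates the difference on a made-up vector x (not from the question); match(x, sort(unique(x))) is used here as a stand-in for what dense_rank returns on a vector without missing values:

```r
# Hypothetical input whose first value is not the smallest one
x <- c("B", "A", "B", "C")

# Numbering by first appearance, as in the ave/match answer
by_appearance <- match(x, unique(x))

# Numbering by sorted order, which is how dense_rank assigns ranks
by_sorted <- match(x, sort(unique(x)))

by_appearance
by_sorted
```

For the data in the question both numberings agree, because within every Group_1 the Group_2 values first appear in alphabetical order.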