I have a data frame like this:
DF_A <- data.frame(
  Group_1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C"),
  Group_2 = c("A", "B", "C", "A", "B", "A", "B", "A", "C", "A")
)
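Within each Group_1 ID, I want to assign consecutive numbers so that each distinct Group_2 ID gets its own number and repeated Group_2 IDs get the same number. For example, A + A starts at 1, A + B is 2 (same Group_1 ID, but a new Group_2 ID), ..., and A + A is 1 again (clearly a repeat). B + A is 1 (new Group_1 ID), B + B is 2 (same Group_1 ID, but a new Group_2 ID), and so on.
The result should look like this: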
DF_B <- data.frame(
  Group_1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C"),
  Group_2 = c("A", "B", "C", "A", "B", "A", "B", "A", "C", "A"),
  ID = c(1, 2, 3, 1, 2, 1, 2, 1, 3, 1)
)
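I have looked at various posts on related approaches, such as numbering groups within groups or numbering combinations, without success. This case does not seem to be covered by earlier posts.
Thanks in advance.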
Answer 0: (score: 2)
One way to do this is with ave:
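# ave() returns a vector of the same type as its first argument (character here),
# so the result is converted to integer explicitly.
DF_A$ID <- as.integer(ave(as.character(DF_A$Group_2), DF_A$Group_1,
                          FUN = function(x) match(x, unique(x))))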
DF_A
# Group_1 Group_2 ID
#1 A A 1
#2 A B 2
#3 A C 3
#4 A A 1
#5 A B 2
#6 B A 1
#7 B B 2
#8 B A 1
#9 B C 3
#10 C A 1
The equivalent dplyr approach is:
library(dplyr)
DF_A %>%
  group_by(Group_1) %>%
  mutate(ID = match(Group_2, unique(Group_2)))
Answer 1: (score: 1)
You can split the data by Group_1, build a factor from Group_2 within each group, and then convert that factor to integer.
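The split-then-factor idea described above could be sketched in base R roughly as follows. This is only one possible reading of the approach, not the answer's original code; the helper name `ids` and the choice to order factor levels by first appearance are assumptions made for illustration.

```r
DF_A <- data.frame(
  Group_1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C"),
  Group_2 = c("A", "B", "C", "A", "B", "A", "B", "A", "C", "A")
)

# Split Group_2 by Group_1, build a factor inside each piece whose levels
# follow the order of first appearance, and keep the integer codes.
# `ids` is a hypothetical helper name.
ids <- lapply(split(DF_A$Group_2, DF_A$Group_1), function(x) {
  x <- as.character(x)
  as.integer(factor(x, levels = unique(x)))
})

# unsplit() puts the per-group codes back into the original row order.
DF_A$ID <- unsplit(ids, DF_A$Group_1)
DF_A
```

Setting `levels = unique(x)` keeps the numbering in order of first appearance; leaving it out would number the values in sorted order instead.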
Answer 2: (score: 1)
You can use the integer values of the factor levels. We can simply wrap Group_2 in c() to remove the factor attributes.
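A minimal sketch of the factor-level idea, assuming the codes are computed separately within each Group_1. The answer's own code is not available here, so this is an illustration rather than the original. It uses as.integer() instead of c(), because from R 4.1.0 onward c() applied to a factor returns a factor rather than the bare integer codes.

```r
DF_A <- data.frame(
  Group_1 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C"),
  Group_2 = c("A", "B", "C", "A", "B", "A", "B", "A", "C", "A")
)

# Pass row indices through ave() so the result stays an integer vector.
# Inside each Group_1, factor() is rebuilt from that group's Group_2 values,
# which drops levels that do not occur in the group.
DF_A$ID <- ave(seq_len(nrow(DF_A)), DF_A$Group_1,
               FUN = function(i) as.integer(factor(DF_A$Group_2[i])))
DF_A
```

Note that factor() sorts its levels, so the numbers follow the sorted order of Group_2 within each group rather than the order of first appearance. For DF_A the two orders happen to coincide.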
Answer 3: (score: 0)
We can use dense_rank from dplyr.
library(dplyr)
DF_A2 <- DF_A %>%
  group_by(Group_1) %>%
  mutate(ID = dense_rank(Group_2)) %>%
  ungroup()
DF_A2
# # A tibble: 10 x 3
# Group_1 Group_2 ID
# <fct> <fct> <int>
# 1 A A 1
# 2 A B 2
# 3 A C 3
# 4 A A 1
# 5 A B 2
# 6 B A 1
# 7 B B 2
# 8 B A 1
# 9 B C 3
# 10 C A 1
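One caveat may be worth checking before relying on dense_rank: it ranks by sorted value, whereas match(x, unique(x)) numbers values in order of first appearance. The two agree on DF_A only because each group's Group_2 values first appear in alphabetical order. The sketch below uses base R only; match(x, sort(unique(x))) is used as an assumed stand-in for a dense rank, and the vector x is made-up illustration data.

```r
# Hypothetical Group_2 values for a single group, not in alphabetical order.
x <- c("B", "A", "B", "C")

# Numbering by order of first appearance.
by_appearance <- match(x, unique(x))

# Numbering by sorted value (assumed stand-in for a dense rank).
by_sorted_value <- match(x, sort(unique(x)))

by_appearance
by_sorted_value
```

If the numbering has to follow the order in which values first appear, the match(x, unique(x)) form is the safer choice.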