Question

I have the following data in a table:

                               Major.sectors     EBIT.CAP
1                   Food, beverages, tobacco  -0.29998599
2 Machinery, equipment, furniture, recycling -10.11204781
3                               Construction  -0.05840266
4                       Publishing, printing   1.56335275
5                             Other services  -1.87696308
6                       Hotels & restaurants  -0.93189920

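I am trying to assign an ID to several sectors at once, in a new column. For example, Group 1 would be assigned to Food, beverages, tobacco, to Publishing, printing and to Other services; Group 2 would be assigned to Construction and to Machinery, equipment, furniture, recycling; and so on.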

dput sample, first 40 rows:

structure(list(Major.sectors = c("Food, beverages, tobacco", 
"Machinery, equipment, furniture, recycling", "Construction", 
"Publishing, printing", "Other services", "Hotels & restaurants", 
"Wholesale & retail trade", "Wholesale & retail trade", "Wholesale & retail trade", 
"Construction", "Construction", "Construction", "Construction", 
"Transport", "Construction", "Other services", "Hotels & restaurants", 
"Transport", "Construction", "Other services", "Construction", 
"Wholesale & retail trade", "Construction", "Transport", "Machinery, equipment, furniture, recycling", 
"Wholesale & retail trade", "Machinery, equipment, furniture, recycling", 
"Wood, cork, paper", "Construction", "Other services", "Other services", 
"Chemicals, rubber, plastics, non-metallic products", "Food, beverages, tobacco", 
"Construction", "Wholesale & retail trade", "Transport", "Education, Health", 
"Chemicals, rubber, plastics, non-metallic products", "Construction", 
"Construction"), EBIT.CAP = c(-0.299985988510579, -10.112047808544, 
-0.0584026622296173, 1.56335274542429, -1.87696308048616, -0.931899204244032, 
0.118490171376958, -0.620735294117647, 0.825160979018652, -0.0531417228115857, 
5.04349258649094, 8.46722129783694, -1.56569551556698, 0.288562019546801, 
-8.17965612867443, -67.3093602404465, -0.590864600326264, -10.2089108910891, 
-2.84859771783905, 27.06476, -3.23294509151414, -0.262818510268391, 
-3.83117723156533, 12.2774086378738, -0.0961711136674632, 0.0444317163701523, 
-1.72438062594632, -0.0790666666666667, -0.166531914893617, -203.16001330672, 
NA, 11.203993344426, -0.0368548170677163, -34.8521655213724, 
-354.333333333333, -0.682595842695865, 1.59589572933999, -1.7513907638213, 
12.7705882352941, 2.36404166666667)), .Names = c("Major.sectors", 
"EBIT.CAP"), row.names = c(NA, 40L), class = "data.frame")

Edit:

This is how I would like the final result to look:

                               Major.sectors     EBIT.CAP    Group
1                   Food, beverages, tobacco  -0.29998599    Group 1
2 Machinery, equipment, furniture, recycling -10.11204781    Group 2
3                               Construction  -0.05840266    Group 2
4                       Publishing, printing   1.56335275    Group 1
5                             Other services  -1.87696308    Group 1
6                       Hotels & restaurants  -0.93189920    Group 3

Answer 1

We can use .GRP from data.table to create a grouping ID:

library(data.table)
setDT(df1)[, GroupID := paste0("Group", .GRP), Major.sectors]

Or use a similar option from the tidyverse:

library(dplyr)
df1 %>% mutate(GroupID = paste0("Group", group_indices(., Major.sectors)))

Note that both of these give every distinct 'Major.sectors' value its own ID. If the goal is to cluster multiple (custom) 'Major.sectors' values into a single group, then we create a key/value dataset and do a left_join on it. With such a join, the 6th row ends up NA because the 'keyval' dataset is incomplete.
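The key/value join is described in the answer but its code is not shown, so here is a minimal sketch of how it might look. The lookup table name `keyval`, the result name `res`, and the `Group` column are assumptions; the sector-to-group mapping is copied from the desired output in the question, and "Hotels & restaurants" is left out on purpose to reproduce the NA in the 6th row.

```r
library(dplyr)

# Small sample mirroring the first six rows of the question's data.
df1 <- data.frame(
  Major.sectors = c("Food, beverages, tobacco",
                    "Machinery, equipment, furniture, recycling",
                    "Construction",
                    "Publishing, printing",
                    "Other services",
                    "Hotels & restaurants"),
  EBIT.CAP = c(-0.29998599, -10.11204781, -0.05840266,
               1.56335275, -1.87696308, -0.93189920),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table (assumed name 'keyval'): one row per sector,
# giving the custom group it belongs to. "Hotels & restaurants" is
# deliberately missing, so the lookup is incomplete.
keyval <- data.frame(
  Major.sectors = c("Food, beverages, tobacco",
                    "Publishing, printing",
                    "Other services",
                    "Construction",
                    "Machinery, equipment, furniture, recycling"),
  Group = c("Group 1", "Group 1", "Group 1", "Group 2", "Group 2"),
  stringsAsFactors = FALSE
)

# left_join keeps every row of df1 in its original order and fills Group
# with NA wherever the sector has no match in keyval.
res <- left_join(df1, keyval, by = "Major.sectors")
print(res)
```

Adding a row for "Hotels & restaurants" (and for any other sector in the full data) to `keyval` removes the NA values; sectors without an entry stay NA rather than being dropped.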
