I have the following data in a table:
Major.sectors EBIT.CAP
1 Food, beverages, tobacco -0.29998599
2 Machinery, equipment, furniture, recycling -10.11204781
3 Construction -0.05840266
4 Publishing, printing 1.56335275
5 Other services -1.87696308
6 Hotels & restaurants -0.93189920
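I am trying to assign IDs to multiple sectors. For example, in a new column I want to assign a group ID such as Group 1 to Food, beverages, tobacco, to Publishing, printing and to Other services, and then a group ID such as Group 2 to Construction and to Machinery, equipment, furniture, recycling, and so on.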
dput sample, first 40 rows:
structure(list(Major.sectors = c("Food, beverages, tobacco",
"Machinery, equipment, furniture, recycling", "Construction",
"Publishing, printing", "Other services", "Hotels & restaurants",
"Wholesale & retail trade", "Wholesale & retail trade", "Wholesale & retail trade",
"Construction", "Construction", "Construction", "Construction",
"Transport", "Construction", "Other services", "Hotels & restaurants",
"Transport", "Construction", "Other services", "Construction",
"Wholesale & retail trade", "Construction", "Transport", "Machinery, equipment, furniture, recycling",
"Wholesale & retail trade", "Machinery, equipment, furniture, recycling",
"Wood, cork, paper", "Construction", "Other services", "Other services",
"Chemicals, rubber, plastics, non-metallic products", "Food, beverages, tobacco",
"Construction", "Wholesale & retail trade", "Transport", "Education, Health",
"Chemicals, rubber, plastics, non-metallic products", "Construction",
"Construction"), EBIT.CAP = c(-0.299985988510579, -10.112047808544,
-0.0584026622296173, 1.56335274542429, -1.87696308048616, -0.931899204244032,
0.118490171376958, -0.620735294117647, 0.825160979018652, -0.0531417228115857,
5.04349258649094, 8.46722129783694, -1.56569551556698, 0.288562019546801,
-8.17965612867443, -67.3093602404465, -0.590864600326264, -10.2089108910891,
-2.84859771783905, 27.06476, -3.23294509151414, -0.262818510268391,
-3.83117723156533, 12.2774086378738, -0.0961711136674632, 0.0444317163701523,
-1.72438062594632, -0.0790666666666667, -0.166531914893617, -203.16001330672,
NA, 11.203993344426, -0.0368548170677163, -34.8521655213724,
-354.333333333333, -0.682595842695865, 1.59589572933999, -1.7513907638213,
12.7705882352941, 2.36404166666667)), .Names = c("Major.sectors",
"EBIT.CAP"), row.names = c(NA, 40L), class = "data.frame")
Edit:
This is how I would like the final result to look.
Major.sectors EBIT.CAP Group
1 Food, beverages, tobacco -0.29998599 Group 1
2 Machinery, equipment, furniture, recycling -10.11204781 Group 2
3 Construction -0.05840266 Group 2
4 Publishing, printing 1.56335275 Group 1
5 Other services -1.87696308 Group 1
6 Hotels & restaurants -0.93189920 Group 3
Answer 0 (score: 1)
We can use .GRP from data.table to create a group ID
library(data.table)
setDT(df1)[, GroupID := paste0("Group", .GRP), Major.sectors]
Or use tidyverse
library(dplyr)
df1 %>%
  mutate(GroupID = paste0("Group", group_indices(., Major.sectors)))
If the goal is to cluster multiple (custom) 'Major.sectors' into one group, then we create a key/value dataset and do a left_join on 'Major.sectors'.
In the joined result the 6th row's group is NA, because the 'keyval' dataset is incomplete.
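A minimal sketch of the key/value approach, assuming the groups listed in the question. The `keyval` contents and the `res` name are illustrative assumptions, not the original answer's code. "Hotels & restaurants" is deliberately left out of `keyval` to show how an unmatched sector ends up as NA.

```r
library(dplyr)

# First six rows of the question's data
df1 <- data.frame(
  Major.sectors = c("Food, beverages, tobacco",
                    "Machinery, equipment, furniture, recycling",
                    "Construction", "Publishing, printing",
                    "Other services", "Hotels & restaurants"),
  EBIT.CAP = c(-0.29998599, -10.11204781, -0.05840266,
               1.56335275, -1.87696308, -0.93189920),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: one row per sector, giving its custom group.
# "Hotels & restaurants" is intentionally missing.
keyval <- data.frame(
  Major.sectors = c("Food, beverages, tobacco", "Publishing, printing",
                    "Other services", "Construction",
                    "Machinery, equipment, furniture, recycling"),
  Group = c("Group 1", "Group 1", "Group 1", "Group 2", "Group 2"),
  stringsAsFactors = FALSE
)

# left_join keeps every row of df1 in its original order;
# sectors without a match in keyval get NA in Group
res <- left_join(df1, keyval, by = "Major.sectors")
print(res)
```

Adding a row for "Hotels & restaurants" (for example with "Group 3") to `keyval` would fill in the remaining NA.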