我无法对列进行分组" short_desc"在我的数据框中并应用创建与该值对应的十分位数范围的代码。十进制范围在列#34; Value.fc"。
中生成我用来创建十分位数范围的代码是:
df <- df %>%
mutate(Value.fc = cut2(Value, g=10),
Value.fc = factor(sapply(str_extract_all(Value.fc, "\\d+"),
function(x) paste(x, collapse="-"))),
Value.fc = reorder(Value.fc, Value))
当&#34; short_desc&#34;只有一个级别时,代码可以工作。但是,当我使用group_by()函数应用该代码时,十分位数范围是错误的。 &#34; Value.fc&#34;的样本数据看起来像我使用group_by():
dput(head(df)) structure(list(
state = c("Iowa", "Iowa", "Illinois"),
short_desc = c("Corn, grain - yield, measured in bu / acre", "Corn, silage - yield, measured in tons / acre", "Corn, grain - yield, measured in bu / acre"),
Value = c(137.8, 13.5, 153.3),
FIPS = c("19001", "19001", "17001"),
Value.fc = c("135-0-150", "13-0-14-5", "150-4-157"))
&#34; Value.fc&#34;的第一个值应该看起来像&#34; 135-150&#34;不是&#34; 135-0-150&#34;。任何帮助将不胜感激。
答案 0 :(得分:0)
不是一个答案,但不能在评论中发布更长的代码。适用于data.table
:
<强> CODE 强>
library(data.table)
library(Hmisc)
df <- data.table(state = state.name[1:50],
value = unname(state.x77[, 1]))
df[, value.fc.inter := cut2(value, g = 10)]
df[, value.fc := unlist(lapply(stringr::str_extract_all(value.fc.inter, "\\d+"),
function(z) paste(z, collapse = "-")))]
我有value.fc.inter
作为中间变量来说明这个过程。
<强>输出强>
> head(df)
state value value.fc.inter value.fc
1: Alabama 3615 [ 2861, 3806) 2861-3806
2: Alaska 365 [ 365, 637) 365-637
3: Arizona 2212 [ 1544, 2284) 1544-2284
4: Arkansas 2110 [ 1544, 2284) 1544-2284
5: California 21198 [11197,21198] 11197-21198
6: Colorado 2541 [ 2284, 2861) 2284-2861
如果您想按value.fc
订购,可以使用以下代码,但我认为levels
value.fc
需要订购才能按预期工作。
> df[order(value.fc, decreasing = F)]
state value value.fc.inter value.fc
1: California 21198 [11197,21198] 11197-21198
2: Illinois 11197 [11197,21198] 11197-21198
3: New York 18076 [11197,21198] 11197-21198
4: Pennsylvania 11860 [11197,21198] 11197-21198
5: Texas 12237 [11197,21198] 11197-21198
6: Arizona 2212 [ 1544, 2284) 1544-2284
7: Arkansas 2110 [ 1544, 2284) 1544-2284
8: Kansas 2280 [ 1544, 2284) 1544-2284
9: Nebraska 1544 [ 1544, 2284) 1544-2284
10: West Virginia 1799 [ 1544, 2284) 1544-2284
11: Colorado 2541 [ 2284, 2861) 2284-2861
12: Mississippi 2341 [ 2284, 2861) 2284-2861
13: Oklahoma 2715 [ 2284, 2861) 2284-2861
14: Oregon 2284 [ 2284, 2861) 2284-2861
15: South Carolina 2816 [ 2284, 2861) 2284-2861
16: Alabama 3615 [ 2861, 3806) 2861-3806
17: Connecticut 3100 [ 2861, 3806) 2861-3806
18: Iowa 2861 [ 2861, 3806) 2861-3806
19: Kentucky 3387 [ 2861, 3806) 2861-3806
20: Washington 3559 [ 2861, 3806) 2861-3806
21: Alaska 365 [ 365, 637) 365-637
22: Delaware 579 [ 365, 637) 365-637
23: Nevada 590 [ 365, 637) 365-637
24: Vermont 472 [ 365, 637) 365-637
25: Wyoming 376 [ 365, 637) 365-637
26: Louisiana 3806 [ 3806, 4767) 3806-4767
27: Maryland 4122 [ 3806, 4767) 3806-4767
28: Minnesota 3921 [ 3806, 4767) 3806-4767
29: Tennessee 4173 [ 3806, 4767) 3806-4767
30: Wisconsin 4589 [ 3806, 4767) 3806-4767
31: Georgia 4931 [ 4767, 5814) 4767-5814
32: Indiana 5313 [ 4767, 5814) 4767-5814
33: Missouri 4767 [ 4767, 5814) 4767-5814
34: North Carolina 5441 [ 4767, 5814) 4767-5814
35: Virginia 4981 [ 4767, 5814) 4767-5814
36: Florida 8277 [ 5814,11197) 5814-11197
37: Massachusetts 5814 [ 5814,11197) 5814-11197
38: Michigan 9111 [ 5814,11197) 5814-11197
39: New Jersey 7333 [ 5814,11197) 5814-11197
40: Ohio 10735 [ 5814,11197) 5814-11197
41: Idaho 813 [ 637, 868) 637-868
42: Montana 746 [ 637, 868) 637-868
43: New Hampshire 812 [ 637, 868) 637-868
44: North Dakota 637 [ 637, 868) 637-868
45: South Dakota 681 [ 637, 868) 637-868
46: Hawaii 868 [ 868, 1544) 868-1544
47: Maine 1058 [ 868, 1544) 868-1544
48: New Mexico 1144 [ 868, 1544) 868-1544
49: Rhode Island 931 [ 868, 1544) 868-1544
50: Utah 1203 [ 868, 1544) 868-1544
state value value.fc.inter value.fc
如果您只想分组,那么您也可以使用by
参数而不是order
:
> df[, .(state, value) , by = .(value.fc)][1:10]
value.fc state value
1: 2861-3806 Alabama 3615
2: 2861-3806 Connecticut 3100
3: 2861-3806 Iowa 2861
4: 2861-3806 Kentucky 3387
5: 2861-3806 Washington 3559
6: 365-637 Alaska 365
7: 365-637 Delaware 579
8: 365-637 Nevada 590
9: 365-637 Vermont 472
10: 365-637 Wyoming 376