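I have a data.frame in R, which I have attached below. The first column contains the row.names, which I think can be ignored for this example.
Here is what I would like to do:
For each consecutive run of the value column, I would like to generate the longest combination of start and end, i.e. a single interval running from the smallest start to the largest end of that run.
For the data.frame below, the solution would look like this: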
start end value
1 11498007 11675212 NIC2
2 11675212 11675695 ED3048
3 11675695 12007383 NIC2
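I can do this in R with a for loop, but that is prohibitively slow because I am working with a much larger dataset.
Is there a way to do this easily with dplyr or some other fast method?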
start end value
1 11498007 11500185 NIC2
2 11500185 11503809 NIC2
3 11503809 11504028 NIC2
4 11504028 11505268 NIC2
5 11505268 11506382 NIC2
6 11506382 11506414 NIC2
7 11506414 11506422 NIC2
8 11506422 11506659 NIC2
9 11506659 11506790 NIC2
10 11506790 11506921 NIC2
11 11506921 11507408 NIC2
12 11507408 11507482 NIC2
13 11507482 11508111 NIC2
14 11508111 11510776 NIC2
15 11510776 11514107 NIC2
16 11514107 11514141 NIC2
17 11514141 11514941 NIC2
18 11514941 11515753 NIC2
19 11515753 11516308 NIC2
20 11516308 11520681 NIC2
21 11520681 11522554 NIC2
22 11522554 11523088 NIC2
23 11523088 11525130 NIC2
24 11525130 11527377 NIC2
25 11527377 11527525 NIC2
26 11527525 11527939 NIC2
27 11527939 11528408 NIC2
28 11528408 11528420 NIC2
29 11528420 11528444 NIC2
30 11528444 11528453 NIC2
31 11528453 11528611 NIC2
32 11528611 11529008 NIC2
33 11529008 11529017 NIC2
34 11529017 11529257 NIC2
35 11529257 11530157 NIC2
36 11530157 11530186 NIC2
37 11530186 11530421 NIC2
38 11530421 11530518 NIC2
39 11530518 11530624 NIC2
40 11530624 11530666 NIC2
41 11530666 11530994 NIC2
42 11530994 11532649 NIC2
43 11532649 11532738 NIC2
44 11532738 11533042 NIC2
45 11533042 11533454 NIC2
46 11533454 11533912 NIC2
47 11533912 11534304 NIC2
48 11534304 11537299 NIC2
49 11537299 11539754 NIC2
50 11539754 11541846 NIC2
51 11541846 11543431 NIC2
52 11543431 11557925 NIC2
53 11557925 11558476 NIC2
54 11558476 11559622 NIC2
55 11559622 11562905 NIC2
56 11562905 11569135 NIC2
57 11569135 11569433 NIC2
58 11569433 11570277 NIC2
59 11570277 11570284 NIC2
60 11570284 11574102 NIC2
61 11574102 11577288 NIC2
62 11577288 11579868 NIC2
63 11579868 11584487 NIC2
64 11584487 11585017 NIC2
65 11585017 11585996 NIC2
66 11585996 11586122 NIC2
67 11586122 11587155 NIC2
68 11587155 11588850 NIC2
69 11588850 11601008 NIC2
70 11601008 11605243 NIC2
71 11605243 11606089 NIC2
72 11606089 11609905 NIC2
73 11609905 11611376 NIC2
74 11611376 11621733 NIC2
75 11621733 11623480 NIC2
76 11623480 11625922 NIC2
77 11625922 11634546 NIC2
78 11634546 11634930 NIC2
79 11634930 11639416 NIC2
80 11639416 11640314 NIC2
81 11640314 11641999 NIC2
82 11641999 11643118 NIC2
83 11643118 11650865 NIC2
84 11650865 11658435 NIC2
85 11658435 11660037 NIC2
86 11660037 11660064 NIC2
87 11660064 11660490 NIC2
88 11660490 11660544 NIC2
89 11660544 11666281 NIC2
90 11666281 11667555 NIC2
91 11667555 11675212 NIC2
92 11675212 11675638 ED3048
93 11675638 11675695 ED3048
94 11675695 11677084 NIC2
95 11677084 11677388 NIC2
96 11677388 11683114 NIC2
97 11683114 11685474 NIC2
98 11685474 11689877 NIC2
99 11689877 11694696 NIC2
100 11694696 11702279 NIC2
101 11702279 11703345 NIC2
102 11703345 11703916 NIC2
103 11703916 11704719 NIC2
104 11704719 11705706 NIC2
105 11705706 11714124 NIC2
106 11714124 11714678 NIC2
107 11714678 11715411 NIC2
108 11715411 11716478 NIC2
109 11716478 11717317 NIC2
110 11717317 11720168 NIC2
111 11720168 11734503 NIC2
112 11734503 11744967 NIC2
113 11744967 11759069 NIC2
114 11759069 11759607 NIC2
115 11759607 11766365 NIC2
116 11766365 11769861 NIC2
117 11769861 11769896 NIC2
118 11769896 11769916 NIC2
119 11769916 11769931 NIC2
120 11769931 11769932 NIC2
121 11769932 11769935 NIC2
122 11769935 11769994 NIC2
123 11769994 11770048 NIC2
124 11770048 11770088 NIC2
125 11770088 11770090 NIC2
126 11770090 11771234 NIC2
127 11771234 11772929 NIC2
128 11772929 11781474 NIC2
129 11781474 11781973 NIC2
130 11781973 11783884 NIC2
131 11783884 11784493 NIC2
132 11784493 11784498 NIC2
133 11784498 11784732 NIC2
134 11784732 11785308 NIC2
135 11785308 11785860 NIC2
136 11785860 11789778 NIC2
137 11789778 11792506 NIC2
138 11792506 11794567 NIC2
139 11794567 11801832 NIC2
140 11801832 11802161 NIC2
141 11802161 11802507 NIC2
142 11802507 11802508 NIC2
143 11802508 11803263 NIC2
144 11803263 11803364 NIC2
145 11803364 11803373 NIC2
146 11803373 11803568 NIC2
147 11803568 11803980 NIC2
148 11803980 11804107 NIC2
149 11804107 11804369 NIC2
150 11804369 11805042 NIC2
151 11805042 11805711 NIC2
152 11805711 11805863 NIC2
153 11805863 11806743 NIC2
154 11806743 11806942 NIC2
155 11806942 11808615 NIC2
156 11808615 11808839 NIC2
157 11808839 11809970 NIC2
158 11809970 11810603 NIC2
159 11810603 11811912 NIC2
160 11811912 11813086 NIC2
161 11813086 11820680 NIC2
162 11820680 11820771 NIC2
163 11820771 11820818 NIC2
164 11820818 11820984 NIC2
165 11820984 11821011 NIC2
166 11821011 11821360 NIC2
167 11821360 11821380 NIC2
168 11821380 11821597 NIC2
169 11821597 11823045 NIC2
170 11823045 11824456 NIC2
171 11824456 11824484 NIC2
172 11824484 11824622 NIC2
173 11824622 11825060 NIC2
174 11825060 11825674 NIC2
175 11825674 11825769 NIC2
176 11825769 11826152 NIC2
177 11826152 11826183 NIC2
178 11826183 11826192 NIC2
179 11826192 11826220 NIC2
180 11826220 11826222 NIC2
181 11826222 11826229 NIC2
182 11826229 11826236 NIC2
183 11826236 11826259 NIC2
184 11826259 11826262 NIC2
185 11826262 11826275 NIC2
186 11826275 11826284 NIC2
187 11826284 11826311 NIC2
188 11826311 11826354 NIC2
189 11826354 11826363 NIC2
190 11826363 11826366 NIC2
191 11826366 11826450 NIC2
192 11826450 11826495 NIC2
193 11826495 11826522 NIC2
194 11826522 11827132 NIC2
195 11827132 11827151 NIC2
196 11827151 11827178 NIC2
197 11827178 11827257 NIC2
198 11827257 11827281 NIC2
199 11827281 11827309 NIC2
200 11827309 11827341 NIC2
201 11827341 11827418 NIC2
202 11827418 11827450 NIC2
203 11827450 11827751 NIC2
204 11827751 11828070 NIC2
205 11828070 11828970 NIC2
206 11828970 11832662 NIC2
207 11832662 11833369 NIC2
208 11833369 11833706 NIC2
209 11833706 11833787 NIC2
210 11833787 11834531 NIC2
211 11834531 11835129 NIC2
212 11835129 11835167 NIC2
213 11835167 11836265 NIC2
214 11836265 11836393 NIC2
215 11836393 11838190 NIC2
216 11838190 11839047 NIC2
217 11839047 11840050 NIC2
218 11840050 11842764 NIC2
219 11842764 11845235 NIC2
220 11845235 11849208 NIC2
221 11849208 11855696 NIC2
222 11855696 11856301 NIC2
223 11856301 11860647 NIC2
224 11860647 11861397 NIC2
225 11861397 11875177 NIC2
226 11875177 11880848 NIC2
227 11880848 11881762 NIC2
228 11881762 11882261 NIC2
229 11882261 11887769 NIC2
230 11887769 11895586 NIC2
231 11895586 11898469 NIC2
232 11898469 11898719 NIC2
233 11898719 11900746 NIC2
234 11900746 11901060 NIC2
235 11901060 11901664 NIC2
236 11901664 11905614 NIC2
237 11905614 11905670 NIC2
238 11905670 11906209 NIC2
239 11906209 11910442 NIC2
240 11910442 11910450 NIC2
241 11910450 11912061 NIC2
242 11912061 11912249 NIC2
243 11912249 11913903 NIC2
244 11913903 11917884 NIC2
245 11917884 11919309 NIC2
246 11919309 11922775 NIC2
247 11922775 11923192 NIC2
248 11923192 11923408 NIC2
249 11923408 11924092 NIC2
250 11924092 11925352 NIC2
251 11925352 11925626 NIC2
252 11925626 11926682 NIC2
253 11926682 11928066 NIC2
254 11928066 11928440 NIC2
255 11928440 11928450 NIC2
256 11928450 11928495 NIC2
257 11928495 11928500 NIC2
258 11928500 11928528 NIC2
259 11928528 11928883 NIC2
260 11928883 11930073 NIC2
261 11930073 11931553 NIC2
262 11931553 11933250 NIC2
263 11933250 11936043 NIC2
264 11936043 11937320 NIC2
265 11937320 11937813 NIC2
266 11937813 11942138 NIC2
267 11942138 11945949 NIC2
268 11945949 11947373 NIC2
269 11947373 11949849 NIC2
270 11949849 11951251 NIC2
271 11951251 11952909 NIC2
272 11952909 11956032 NIC2
273 11956032 11956098 NIC2
274 11956098 11956192 NIC2
275 11956192 11956361 NIC2
276 11956361 11956809 NIC2
277 11956809 11957113 NIC2
278 11957113 11957238 NIC2
279 11957238 11958013 NIC2
280 11958013 11964579 NIC2
281 11964579 11964696 NIC2
282 11964696 11964715 NIC2
283 11964715 11972147 NIC2
284 11972147 11974077 NIC2
285 11974077 11974946 NIC2
286 11974946 11975462 NIC2
287 11975462 11975463 NIC2
288 11975463 11975981 NIC2
289 11975981 11977701 NIC2
290 11977701 11978314 NIC2
291 11978314 11978494 NIC2
292 11978494 11978866 NIC2
293 11978866 11980251 NIC2
294 11980251 11981137 NIC2
295 11981137 11981470 NIC2
296 11981470 11981767 NIC2
297 11981767 11981769 NIC2
298 11981769 11981786 NIC2
299 11981786 11981867 NIC2
300 11981867 11983276 NIC2
301 11983276 11983333 NIC2
302 11983333 11983494 NIC2
303 11983494 11983699 NIC2
304 11983699 11983876 NIC2
305 11983876 11983926 NIC2
306 11983926 11983968 NIC2
307 11983968 11984130 NIC2
308 11984130 11984180 NIC2
309 11984180 11984185 NIC2
310 11984185 11984277 NIC2
311 11984277 11984457 NIC2
312 11984457 11984855 NIC2
313 11984855 11986267 NIC2
314 11986267 11986269 NIC2
315 11986269 11986535 NIC2
316 11986535 11987332 NIC2
317 11987332 11989515 NIC2
318 11989515 11989615 NIC2
319 11989615 11991259 NIC2
320 11991259 11991905 NIC2
321 11991905 11991922 NIC2
322 11991922 11992083 NIC2
323 11992083 11992132 NIC2
324 11992132 11992133 NIC2
325 11992133 11992665 NIC2
326 11992665 11993396 NIC2
327 11993396 11993616 NIC2
328 11993616 11994093 NIC2
329 11994093 11994280 NIC2
330 11994280 11994287 NIC2
331 11994287 11995665 NIC2
332 11995665 11995678 NIC2
333 11995678 11995684 NIC2
334 11995684 11995716 NIC2
335 11995716 11995775 NIC2
336 11995775 11995802 NIC2
337 11995802 11995982 NIC2
338 11995982 11995997 NIC2
339 11995997 11996008 NIC2
340 11996008 11996011 NIC2
341 11996011 11996014 NIC2
342 11996014 11996018 NIC2
343 11996018 11996028 NIC2
344 11996028 11996035 NIC2
345 11996035 11996142 NIC2
346 11996142 11996284 NIC2
347 11996284 11996418 NIC2
348 11996418 11996452 NIC2
349 11996452 11998022 NIC2
350 11998022 12002709 NIC2
351 12002709 12003081 NIC2
352 12003081 12006843 NIC2
353 12006843 12007383 NIC2
Answer 0 (score: 2)
The rleid function from data.table is very handy for this kind of task; it also works inside a dplyr chain. With data.table, you can use it like this:
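library(data.table)
dt <- as.data.table(df)
# rleid(value) gives every consecutive run of value its own id, so grouping by it
# keeps the two separate NIC2 runs apart; the helper column is dropped afterwards
dt[, .(start = min(start), end = max(end)), by = .(value, rleid(value))][
  , !"rleid", with=FALSE]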
# value start end
#1: NIC2 11498007 11675212
#2: ED3048 11675212 11675695
#3: NIC2 11675695 12007383
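The answer notes that rleid can also be used in a dplyr chain. A minimal sketch of what that might look like follows; it is not the answerer's original code. It assumes dplyr 1.0.0 or later (for the .groups argument) and uses only six rows taken from the question's data (rows 1, 2, 92, 93, 94 and 95), so the resulting end values differ from the full-data result.

```r
library(dplyr)

# Six rows copied from the question's data (rows 1, 2, 92, 93, 94, 95)
df <- data.frame(
  start = c(11498007, 11500185, 11675212, 11675638, 11675695, 11677084),
  end   = c(11500185, 11503809, 11675638, 11675695, 11677084, 11677388),
  value = c("NIC2", "NIC2", "ED3048", "ED3048", "NIC2", "NIC2"),
  stringsAsFactors = FALSE
)

res <- df %>%
  # "run" is a helper column name chosen for this sketch;
  # data.table::rleid() numbers each consecutive run of value
  group_by(run = data.table::rleid(value), value) %>%
  summarise(start = min(start), end = max(end), .groups = "drop") %>%
  select(start, end, value)

print(res)
```

Because the groups are keyed by the run id first, the rows come back in the original run order rather than sorted by value.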
Answer 1 (score: 0)
Borrowing the rle trick seen here, you can do:
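library(dplyr)
# rle() does not accept factors, so convert value to character first
df$value <- as.character(df$value)
df %>%
  # build a run id from rle() and paste it onto value to get one key per run
  group_by(cont_run_val = paste(value,
                                {tmp = rle(value); rep(seq_along(tmp$lengths), tmp$lengths)},
                                sep = "_")) %>%
  summarize(min_start = min(start),
            max_end = max(end))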
# cont_run_val min_start max_end
# (chr) (int) (int)
# 1 ED3048_2 11675212 11675695
# 2 NIC2_1 11498007 11675212
# 3 NIC2_3 11675695 12007383
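Note that the output above is sorted by the pasted key, so the ED3048 run appears first. If the original run order matters and the rows are already sorted so that each run begins with its smallest start and ends with its largest end (as in the question's data), the run boundaries can be read directly from rle() in base R. This is a sketch under that assumption, not part of the original answer; it again uses six rows copied from the question's data (rows 1, 2, 92, 93, 94 and 95).

```r
# Six rows copied from the question's data (rows 1, 2, 92, 93, 94, 95)
df <- data.frame(
  start = c(11498007, 11500185, 11675212, 11675638, 11675695, 11677084),
  end   = c(11500185, 11503809, 11675638, 11675695, 11677084, 11677388),
  value = c("NIC2", "NIC2", "ED3048", "ED3048", "NIC2", "NIC2"),
  stringsAsFactors = FALSE
)

r     <- rle(as.character(df$value))
last  <- cumsum(r$lengths)       # row index of the last row of each run
first <- last - r$lengths + 1    # row index of the first row of each run

# Assumption: within a run, the first row holds the smallest start
# and the last row holds the largest end
res <- data.frame(
  start = df$start[first],
  end   = df$end[last],
  value = r$values,
  stringsAsFactors = FALSE
)

print(res)
```

This avoids grouping altogether, since it only indexes the first and last row of each run. If the rows are not sorted that way, a grouped min/max as in the answers is the safer choice.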