My data frame contains the following values:
URL Response.Code Count
www.site.com/page1 200 4
www.site.com/page1 301 1
www.site.com/page2 200 5
www.site.com/page3 301 4
www.site.com/page4 200 4
www.site.com/page4 403 1
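For each unique value of URL, I want to know whether more than one Response.Code value exists. If only one URL / Response.Code combination exists, the URL is consistent. The desired output is a data frame like this: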
URL Consistent
www.site.com/page1 FALSE
www.site.com/page2 TRUE
www.site.com/page3 TRUE
www.site.com/page4 FALSE
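I could loop over each unique URL and count the distinct values in Response.Code, but that doesn't look like the R way to solve this.
Any suggestions on the best way to approach this? I'm new to R and have checked several questions here about duplicates, but couldn't find a solution for this particular problem.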
Answer 0 (score: 3)
You can use aggregate from base R to count the rows per URL:
aggregate(Response.Code~URL, df, length)[2] == 1
# Response.Code
#[1,] FALSE
#[2,] TRUE
#[3,] TRUE
#[4,] FALSE
If you want the output in the desired format, you can do:
agg <- aggregate(Response.Code~URL, df, length)
new_df <- data.frame(URL = agg$URL, Consistent = agg$Response.Code == 1)
new_df
# URL Consistent
#1 www.site.com/page1 FALSE
#2 www.site.com/page2 TRUE
#3 www.site.com/page3 TRUE
#4 www.site.com/page4 FALSE
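Counting rows with length only works when each URL / Response.Code pair appears on a single row, as in the sample. A minimal sketch, assuming the question's data is stored in df, that counts the distinct codes per URL instead:

```r
# Sample data from the question (assumed here to be stored in `df`)
df <- data.frame(
  URL = c("www.site.com/page1", "www.site.com/page1", "www.site.com/page2",
          "www.site.com/page3", "www.site.com/page4", "www.site.com/page4"),
  Response.Code = c(200L, 301L, 200L, 301L, 200L, 403L),
  Count = c(4L, 1L, 5L, 4L, 4L, 1L),
  stringsAsFactors = FALSE
)

# Count the distinct Response.Code values per URL rather than the rows
agg <- aggregate(Response.Code ~ URL, df, function(x) length(unique(x)))

# A URL is consistent when it has exactly one distinct code
new_df <- data.frame(URL = agg$URL, Consistent = agg$Response.Code == 1)
new_df
```

On the sample data this should give the same result as the row count, since no pair is repeated there.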
Answer 1 (score: 2)
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'URL', and check whether the number of rows is equal to 1.
library(data.table)
setDT(df1)[, .(Consistent = .N ==1), by = URL]
# URL Consistent
#1: www.site.com/page1 FALSE
#2: www.site.com/page2 TRUE
#3: www.site.com/page3 TRUE
#4: www.site.com/page4 FALSE
Or, if we want to check that the length of the unique elements in 'Response.Code' is 1, we can use uniqueN after grouping by 'URL'.
setDT(df1)[, .(Consistent = uniqueN(Response.Code)==1), by = URL]
# URL Consistent
#1: www.site.com/page1 FALSE
#2: www.site.com/page2 TRUE
#3: www.site.com/page3 TRUE
#4: www.site.com/page4 FALSE
Answer 2 (score: 1)
We can also go for the hat trick (base, data.table, and dplyr) with a dplyr version:
df1 <- structure(list(URL = c("www.site.com/page1", "www.site.com/page1",
"www.site.com/page2", "www.site.com/page3", "www.site.com/page4",
"www.site.com/page4"), Response.Code = c(200L, 301L, 200L, 301L,
200L, 403L), Count = c(4L, 1L, 5L, 4L, 4L, 1L)), .Names = c("URL",
"Response.Code", "Count"), class = "data.frame", row.names = c(NA,
-6L))
library(dplyr)
# TRUE when a URL has exactly one distinct Response.Code
df1 %>%
  group_by(URL) %>%
  summarise(Consistent = n_distinct(Response.Code) == 1)
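Answer 3 (score: 0)
Assuming your data frame is named x, one thing you can run is
x$consistent <- !(duplicated(x$URL) | duplicated(x$URL, fromLast = TRUE))
This checks for duplicates in the URL column only and writes TRUE / FALSE values to a new column. By default, duplicated() does not return TRUE for every instance of a duplicated value: the first instance is FALSE and all later instances are TRUE. By combining the default call with a second call that uses fromLast = TRUE, joined with |, I make sure that every instance of a repeated URL is flagged. Negating the result with ! then makes x$consistent TRUE only for URLs that appear on a single row. This assumes each URL / Response.Code pair occurs at most once, as in your sample data.
If you want the output exactly as you described, you can run this to drop the repeated URLs and the extra columns:
y <- x[!duplicated(x$URL), c(1, 4)]
This gets the result you are looking for, but if you are interested in other options, I recommend reading the documentation for duplicated().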
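To see how duplicated() treats the first and last occurrence, here is a small sketch on a made-up vector (urls is hypothetical, not part of the question's data):

```r
# Hypothetical vector with one repeated value
urls <- c("a", "a", "b")

# Default: the first occurrence of a repeated value is not flagged
first_pass <- duplicated(urls)                   # FALSE  TRUE FALSE

# fromLast = TRUE scans from the end, so the last occurrence is not flagged
last_pass <- duplicated(urls, fromLast = TRUE)   #  TRUE FALSE FALSE

# OR-ing both directions flags every occurrence of a repeated value
repeated <- first_pass | last_pass               #  TRUE  TRUE FALSE
repeated
```

Negating repeated with ! leaves TRUE only for values that occur once.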