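I want to determine the number of firms that start and end in each month. The goal is to list, for each column, how many firms start and how many end there.
My data looks like this, with many more rows and columns.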
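One challenge is that a firm can have NaN values between its observations. For example, row 2 starts at 1990_01 and ends at 1990_05, even though there are NaN values in between.
I tried the following code: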
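For illustration, a small data frame with this shape could be built as follows. This is only a sketch: the return values are made up, and only the positions of the NaN entries matter. They are chosen to be consistent with the outputs shown in the answers (three rows, columns Return_1990_01 to Return_1990_05).

```r
# Hypothetical sample data: values are invented, only the NaN pattern matters.
df <- data.frame(
  Firm           = c("fg23", "sdf1", "sdf1"),
  Return_1990_01 = c(NaN,  0.02, 0.01),
  Return_1990_02 = c(NaN,  NaN,  0.03),
  Return_1990_03 = c(0.05, NaN,  NaN),
  Return_1990_04 = c(0.01, 0.04, 0.02),
  Return_1990_05 = c(0.03, 0.01, NaN)
)

# Row 2 has NaN gaps in the middle, but its observed positions still
# span the first to the fifth return column.
row2_observed <- which(!is.nan(unlist(df[2, -1])))
```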
However, I get an error message.
Thanks for your help.
Answer 0 (score: 2)
You can use:
apply(df[,-1], 1, function(x) range(which(!is.nan(x))))
# [,1] [,2] [,3]
# [1,] 3 1 1
# [2,] 5 5 4
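To see why the one-liner above copes with gaps, it may help to run the same steps on a single row. The vectors below are hypothetical values, not taken from the question's data.

```r
# Hypothetical returns for one firm: missing at the start.
x <- c(NaN, NaN, 0.05, 0.01, 0.03)
observed <- which(!is.nan(x))   # positions with a value: 3 4 5
first_last <- range(observed)   # smallest and largest position: 3 5

# Hypothetical returns with a gap in the middle: the gap is ignored
# because only the smallest and largest observed positions are kept.
y <- c(0.02, NaN, NaN, 0.04, 0.01)
first_last_gap <- range(which(!is.nan(y)))   # 1 5
```

Note that `apply(..., 1, ...)` returns one column per input row, which is why the result printed above has the first positions in row 1 and the last positions in row 2.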
If you want to add row and column names, you can extend this to:
library(magrittr) # provides the %>% pipe
apply(df[,-1], 1, function(x) range(which(!is.nan(x)))) %>%
t %>%
`colnames<-`(c('First','Last')) %>%
`row.names<-`(df[,1])
# First Last
# fg23 3 5
# sdf1 1 5
# sdf1 1 4
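Answer 1 (score: 1)
Another way to do this, reporting column names, using tidyverse. We gather the data into long format and keep only the first and last value for each row. We then create a new column (temp) that holds "Start" and "End" for each group, and spread it back to wide format.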
library(dplyr)
library(tidyr)
df %>%
mutate(row = row_number()) %>%
gather(key, value, -Firm, -row, na.rm = TRUE) %>%
group_by(row) %>%
slice(c(1L, n())) %>%
mutate(temp = c("Start", "End")) %>%
select(-value) %>%
spread(temp, key) %>%
ungroup %>%
select(-row) %>%
select(Firm, Start, End)
# Firm Start End
# <fct> <chr> <chr>
#1 fg23 Return_1990_03 Return_1990_05
#2 sdf1 Return_1990_01 Return_1990_05
#3 sdf1 Return_1990_01 Return_1990_04
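Answer 2 (score: 0)
With tidyverse, we can do this without any reshaping by using pmap. For each row, use which to find the elements that are not NaN, take their names, and get the first and last column names.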
library(tidyverse)
df %>%
transmute(Firm, start_end = pmap(.[-1], ~
which(!is.nan(c(...))) %>%
names %>%
range %>%
{tibble(start = first(.), end = last(.))})) %>%
unnest
# Firm start end
#1 fg23 Return_1990_03 Return_1990_05
#2 sdf1 Return_1990_01 Return_1990_05
#3 sdf1 Return_1990_01 Return_1990_04
In base R, we can also do this in a vectorized way with max.col:
m1 <- !is.na(df[-1])
start <- colnames(m1)[max.col(m1, "first")]
end <- colnames(m1)[max.col(m1, "last")]
cbind(df['Firm'], start, end)
# Firm start end
#1 fg23 Return_1990_03 Return_1990_05
#2 sdf1 Return_1990_01 Return_1990_05
#3 sdf1 Return_1990_01 Return_1990_04
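The question asks for the number of firms starting and ending in each month, whereas the answers report a first and last column per firm. One possible way to get from there to per-month counts is to tabulate the start and end positions. This is a sketch under assumptions: the data frame is hypothetical sample data with made-up values, and the names `starts`, `ends`, and `counts` are introduced here for illustration.

```r
# Hypothetical sample data: values are invented, only the NaN pattern matters.
df <- data.frame(
  Firm           = c("fg23", "sdf1", "sdf1"),
  Return_1990_01 = c(NaN,  0.02, 0.01),
  Return_1990_02 = c(NaN,  NaN,  0.03),
  Return_1990_03 = c(0.05, NaN,  NaN),
  Return_1990_04 = c(0.01, 0.04, 0.02),
  Return_1990_05 = c(0.03, 0.01, NaN)
)

m1 <- !is.na(df[-1])             # TRUE where a return is observed
first_idx <- max.col(m1, "first") # first observed column per row
last_idx  <- max.col(m1, "last")  # last observed column per row

# Count how many firms start and end in each column.
starts <- tabulate(first_idx, nbins = ncol(m1))
ends   <- tabulate(last_idx,  nbins = ncol(m1))
counts <- data.frame(Month = colnames(m1), Start = starts, End = ends)
counts
```

One caveat: for a row with no observed values at all, every entry of `m1` is FALSE, so `max.col` would still return the first or last column because of how ties are broken. Such rows are best dropped beforehand, for example with `rowSums(m1) > 0`.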