Applying lapply recursively in R

Time: 2016-07-31 05:52:18

Tags: r lapply

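Out of curiosity, I was testing whether lapply gives the same result as applying a function to each column by hand. I found that lapply seems to behave inconsistently. Here is what I did:

Example 1: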

m<-c(2,3,4)
n<-c(5,6,3)
o<-c(1,1,1.5)
dc<-data.frame(m,n,o)

Now, let's look at the interesting part:

lapply(dc,mode)

This gives:

lapply(dc,mode)
$m
[1] "numeric"

$n
[1] "numeric"

$o
[1] "numeric"

Let's compare the result above with running mode on a single column, say "m".

  mode(dc$m)

I got:

"numeric"

The same holds for the other columns. This is all fine, because we have atomic vectors.
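The comparison in Example 1 can also be checked programmatically instead of by eye. The following is a minimal sketch, assuming the same three columns as above; the variable names via_lapply, by_hand, and same are illustrative and not part of the original question.

```r
# Rebuild the data frame from Example 1.
dc <- data.frame(m = c(2, 3, 4), n = c(5, 6, 3), o = c(1, 1, 1.5))

# Apply mode() to every column through lapply().
via_lapply <- lapply(dc, mode)

# Apply mode() to each column by hand and collect the results in a named list.
by_hand <- list(m = mode(dc$m), n = mode(dc$n), o = mode(dc$o))

# Compare the two named lists element by element.
same <- identical(via_lapply, by_hand)
print(same)
```

Because mode returns a value rather than printing one, both routes produce the same list.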

Now, let's look at another example:

Example 2:

a<-c(2,3,4,5,5,3)
b<-c(0,1,1,0,1,0)
b<-factor(b,levels = c(0,1),labels = c("F","M"))
c<-c("Hello","Hi")
datacheck<-data.frame(a,b,c)

Now, I apply the str function to a, b, and c individually.

str(datacheck$b)
 Factor w/ 2 levels "F","M": 1 2 2 1 2 1
str(datacheck$c)
 Factor w/ 2 levels "Hello","Hi": 1 2 1 2 1 2
str(datacheck$a)
 num [1:6] 2 3 4 5 5 3

This is fine and expected, because b and c are factors, and "a" is just a numeric vector.

Now, when I run lapply, I get:

 lapply(datacheck,str)
 num [1:6] 2 3 4 5 5 3
 Factor w/ 2 levels "F","M": 1 2 2 1 2 1
 Factor w/ 2 levels "Hello","Hi": 1 2 1 2 1 2
$a
NULL

$b
NULL

$c
NULL

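My question is: why are $a, $b, and $c NULL, rather than the numeric and factor descriptions we saw when running str() on each column separately? I looked around, and also read the lapply documentation, but could not find an answer.

I would appreciate your thoughts.

2 answers:

Answer 0: (score: 4)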

We need to use class:

lapply(datacheck, class)

This returns a list, but if we need a vector:

sapply(datacheck, class)
#        a         b         c
#"numeric"  "factor"  "factor"

If we need the str output as character strings, we can use capture.output, because str only prints its output:

lapply(datacheck, function(x) trimws(capture.output(str(x))))
#$a
#[1] "num [1:6] 2 3 4 5 5 3"

#$b
#[1] "Factor w/ 2 levels \"F\",\"M\": 1 2 2 1 2 1"

#$c
#[1] "Factor w/ 2 levels \"Hello\",\"Hi\": 1 2 1 2 1 2"

By checking

class(str(datacheck$a))
# num [1:6] 2 3 4 5 5 3
#[1] "NULL"

we get NULL as the return value, which is why lapply(datacheck, str) shows NULL for every column.

Check the source code of str.
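The capture.output approach from this answer could be wrapped in a small helper so that each column yields exactly one string. This is only a sketch: str_as_text is a hypothetical helper name, and stringsAsFactors = TRUE is set explicitly on the assumption that the reader wants to reproduce the factor column c from the question (since R 4.0.0, data.frame no longer converts strings to factors by default).

```r
# Rebuild the data frame from Example 2; the two-element column c is recycled
# to six rows, and stringsAsFactors = TRUE turns it into a factor.
datacheck <- data.frame(
  a = c(2, 3, 4, 5, 5, 3),
  b = factor(c(0, 1, 1, 0, 1, 0), levels = c(0, 1), labels = c("F", "M")),
  c = c("Hello", "Hi"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: capture what str() prints and return it as one string.
str_as_text <- function(x) {
  paste(trimws(capture.output(str(x))), collapse = " ")
}

# vapply() checks that every result is a single character string and
# returns a named character vector rather than a list.
summaries <- vapply(datacheck, str_as_text, character(1))
print(summaries)
```

Using vapply instead of sapply makes the expected result type explicit, so an unexpected return value raises an error instead of silently changing the shape of the result.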

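Answer 1: (score: 2)

The reason lapply(datacheck, str) returns a list of NULLs is explained in help(str):

     str does not return anything, for efficiency reasons. The obvious side effect is output to the terminal.

So the difference is between what you see in the console window and what the function actually returns. Using lapply makes this difference visible, because lapply collects the return values (here NULL) into a list, which is then printed.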
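The split between printed output and return value can be reproduced with any function that prints and returns NULL. The sketch below is illustrative only: describe is a hypothetical function standing in for str, not something from the answers.

```r
x <- c(2, 3, 4, 5, 5, 3)

# str() prints a description as a side effect; the value it returns is NULL.
res <- str(x)
print(is.null(res))

# Hypothetical function with the same shape: print something, return NULL.
describe <- function(v) {
  cat("length:", length(v), "\n")
  invisible(NULL)
}

# lapply() stores each return value, so the result is a named list of NULLs.
out <- lapply(list(a = x, b = letters), describe)

# Wrapping the call in invisible() keeps the printed side effect but
# suppresses the automatic printing of the list of NULLs at the console.
invisible(lapply(list(a = x), str))
```

When only the printed description is wanted, a for loop over the columns or the invisible wrapper shown above avoids the trailing list of NULLs.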