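Out of curiosity, I was testing whether lapply gives the same results as applying a function manually to each column. I found that lapply seems to behave inconsistently. Here is what I did:
Example 1: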
m<-c(2,3,4)
n<-c(5,6,3)
o<-c(1,1,1.5)
dc<-data.frame(m,n,o)
Now, let's look at the interesting part:
lapply(dc,mode)
This gives:
$m
[1] "numeric"
$n
[1] "numeric"
$o
[1] "numeric"
Let's compare the result above with running mode on a single column, say "m":
mode(dc$m)
I get:
[1] "numeric"
The same holds for the other columns. This is all fine, since the columns are atomic vectors.
Now, let's look at another example:
Example 2:
a<-c(2,3,4,5,5,3)
b<-c(0,1,1,0,1,0)
b<-factor(b,levels = c(0,1),labels = c("F","M"))
c<-c("Hello","Hi")
datacheck<-data.frame(a,b,c)
Now I apply the str function to a, b, and c separately.
str(datacheck$b)
Factor w/ 2 levels "F","M": 1 2 2 1 2 1
str(datacheck$c)
Factor w/ 2 levels "Hello","Hi": 1 2 1 2 1 2
str(datacheck$a)
num [1:6] 2 3 4 5 5 3
This is fine and expected, since b and c are factors and "a" is just a numeric vector.
Now, when I run lapply, I get:
lapply(datacheck,str)
num [1:6] 2 3 4 5 5 3
Factor w/ 2 levels "F","M": 1 2 2 1 2 1
Factor w/ 2 levels "Hello","Hi": 1 2 1 2 1 2
$a
NULL
$b
NULL
$c
NULL
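My question is: why are $a, $b, and $c NULL, rather than what we saw when running str() on each column individually? I looked around and also read the lapply documentation, but I couldn't find an answer.
I would appreciate your thoughts.
Answer 0: (score: 4)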
We need to use class:
lapply(datacheck, class)
This returns a list, but if we need a vector, we can use sapply:
sapply(datacheck, class)
# a b c
#"numeric" "factor" "factor"
If we need the str output as character strings, we can use capture.output, because str only prints its output:
lapply(datacheck, function(x) trimws(capture.output(str(x))))
#$a
#[1] "num [1:6] 2 3 4 5 5 3"
#$b
#[1] "Factor w/ 2 levels \"F\",\"M\": 1 2 2 1 2 1"
#$c
#[1] "Factor w/ 2 levels \"Hello\",\"Hi\": 1 2 1 2 1 2"
By checking
class(str(datacheck$a))
# num [1:6] 2 3 4 5 5 3
#[1] "NULL"
we see that str returns NULL, which is why lapply(datacheck, str) shows NULL for every element.
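If the goal is only to see the str output without the trailing list of NULLs, one option is to discard the value that lapply returns. The following is a minimal sketch that rebuilds the datacheck data frame from the question. It sets stringsAsFactors = TRUE explicitly, which is an assumption added here so that column c is a factor as in the question's output, since R 4.0.0 and later no longer convert strings to factors by default.

```r
a <- c(2, 3, 4, 5, 5, 3)
b <- factor(c(0, 1, 1, 0, 1, 0), levels = c(0, 1), labels = c("F", "M"))
c <- c("Hello", "Hi")
# stringsAsFactors = TRUE reproduces the factor column shown in the question
datacheck <- data.frame(a, b, c, stringsAsFactors = TRUE)

# invisible() suppresses auto-printing of the list of NULLs;
# the str() output itself is still printed as a side effect
invisible(lapply(datacheck, str))

# The value that lapply collected is still a list of NULLs, one per column
res <- lapply(datacheck, str)
all(vapply(res, is.null, logical(1)))
```

Calling str(datacheck) directly is another way to get a one-call summary of every column.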
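Answer 1: (score: 2)
help(str) explains why lapply(datacheck,str) returns a list of NULLs:
Value
str does not return anything, for efficiency reasons. The obvious side effect is output to the terminal.
So the difference is between what you see in the console window and what the function actually returns. Using lapply makes this visible.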
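The distinction between printed output and return value can be illustrated with a small toy function. This is only a sketch: describe() is a hypothetical helper written for this example, not part of base R, and it merely mimics the way str prints something and returns NULL.

```r
# describe() is a hypothetical helper, not part of base R
describe <- function(x) {
  cat("length:", length(x), "\n")  # side effect: prints to the console
  invisible(NULL)                   # return value: NULL, as with str()
}

# What the function prints, captured as a character vector
printed <- capture.output(describe(1:3))

# What the function returns (the call also prints its line to the console)
returned <- describe(1:3)
is.null(returned)

# lapply collects only the return values, so every element is NULL
res <- lapply(list(x = 1:3, y = letters), describe)
```

Here printed holds the text and returned holds NULL, which matches what happens when lapply is given str: the text goes to the console, and only the NULLs end up in the resulting list.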