Using magrittr with the (*)apply functions - outputs differ

Date: 2015-10-23 15:33:46

Tags: r

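I read a file into a data.frame with a single column of characters and apply some functions to it element-wise using sapply. I am interested in using the pipes from the magrittr package, and I would like to know what the correct and best way to do this is. I have tried a few variations, and although it works (I believe!), the output of the two approaches does differ.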

I leave out the placeholder for sapply itself, because %>% pipes into the first parameter of the next function.
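As a small illustration of where %>% places its left-hand side, here is a sketch on made-up data (the vector x and the result names are hypothetical, not part of the original question). Without a placeholder the left-hand side becomes the first argument; with the dot it goes wherever the dot is written.

```r
library(magrittr)

# Made-up input, only for illustration
x <- c("a &amp; b", "c &amp; d")

# No placeholder: the left-hand side becomes the FIRST argument,
# so this is the same call as toupper(x).
upper_piped <- x %>% toupper()

# gsub() expects the text as its THIRD argument, so the dot placeholder
# tells the pipe where the left-hand side should go.
amp_piped <- x %>% gsub("&amp;", "&", .)
amp_plain <- gsub("&amp;", "&", x)

identical(amp_piped, amp_plain)  # TRUE
```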

Original code

library(tm)  ## provides stripWhitespace()

## Load a CSV file as a data frame
y <- read.csv(file = file_name, header = TRUE, sep = "\n", quote = "", row.names = NULL, stringsAsFactors = FALSE)
## Perform some operations
y$transformed1 <- sapply(y, FUN = function(x) gsub("&amp;", "&", x))
y$transformed2 <- sapply(y$transformed1, FUN = function(x) gsub(pattern = "http\\S+\\s*", replacement = "", x))
y$transformed3 <- sapply(y$transformed2, FUN = function(x) gsub("[^[:alpha:][:space:]&\']", "", x))
y$transformed4 <- sapply(y$transformed3, FUN = function(x) stripWhitespace(x))
y$transformed5 <- sapply(y$transformed4, FUN = function(x) gsub("^ ", "", x))
y$transformed6 <- sapply(y$transformed5, FUN = function(x) gsub(" $", "", x))

This works very well for me and returns a result in y$transformed6 that is clean enough for my needs.

Using magrittr

The following code runs fine, and on visual inspection the results look identical, as the comparison of the head() output below shows.

library(magrittr)

in_file <- y   ## from above

out_file <- sapply(in_file, function(x) gsub("&amp;", "&", x)) %>%
                   gsub("http\\S+\\s*", "", .) %>%
                   gsub("[^[:alpha:][:space:]&\']", "", .) %>%
                   stripWhitespace() %>%
                   gsub("^ ", "", .) %>%
                   gsub(" $", "", .)

Here you can see what the str() and head() functions return for each output. Unsurprisingly, the identical function returns FALSE.

## First method

> str(y$transformed6)                                
 chr [1:14158] "ExchangeNews Direct S&P Dow Jones Ind

> head(y$transformed6)                               
[1] "ExchangeNews Direct S&P Dow Jones Indices Announ
[2] "Svelte Medical Systems Raises M for HeartSurgery
[3] "Dow Jones industrial average tumbles below on bu
[4] "Dow approaches record high as Fed meeting begins
[5] "Money How the Dow Jones industrial average did T
[6] "Just another day at dowjones brewing up exciting

-----------------------------------------------------

## Using magrittr

> str(out_file)                                      
 chr [1:14158, 1] "ExchangeNews Direct S&P Dow Jones 
 - attr(*, "dimnames")=List of 2                     
  ..$ : NULL                                         
  ..$ : chr "text"                                   

> head(out_file)                                     
     text                                            
[1,] "ExchangeNews Direct S&P Dow Jones Indices Annou
[2,] "Svelte Medical Systems Raises M for HeartSurger
[3,] "Dow Jones industrial average tumbles below on b
[4,] "Dow approaches record high as Fed meeting begin
[5,] "Money How the Dow Jones industrial average did 
[6,] "Just another day at dowjones brewing up excitin

Where does the difference come from? Am I doing something wrong, or is it just an artefact of using magrittr?
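The str() output above suggests the difference is structural rather than textual: out_file is a one-column character matrix with dimnames, while y$transformed6 is a plain character vector. The sketch below reproduces that kind of difference on made-up data (the toy data.frame and the variable names are hypothetical). It only illustrates how sapply and gsub behave, and is not a diagnosis of the original file.

```r
# Toy stand-in for the real data: a one-column data.frame of strings.
# The column name "text" mirrors the dimnames shown by str(out_file).
toy <- data.frame(text = c("S&amp;P rises", "Dow falls"),
                  stringsAsFactors = FALSE)

# sapply() over a data.frame iterates over its COLUMNS and simplifies
# the result to a character matrix (2 rows, 1 column named "text").
by_frame <- sapply(toy, function(x) gsub("&amp;", "&", x))

# sapply() over the column iterates over its ELEMENTS and returns a
# character vector (USE.NAMES = FALSE keeps it unnamed).
by_column <- sapply(toy$text, function(x) gsub("&amp;", "&", x),
                    USE.NAMES = FALSE)

# gsub() keeps the attributes of its input, so every later step that is
# applied to the matrix still returns a matrix.
still_matrix <- gsub(" $", "", by_frame)

dim(by_frame)                              # 2 1
dim(by_column)                             # NULL
identical(by_frame, by_column)             # FALSE: the attributes differ
identical(as.vector(by_frame), by_column)  # TRUE: the strings are the same
```

If only the strings matter, comparing as.vector(out_file) against the other result (or dropping the names and dimensions from both) is one way to check whether the contents agree.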

1 answer:

Answer 0 (score: 1)

You should put the pipes inside sapply, rather than after it.
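A minimal sketch of what putting the pipe inside sapply could look like, on made-up data. The helper name clean_text and the toy data.frame are hypothetical, and gsub("\\s+", " ", .) is an assumed stand-in for tm's stripWhitespace() so that the example only needs magrittr. Applying the function over the column rather than over the whole data.frame is a choice made here so that the result comes back as a plain character vector.

```r
library(magrittr)

# Hypothetical helper: the whole cleaning chain lives inside one function.
# gsub("\\s+", " ", .) is an assumed stand-in for tm::stripWhitespace().
clean_text <- function(x) {
  x %>%
    gsub("&amp;", "&", .) %>%                 # decode the HTML ampersand
    gsub("http\\S+\\s*", "", .) %>%           # drop URLs
    gsub("[^[:alpha:][:space:]&']", "", .) %>% # keep letters, spaces, & and '
    gsub("\\s+", " ", .) %>%                  # collapse runs of whitespace
    gsub("^ ", "", .) %>%                     # trim one leading space
    gsub(" $", "", .)                         # trim one trailing space
}

# Made-up input, only for illustration
toy <- data.frame(
  text = c("S&amp;P 500 up 2%  http://example.com/a ", " Dow  falls"),
  stringsAsFactors = FALSE
)

# Apply over the column (a character vector), not over the data.frame,
# so the result is an unnamed character vector without dimensions.
cleaned <- sapply(toy$text, clean_text, USE.NAMES = FALSE)
cleaned  # "S&P up" "Dow falls"
```

Because gsub is vectorised, calling clean_text(toy$text) directly gives the same result here without sapply.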