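I read a file into a data.frame with a single character column and use `sapply` to apply some functions element by element. I am interested in using the pipes from the `magrittr` package and would like to know the correct and best way to do this. I have tried a few variations, and although the code works (I believe!), the output of the two approaches differs.
I left out the placeholder for `sapply` itself, because `%>%` pipes into the first parameter of the next function.
Original code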
## Load in a CSV file as a dataframe
y <- read.csv(file = file_name, header = TRUE, sep = "\n", quote = "", row.names = NULL, stringsAsFactors = FALSE)
## Perform some operations
y$transformed1 <- sapply(y, FUN = function(x) gsub("&amp;", "&", x))
y$transformed2 <- sapply(y$transformed1, FUN = function(x) gsub(pattern = "http\\S+\\s*", replacement = "", x))
y$transformed3 <- sapply(y$transformed2, FUN = function(x) gsub("[^[:alpha:][:space:]&\']", "", x))
y$transformed4 <- sapply(y$transformed3, FUN = function(x) stripWhitespace(x))
y$transformed5 <- sapply(y$transformed4, FUN = function(x) gsub("^ ", "", x))
y$transformed6 <- sapply(y$transformed5, FUN = function(x) gsub(" $", "", x))
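This works very well for me and leaves a result in `y$transformed6` that is clean enough for my needs.
Using magrittr
The following code runs fine, and on visual inspection the results look identical, as the comparison of the `head` output below shows.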
in_file <- y ## from above
out_file <- sapply(in_file, function(x) gsub("&amp;", "&", x)) %>%
  gsub("http\\S+\\s*", "", .) %>%
  gsub("[^[:alpha:][:space:]&\']", "", .) %>%
  stripWhitespace() %>%
  gsub("^ ", "", .) %>%
  gsub(" $", "", .)
Here you can see what `str()` and `head()` return for each output. Unsurprisingly, `identical` returns `FALSE`.
## First method
> str(y$transformed6)
chr [1:14158] "ExchangeNews Direct S&P Dow Jones Ind
> head(y$transformed6)
[1] "ExchangeNews Direct S&P Dow Jones Indices Announ
[2] "Svelte Medical Systems Raises M for HeartSurgery
[3] "Dow Jones industrial average tumbles below on bu
[4] "Dow approaches record high as Fed meeting begins
[5] "Money How the Dow Jones industrial average did T
[6] "Just another day at dowjones brewing up exciting
-----------------------------------------------------
## Using magrittr
> str(out_file)
chr [1:14158, 1] "ExchangeNews Direct S&P Dow Jones
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "text"
> head(out_file)
text
[1,] "ExchangeNews Direct S&P Dow Jones Indices Annou
[2,] "Svelte Medical Systems Raises M for HeartSurger
[3,] "Dow Jones industrial average tumbles below on b
[4,] "Dow approaches record high as Fed meeting begin
[5,] "Money How the Dow Jones industrial average did
[6,] "Just another day at dowjones brewing up excitin
Where does the difference come from? Am I doing something wrong, or is this simply a consequence of using `magrittr`?
Answer 0 (score: 1)
You should put the pipes inside the function passed to `sapply`, rather than after the `sapply` call.
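A minimal sketch of that idea, under a few assumptions: the two-row data.frame and the helper name `clean_text` are hypothetical, the column name `text` is taken from the `dimnames` shown in the question, and `gsub("\\s+", " ", .)` is used as a base-R stand-in for `stripWhitespace()` so the example does not depend on the package that provides it. It also suggests where the difference in the question may come from: `sapply` over a data.frame iterates over its columns and simplifies the result to a matrix, and `gsub` keeps the attributes of its input, so the piped version stays a one-column matrix.

```r
library(magrittr)

# Hypothetical stand-in for the CSV: one character column named "text"
y <- data.frame(
  text = c("S&amp;P rises  http://example.com today ", " Dow  falls 3%"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the whole pipe chain lives inside one function
clean_text <- function(x) {
  x %>%
    gsub("&amp;", "&", .) %>%
    gsub("http\\S+\\s*", "", .) %>%
    gsub("[^[:alpha:][:space:]&']", "", .) %>%
    gsub("\\s+", " ", .) %>%   # assumed stand-in for stripWhitespace()
    gsub("^ ", "", .) %>%
    gsub(" $", "", .)
}

# sapply() over the data.frame iterates over columns: the result is a matrix
as_matrix <- sapply(y, clean_text)

# sapply() over the column iterates over elements: the result is a vector
as_vector <- sapply(y$text, clean_text, USE.NAMES = FALSE)

# gsub() is vectorised, so the column can also be passed in directly
direct <- clean_text(y$text)

str(as_matrix)
str(as_vector)
identical(as_vector, direct)
```

Because `gsub` is vectorised, the last form avoids `sapply` altogether. If the matrix form is already in hand, `as.vector(as_matrix)` drops the `dim` and `dimnames` attributes.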