Selecting data frame elements from multiple lists within a list

Question

I have a list (IPCs) that contains multiple data frames.

Here is a sample from my list:

  $ http://www.sumobrain.com/patents/us/Measured-object-support-mechanism-for-unbalance-measuring-apparatus/4981043.html           
:List of 1
..$ :'data.frame':  3 obs. of  5 variables:
.. ..$ X1: chr [1:3] "2001826A" "2857764A" "3452604A"
.. ..$ X2: chr [1:3] "1935-05-21" "1958-10-28" "1969-07-01"
.. ..$ X3: chr [1:3] "Russell et al." "Frank" "Schaub"
.. ..$ X4: chr [1:3] "73/478" "73/477" "73/475"
.. ..$ X5: chr [1:3] "Machine for balancing heavy bodies" "Rotor balance testing machine" "BALANCE TESTING APPARATUS HEAD"
$ http://www.sumobrain.com/patents/us/Encoder-with-wide-index/4982189.html   
 :List of 1
..$ :'data.frame':  8 obs. of  5 variables:
.. ..$ X1: chr [1:8] "3500449A" "4212000A" "4233592A" "4524347A" ...
.. ..$ X2: chr [1:8] "1970-03-10" "1980-07-08" "1980-11-11" "1985-06-18" ...
.. ..$ X3: chr [1:8] "Lenz" "Yamada" "Leichle" "Rogers" ...
.. ..$ X4: chr [1:8] "341/6" "341/16" "341/6" "341/3" ...
.. ..$ X5: chr [1:8] "ELECTRONIC ENCODER INDEX" "Position-to-digital encoder" "Method for detection of the angular position of a part driven in rotation and instrumentation using it" "Position encoder" ...
$ http://www.sumobrain.com/patents/us/Device-for-detecting-at-least-one-variable-relating-to-the-movement-of-a-movable-body/4982106.html   
:List of 1
..$ :'data.frame':  2 obs. of  5 variables:
.. ..$ X1: chr [1:2] "3956973A" "4797564A"
.. ..$ X2: chr [1:2] "1976-05-18" "1989-01-10"
.. ..$ X3: chr [1:2] "Pomplas" "Ramunas"
.. ..$ X4: chr [1:2] "92/5R" "307/119"
.. ..$ X5: chr [1:2] "Die casting machine with piston positioning control" "Robot overload detection mechanism"

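I want to select only the first and fifth elements (X1 and X5) from every data frame, so that I can later build another dataset containing just those two.

I tried to grab X1 with this: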

citations_IPC <- sapply(IPCs, function(x){
y<-x[,1]
return(y)
})

And X5 with this:

citations_titles <- sapply(IPCs[[1]], function(z){
e<-z[,5]
return(e)
})

Then I combined citations_IPC and citations_titles into a single data frame:

citation_list <-  data.frame(IPC = unlist(lapply(citations_IPC, paste)), title = unlist(lapply(citations_titles, paste)) )

Problem #1

If I run the sapply function on a single list element (e.g. IPCs[[1]]), I get the result I want:

citations_IPC <- sapply(IPCs[[1]], function(x){
y<-x[,1]
return(y)
})

Result:

> citations_IPC
      [,1]      
 [1,] "3415985A"
 [2,] "3916190A"
 [3,] "4088895A"
 [4,] "4633084A"
 [5,] "4670651A"
 [6,] "4860224A"

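However, this function does not work on the whole list (IPCs). The error I get is: "Error in x[, 1] : incorrect number of dimensions"

My guess is that the problem comes from some entries in my dataset that contain no data frame at all: no observations and no variables. If so, I need a function that lets me use sapply() on the dataset even though some entries have no data frame rows.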
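The error can be reproduced on a small stand-in list. In the sketch below, `toy` and its contents are made up for illustration. Note that with the structure shown by str(IPCs), each element is itself a list wrapping a data frame, so `[, 1]` fails on that wrapper as well as on an empty entry:

```r
# Hypothetical stand-in for IPCs (names and values are made up)
toy <- list(
  full  = list(data.frame(X1 = c("1A", "2A"), X5 = c("t1", "t2"),
                          stringsAsFactors = FALSE)),
  empty = list(list())
)

# Each element is a plain list, which has no dimensions, so `[, 1]` fails
msg_outer <- tryCatch(toy$full[, 1], error = function(e) conditionMessage(e))

# One level deeper, `[, 1]` works on the data frame but not on the empty entry
ok        <- toy$full[[1]][, 1]
msg_empty <- tryCatch(toy$empty[[1]][, 1], error = function(e) conditionMessage(e))

print(msg_outer)
print(ok)
print(msg_empty)
```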

Any suggestions would be greatly appreciated.

Many thanks.

str(IPCs)

> str(IPCs)
 List of 19
 $ http://www.sumobrain.com/patents/us/Method-and-apparatus-for-the-quantitative,-depth-differential-analysis-of-solid-samples-with-the-use-of-two-ion-beams/4982090.html       :List of 1
  ..$ :'data.frame':    6 obs. of  5 variables:
  .. ..$ X1: chr [1:6] "3415985A" "3916190A" "4088895A" "4633084A" ...
  .. ..$ X2: chr [1:6] "1968-12-10" "1975-10-28" "1978-05-09" "1986-12-30" ...
  .. ..$ X3: chr [1:6] "Castaing et al." "Valentine et al." "Martin" "Gruen et al." ...
  .. ..$ X4: chr [1:6] "250/309" "250/309" "250/309" "250/309" ...
  .. ..$ X5: chr [1:6] "Ionic microanalyzer wherein secondary ions are emitted from a sample surface upon bombardment by neutral atoms" "Depth profile analysis apparatus" "Memory device utilizing ion beam readout" "High efficiency direct detection of ions from resonance ionization of sputtered atoms" ...
 $ http://www.sumobrain.com/patents/us/Set-on-oscillator/4982165.html    
 :List of 1
  ..$ :'data.frame':    2 obs. of  5 variables:
  .. ..$ X1: chr [1:2] "4437066A" "4558282A"
  .. ..$ X2: chr [1:2] "1984-03-13" "1985-12-10"
  .. ..$ X3: chr [1:2] "Gordon" "Lowenschuss"
  .. ..$ X4: chr [1:2] "328/14" "307/523"
  .. ..$ X5: chr [1:2] "Apparatus for synthesizing a signal by producing samples of such signal at a rate less than the Nyquist sampling rate" "Digital frequency synthesizer"
 $ http://www.sumobrain.com/patents/us/Voltage-measuring-apparatus/4982151.html 
 :List of 1
  ..$ :'data.frame':    7 obs. of  5 variables:
  .. ..$ X1: chr [1:7] "3419802A" "3419803A" "4446425A" "4603293A" ...
  .. ..$ X2: chr [1:7] "1968-12-31" "1968-12-31" "1984-05-01" "1986-07-29" ...
  .. ..$ X3: chr [1:7] "Pelenc et al." "Pelenc et al." "Valdmanis et al." "Mourou et al." ...
  .. ..$ X4: chr [1:7] "324/96" "324/96" "" "" ...
  .. ..$ X5: chr [1:7] "Apparatus for current measurement by means of the faraday effect" "Apparatus for current measurement by means of the faraday effect" "Measurement of electrical signals with picosecond resolution" "Measurement of electrical signals with subpicosecond resolution" ...

Answer 1

Here is an example.

First, let's build a list from some arbitrary iris columns:

data(iris)
lis = list(iris[1:3], iris[2:4])

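Use lapply with a custom function to extract columns 1 and 2 from each data frame. If the columns are not named the same across data frames, rename them so that the next step (rbind) works: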

b = lapply(lis, function(x){
  z = x[,c(1,2)]
  colnames(z) = c("z1", "z2")
  return(z)
}
)

Now b is a list containing the columns you want.

rbind the data frames in b:

do.call(rbind, b)

Done.
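The same idea can be adapted to the nested structure shown in the question, where each entry is a list wrapping one data frame and some entries may hold no data frame. The following is only a sketch under those assumptions: `pick_cols` and `IPCs_toy` are hypothetical names, and the data is made up.

```r
# Hypothetical helper: return columns 1 and 5 of a data frame, or NULL if unusable
pick_cols <- function(df) {
  if (!is.data.frame(df) || ncol(df) < 5 || nrow(df) == 0) return(NULL)
  out <- df[, c(1, 5), drop = FALSE]
  colnames(out) <- c("IPC", "title")
  out
}

# Made-up stand-in for IPCs: each entry is a list wrapping one data frame
IPCs_toy <- list(
  p1 = list(data.frame(X1 = c("1A", "2A"), X2 = "d", X3 = "n", X4 = "c",
                       X5 = c("t1", "t2"), stringsAsFactors = FALSE)),
  p2 = list(list()),  # entry without a data frame
  p3 = list(data.frame(X1 = "3A", X2 = "d", X3 = "n", X4 = "c",
                       X5 = "t3", stringsAsFactors = FALSE))
)

# Unwrap the inner list with [[1]], extract the columns, and drop NULL results
parts <- lapply(IPCs_toy, function(x) {
  if (length(x) == 0) NULL else pick_cols(x[[1]])
})
parts <- Filter(Negate(is.null), parts)

# Stack everything into one data frame with one row per citation
citation_list <- do.call(rbind, parts)
rownames(citation_list) <- NULL
citation_list
```

Returning NULL for unusable entries and filtering them out before rbind keeps the extraction function simple and avoids the dimension error on empty entries.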

Answer 2

Here is one way to do what I understand your question to be asking.
First, some fake data.

op <- options(stringsAsFactors = FALSE)  # to make sure we have characters not factors
set.seed(9506)

nr <- c(6, 2, 7)
IPCs <- lapply(1:3, function(n){
        res <- as.data.frame(replicate(5, sample(LETTERS, nr[n], TRUE)))
        names(res) <- paste0("X", 1:5)
        res
})
names(IPCs) <- paste0("df", seq_along(IPCs))
str(IPCs)
options(op)   # put it back as it was

Now the code that extracts columns 1 and 5 of each data.frame and pastes them together to form a data frame.

result <- list(
    sapply(IPCs, `[[`, 1),
    sapply(IPCs, function(x) x[[ncol(x)]])
)
result <- as.data.frame(lapply(result, function(x) sapply(x, paste, collapse = "")))
names(result) <- c("citations_IPC", "citations_titles")
result
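The code above collapses each data frame into a single string per column. If one row per citation is preferred instead, a long-format variant might look like the sketch below. It assumes a flat, named list of data frames like the fake data above; `IPCs_demo`, `long`, and the `source` column are names introduced here for illustration.

```r
# Made-up data in the same shape as the fake IPCs: a flat, named list of data frames
set.seed(1)
IPCs_demo <- lapply(c(df1 = 3, df2 = 2), function(n) {
  res <- as.data.frame(replicate(5, sample(LETTERS, n, TRUE)),
                       stringsAsFactors = FALSE)
  names(res) <- paste0("X", 1:5)
  res
})

# Build one small data frame per entry, tagging each row with its source name
long <- do.call(rbind, lapply(names(IPCs_demo), function(nm) {
  df <- IPCs_demo[[nm]]
  data.frame(source           = nm,
             citations_IPC    = df[[1]],
             citations_titles = df[[ncol(df)]],
             stringsAsFactors = FALSE)
}))
long
```

Keeping the source name in its own column means each citation can still be traced back to the list entry it came from after the data frames are stacked.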
