Question

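I have a large, tab-delimited text table whose first row is a header. I also have a second text file containing a subset of the headers from the first file. I want to extract every column of the first file whose header appears in the list given in the second file. Here is an example of the input and the desired output: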

DATA.TXT

 head0 head1 head2 head3 head4  
 1 25 1364 22 13  
 2 10 215 1 22

LIST.TXT

head0  
head4

Expected output

head0 head4  
1 13  
2 22

Answer 1

We can use a base R approach:

df1[df2[[1]]]

Data

# the table is tab-delimited, so specify `sep` as well
df1 <- read.table('Data.txt', header = TRUE, sep = '\t', stringsAsFactors = FALSE)
df2 <- read.table('List.txt', header = FALSE, stringsAsFactors = FALSE)
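
For anyone who wants to try the subsetting without creating the files first, here is a minimal, self-contained sketch. It assumes the sample data from the question, pasted inline as tab-separated strings; the names `data_txt`, `list_txt`, `wanted` and `result` are made up for illustration, and the `intersect()` guard is an addition that is not part of the answer above.

```r
# Inline copies of the sample files from the question (tab-separated).
data_txt <- "head0\thead1\thead2\thead3\thead4\n1\t25\t1364\t22\t13\n2\t10\t215\t1\t22"
list_txt <- "head0\nhead4"

df1 <- read.table(text = data_txt, header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE)
df2 <- read.table(text = list_txt, header = FALSE,
                  stringsAsFactors = FALSE)

# df2[[1]] is a character vector holding the wanted column names.
# Assumption: silently skip any listed name that is missing from df1,
# instead of failing with an "undefined columns selected" error.
wanted <- intersect(df2[[1]], names(df1))

# Indexing a data frame with a character vector keeps those columns.
result <- df1[wanted]
print(result)
```

Here `result` should contain only the `head0` and `head4` columns. To save it as a tab-delimited file again, something like `write.table(result, 'output.txt', sep = '\t', quote = FALSE, row.names = FALSE)` should work, where `output.txt` is a placeholder name.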

Answer 2

I think R can do this directly (assuming you can read the data in easily). It would be something like:

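# `...` stands for any further read.table arguments you need (e.g. sep)
mydata <- read.table('data_filename.txt', header = TRUE, ...)

# This one looks like header = FALSE in your example...not quite sure how your data is structured
mycolumns <- read.table('columns_filename.txt', header = FALSE, stringsAsFactors = FALSE, ...)

# With header = FALSE, read.table names the single column V1;
# all_of() tells select() to treat the vector as column names
final_data <- dplyr::select(mydata, dplyr::all_of(mycolumns$V1))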

The code is incomplete, but the details should be easy to work out. It can also be done in base R by subsetting (see the other answer).
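
A filled-in version of the dplyr idea might look like the sketch below. It assumes dplyr 1.0.0 or later is installed (for `all_of()`), reads the question's sample data from inline strings rather than from files, and relies on `read.table()` naming the single unnamed column `V1` when `header = FALSE`.

```r
library(dplyr)

# Sample table and header list from the question, as tab-separated text.
mydata <- read.table(
  text = "head0\thead1\thead2\thead3\thead4\n1\t25\t1364\t22\t13\n2\t10\t215\t1\t22",
  header = TRUE, sep = "\t", stringsAsFactors = FALSE
)
mycolumns <- read.table(text = "head0\nhead4", header = FALSE,
                        stringsAsFactors = FALSE)

# all_of() selects exactly the named columns and raises an error
# if any of them is missing from mydata.
final_data <- select(mydata, all_of(mycolumns$V1))
print(final_data)
```

If some listed headers may be absent from the table, `any_of()` can be swapped in for `all_of()` to ignore the missing ones instead of raising an error.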

Extract columns from a file based on headers selected from a second file, in MATLAB or R
