Question

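Since I can't provide the .txt file, I can only describe the situation...

The text file has no missing values and is tab-delimited, or at least it appears to be. When I read it with tab as the delimiter, it seems fine. The column headers contain names with spaces (for example, Age of Parent).

When I load the data with the following line of code, everything appears to load correctly. However, I end up with a bunch of duplicated columns.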

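For example, "Age of Parent" gets relabeled as Age.of.Parent, because you can't have spaces in column names, but there is also a second column with the same values named Age.of.Parent1.

Question: What do I need to do to make sure these 'duplicate' columns are not created? The Age.of.Parent1 column is clearly not in the dataset, yet out of 20 columns I end up with 30 in total (the last 10 are repeats, with a '1' at the end).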

read.table('mydata.txt', header=TRUE,  stringsAsFactors= FALSE, sep='\t')

Answer 1

Here is an example of how to save a data frame to a tab-delimited file and read it back.

library(caroline)

Age <- c(20, 30, 50) 
Names <- c("Name1", "Name2", "Name3") 
df <- data.frame(Age, Names)
colnames(df) <- c("Age of Parents", "Names of Parents")

#writing the data frame to a tab delimited text file
write.delim(df, file = "foo.txt")

#reading the tab delimited text file 
#The argument fill is logical. If TRUE then in case the rows have unequal length, blank fields are implicitly added.
read.delim(file="foo.txt", header = TRUE, sep = "\t", fill = TRUE)

The output is as follows:
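
The same round trip can be sketched with base R only, which also makes the column renaming from the question visible. This is a minimal sketch under a few assumptions: `write.table` is swapped in for caroline's `write.delim`, the file goes to a temporary path, and the variable names (`path`, `default_read`, `raw_read`, `header_fields`) are illustrative. It does not reproduce the questioner's file, so it cannot confirm where the extra columns come from. It only shows how `check.names` affects headers and how to inspect the raw header line for repeated fields.

```r
# Build the same small data frame with spaces in the column names.
df <- data.frame(Age = c(20, 30, 50), Names = c("Name1", "Name2", "Name3"))
colnames(df) <- c("Age of Parents", "Names of Parents")

# Write a tab-delimited file to a temporary path (illustrative location).
path <- tempfile(fileext = ".txt")
write.table(df, file = path, sep = "\t", quote = FALSE, row.names = FALSE)

# Default behaviour: check.names = TRUE passes the header through
# make.names(), so spaces are replaced by dots.
default_read <- read.delim(path, header = TRUE, sep = "\t")
print(names(default_read))

# check.names = FALSE keeps the header text exactly as stored in the file.
raw_read <- read.delim(path, header = TRUE, sep = "\t", check.names = FALSE)
print(names(raw_read))

# Diagnostic: split the raw header line on tabs to see how many fields
# the file really has and whether any header name occurs more than once.
header_fields <- strsplit(readLines(path, n = 1), "\t", fixed = TRUE)[[1]]
print(length(header_fields))
print(header_fields[duplicated(header_fields)])
```

Running the diagnostic at the end on the real file (replace `path` with `'mydata.txt'`) would show whether the header line itself already contains 30 fields, which is one possible explanation for the extra columns.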
