Question

I am trying to use read.table in R. My data (a txt file) looks like this:

a b c d e
Australia 1 2 4 3 2
United States 1 2 4 2 2

The problems with reading this table are:

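1) The first row has only 5 elements (a to e) instead of the 6 elements in every row below it. The missing first column should be named "Country". Then a corresponds to the first number, 1, b corresponds to 2, ..., and e corresponds to 2 (in the case of Australia). How can I add a name for the first column so that R does not throw an error saying "line 1 did not have 6 elements"?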

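2) In the case of the United States, "United States" is two words rather than one, so when R reads the data it puts "States" into the second column instead of reading "United States" as a single name.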
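Both problems are visible with base R's `count.fields()`, which counts the whitespace-separated fields on each line using the same default splitting rules as `read.table()`. A minimal sketch, assuming the three sample lines are typed inline rather than read from the txt file:

```r
# Sample lines from the question, typed inline instead of read from the file
lines <- c("a b c d e",
           "Australia 1 2 4 3 2",
           "United States 1 2 4 2 2")

# Number of whitespace-separated fields on each line
n_fields <- count.fields(textConnection(lines))
print(n_fields)
```

The header line has 5 fields and the Australia row has 6, while "United States" is split into two fields, so its row has 7.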

(A friend suggested that I use rownames. Does anyone know how to use rownames?)

How can I fix these problems and read my data correctly?

Thank you very much!!

Answer 1

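Here is another possibility. This wraps the first pair of adjacent words (each at least two letters long) in quotes, so that a two-word name is read as a single field.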

x <- readLines("your.txt")
x[1] <- paste("Country", x[1])
read.table(text=sub("([A-Za-z]{2,}\\s[A-Za-z]{2,})", "'\\1'", x), header=TRUE)
#         Country a b c d e
# 1     Australia 1 2 4 3 2
# 2 United States 1 2 4 2 2
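To see what `read.table()` actually receives, here is the same approach with the sample lines typed inline in place of `readLines("your.txt")`, printing the quoted lines before parsing. `stringsAsFactors = FALSE` is added so that `Country` comes back as character on older R versions too:

```r
# Inline copy of the question's data; in practice this is readLines("your.txt")
x <- c("a b c d e",
       "Australia 1 2 4 3 2",
       "United States 1 2 4 2 2")

# Name the first column so the header has as many fields as the data rows
x[1] <- paste("Country", x[1])

# Quote the first pair of adjacent words (each >= 2 letters); only the
# "United States" line matches, the header and "Australia" are left alone
quoted <- sub("([A-Za-z]{2,}\\s[A-Za-z]{2,})", "'\\1'", x)
print(quoted)

df <- read.table(text = quoted, header = TRUE, stringsAsFactors = FALSE)
print(df)
```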

Regarding @akrun's comment about countries whose names have more than two words, I think this will work:

x[4] <- 'Papua New Guinea 3 4 3 2 5'
xx <- sub("([A-Za-z]{2,}(\\s[A-Za-z]{2,})+)", "'\\1'", x)
read.table(text = xx, header = TRUE)
#            Country a b c d e
# 1        Australia 1 2 4 3 2
# 2    United States 1 2 4 2 2
# 3 Papua New Guinea 3 4 3 2 5
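A different regex (an alternative, not the one used in this answer) avoids the two-letters-per-word requirement: treat everything before the first whitespace-then-digit as the name and wrap it in double quotes. This sketch assumes no country name contains a digit or a double quote; the `Cote d'Ivoire` row is made-up data, included because double quotes keep a name with an apostrophe inside one quoted field:

```r
x <- c("a b c d e",
       "Australia 1 2 4 3 2",
       "United States 1 2 4 2 2",
       "Papua New Guinea 3 4 3 2 5",
       "Cote d'Ivoire 5 4 3 2 1")   # hypothetical row for illustration

x[1] <- paste("Country", x[1])

# Lazily capture everything up to the first " <digit>" and wrap it in double
# quotes (perl = TRUE for the lazy +?); the header has no digits, so it is unchanged
xx <- sub("^([^0-9]+?)\\s+([0-9])", "\"\\1\" \\2", x, perl = TRUE)

df <- read.table(text = xx, header = TRUE, stringsAsFactors = FALSE)
print(df)
```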

It also occurred to me that the country names might be meant as the row names of the data frame. If that is the case, you can do

x <- readLines("your.txt")
read.table(text = sub("([A-Za-z]{2,}\\s[A-Za-z]{2,})", "'\\1'", x))
#               a b c d e
# Australia     1 2 4 3 2
# United States 1 2 4 2 2
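This works because, when `header` is not supplied, `read.table()` sets it to `TRUE` if the first line has one field fewer than the data lines, and then uses the first column as row names. A self-contained check with the sample lines typed inline, which also shows how the resulting row names can be used for indexing:

```r
x <- c("a b c d e",
       "Australia 1 2 4 3 2",
       "United States 1 2 4 2 2")

# No header argument: 5 header fields vs 6 data fields, so read.table treats
# line 1 as the header and the (quoted) country names as row names
df <- read.table(text = sub("([A-Za-z]{2,}\\s[A-Za-z]{2,})", "'\\1'", x))
print(df)

# Row names can be used to index rows directly
rownames(df)
df["United States", "c"]
```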

Answer 2

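Assuming the example data mimics what is in the file, we can read it with readLines and then use a regex to separate the country names from everything else. The extracted country names can then be added as a new column.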

lines <- readLines('Betty2.txt')
lines
#[1] "a b c d e"               "Australia 1 2 4 3 2"    
#[3] "United States 1 2 4 2 2"

dat <-  read.table(text=c(lines[1], gsub('[A-Za-z]+\\s+', '',
                lines[-1])), header=TRUE)

In the code above, we replace runs of letters followed by whitespace (i.e. the country names) with ''.

That is:

 gsub('[A-Za-z]+\\s+', '',  lines[-1])
 #[1] "1 2 4 3 2" "1 2 4 2 2"
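The pattern `[A-Za-z]+\\s+` only removes runs of letters that are directly followed by whitespace, so a name containing other characters can leave residue behind. A sketch of an alternative (not part of this answer) that drops everything before the first digit instead; it assumes country names never contain digits, and the `Guinea-Bissau` row is made-up data for illustration:

```r
lines <- c("a b c d e",
           "Australia 1 2 4 3 2",
           "United States 1 2 4 2 2",
           "Guinea-Bissau 2 3 1 5 4")   # hypothetical row for illustration

# Letters-plus-space pattern: "Guinea-" survives because "-" is not whitespace
gsub('[A-Za-z]+\\s+', '', lines[-1])

# Drop everything up to the first digit, keeping only the numbers
nums <- sub("^[^0-9]*", "", lines[-1])
print(nums)
```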

 dat1 <- data.frame(Country= gsub(" \\d+.*", '', lines[-1]),
                               dat, stringsAsFactors=FALSE)

Similarly, here we replace a space followed by one or more digits (\\d+) and then zero or more characters (.*) with ''.

 gsub(" \\d+.*", '', lines[-1])
 #[1] "Australia"     "United States"


dat1
#        Country a b c d e
#1     Australia 1 2 4 3 2
#2 United States 1 2 4 2 2
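If you would rather have the country names as row names, as the question's friend suggested, the same two regexes can feed `rownames()` instead of a new column. A minimal sketch with the sample lines inlined in place of `readLines('Betty2.txt')`:

```r
lines <- c("a b c d e",
           "Australia 1 2 4 3 2",
           "United States 1 2 4 2 2")

# Numeric part only, parsed with the header line
dat <- read.table(text = c(lines[1], gsub('[A-Za-z]+\\s+', '', lines[-1])),
                  header = TRUE)

# Assign the extracted names as row names rather than adding a Country column
rownames(dat) <- gsub(" \\d+.*", "", lines[-1])
print(dat)
```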
