I'm trying to figure out how to pull the following data into R:
http://masseyratings.com/scores.php?s=285971&sub=14342&all=1&mode=3&sch=on&format=0
This almost works, but I'd like to eliminate the junk at the top and bottom and then get the scores.
read.fwf('http://masseyratings.com/scores.php?s=285971&sub=14342&all=1&mode=3&sch=on&format=0',
widths=c(11,26,3,26,3,4,21),
skip = 8)
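Answer 0 (score: 0)
First, welcome to Stack Exchange! I changed a few things in your code: you only need 6 widths, and you had an extra column, so I got rid of it. When I pulled the data from the site I noticed that the first row was very odd, so I dropped it altogether and then added it back by hand.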
data <- read.fwf('http://masseyratings.com/scores.php?s=285971&sub=14342&all=1&mode=3&sch=on&format=0',widths=c(10,26,3,26,3,4), sep = "\t", header = FALSE, skip = 8)
# This line subsets the data so you don't have that "junk" at the bottom and deletes the row
# with the html tagging.
data <- data[2:2424,]
data <- data.frame(data)
# Create a vector that has the column headers
names <- c("date", "Team1", "Runs1", "Team2", "Runs2", "Something")  # unique names so each score column can be selected by name
colnames(data) <- names
# Create the first row of data that we previously deleted.
firstrow = data.frame("2016-04-03", "@Pirates", 4, "Cardinals",1,"")
colnames(firstrow) <- names
finaldata <- rbind.data.frame(firstrow,data)
If you could post a screenshot of what you consider junk, it would make it easier to help you with this in the future.
Update
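The hard-coded row range 2:2424 only fits the page at its current length. Here is a minimal sketch of an alternative, assuming every score line begins with a YYYY-MM-DD date and everything else is junk. The sample lines, the helper `make_line`, and the column names are made up for illustration; the real page layout may differ.

```r
# Hypothetical helper: builds one fixed-width line laid out as 10/26/3/26/3 characters.
make_line <- function(date, team1, runs1, team2, runs2) {
  sprintf("%-10s %-25s%3d %-25s%3d", date, team1, runs1, team2, runs2)
}

# Made-up stand-in for readLines(url): junk at the top and bottom, scores in between.
raw <- c(
  "<hr><pre>some header text",
  make_line("2016-04-03", "@Pirates", 4L, "Cardinals", 1L),
  make_line("2016-04-03", "@Rays", 3L, "Blue Jays", 5L),
  "",
  "Games: 2",
  "</pre>"
)

# Keep only the lines that start with a date (assumption: all score lines do).
keep <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}", raw)

# Parse the surviving lines as fixed-width fields.
con <- textConnection(raw[keep])
scores <- read.fwf(con, widths = c(10, 26, 3, 26, 3),
                   col.names = c("date", "team1", "runs1", "team2", "runs2"),
                   strip.white = TRUE, stringsAsFactors = FALSE,
                   quote = "", comment.char = "")
close(con)

print(scores)
```

With the real page, `raw` would come from `readLines()` on the URL. Note that if the first score line carries leading HTML on the same line, it would not match the pattern and would still need separate handling.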
data <- read.fwf('http://masseyratings.com/scores.php?s=285971&sub=14342&all=1&mode=3&sch=on&format=0',
widths=c(10,26,3,26,3,4), sep = "\t", header = FALSE, skip = 9)
data <- data.frame(data)
# Read the first data row on its own (n = 1). The negative widths skip the
# leading characters that come with the html tagging on that line.
firstrow <- read.fwf('http://masseyratings.com/scores.php?s=285971&sub=14342&all=1&mode=3&sch=on&format=0',
widths=c(-8,-1,-1,9,26,3,26,3,4), sep = "\t", header = FALSE, n = 1, skip = 8)
firstrow <- data.frame(firstrow,stringsAsFactors=FALSE)
firstrow[,1] <- paste("2",firstrow[1,1],sep = "")
# Create a vector that has the column headers
names <- c("date", "Team1", "Runs1", "Team2", "Runs2", "Something")  # unique names so each score column can be selected by name
colnames(data) <- names
colnames(firstrow) <- names
finaldata <- rbind.data.frame(firstrow,data)
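The negative values in widths skip columns of characters. I just played around with them until the only thing missing from the first row was the "2". Then I pasted the "2" back on and used the rbind function to create the complete data frame. I hope this helps.
I also tested it on this page: http://masseyratings.com/scores.php?s=285971&sub=14342&all=1&mode=2&sch=on&format=0 and it works as expected.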
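To illustrate how negative widths behave, here is a small sketch on a made-up line. The 9-character prefix and the column names are assumptions for the example, not the actual content of the page. If the debris has a known fixed length, a single negative width of that length skips it and leaves the date intact.

```r
# Made-up debris in front of the first score line (9 characters).
prefix <- "<hr><pre>"

# Hypothetical first row: debris followed by fields of 10/26/3/26/3 characters.
line <- paste0(prefix,
               sprintf("%-10s %-25s%3d %-25s%3d",
                       "2016-04-03", "@Pirates", 4L, "Cardinals", 1L))

# A negative width tells read.fwf to skip that many characters.
con <- textConnection(line)
first <- read.fwf(con, widths = c(-nchar(prefix), 10, 26, 3, 26, 3),
                  col.names = c("date", "team1", "runs1", "team2", "runs2"),
                  strip.white = TRUE, stringsAsFactors = FALSE,
                  quote = "", comment.char = "")
close(con)

print(first)
```

Skipping exactly the debris avoids having to paste a missing character back onto the date afterwards, but it only works when the prefix length is the same every time the page is fetched.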