Table scraped from a web page is read as a single character vector: how do I convert it to a data frame?

Time: 2017-03-09 23:13:06

Tags: r dataframe rvest

I used the rvest package to scrape a large table from a web page, but it is being read as a single vector:

foo<-c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")

The data frame I need to work with looks like this:

# data.frame() keeps A, B and C numeric; cbind() would coerce every column to character
bar <- data.frame(Animal = c("Dog", "Cat", "Goat"), A = c(1, 4, 7), B = c(2, 5, 8), C = c(3, 6, 9))

This is probably a simple problem, but I would appreciate the help.

4 answers:

Answer 0: (score: 5)

You can create a matrix from the vector and convert it to a data frame:

foo<-c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")
foo <- c("Animal" , foo)
m <- matrix(foo , ncol = 4  , byrow = TRUE)
df <- as.data.frame(m[-1,] , stringsAsFactors = FALSE)  
colnames(df) <- m[1,]
# I assume you want numerics for your A,B,C columns:
df[,2:4]<-apply(df[,2:4],2,as.numeric)

lapply(df,class)
$Animal
[1] "character"

$A
[1] "numeric"

$B
[1] "numeric"

$C
[1] "numeric"

Answer 1: (score: 2)

Just split foo into the required number of rows and then rbind them. I added "Animal" at the start of foo so that every row has the same number of elements when it is split.

foo
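
A minimal sketch of the split-then-rbind idea, assuming four values per row. The names n_col, rows and m are illustrative and not taken from the answer:

```r
foo <- c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")
foo <- c("Animal", foo)   # pad the header so every row has four elements

n_col <- 4                # assumed number of columns in the scraped table
# Group index 1,1,1,1,2,2,2,2,...: one group per table row
rows <- split(foo, ceiling(seq_along(foo) / n_col))
m <- do.call(rbind, rows) # stack the rows into a character matrix

# First row holds the column names; the rest is the data
df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
colnames(df) <- m[1, ]
rownames(df) <- NULL
```

All columns are still character at this point; convert A, B and C with as.numeric if numbers are needed.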

Answer 2: (score: 1)

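If you want the correct column types, you can try this. Split the vector into a list, name the list, and convert the column types before coercing to a data frame.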

l <- setNames(split(tail(foo, -3), rep(1:4, 3)), c("Animal", foo[1:3]))
as.data.frame(lapply(l, type.convert, as.is = TRUE))  ## as.is = TRUE keeps strings as character instead of factor
#    Animal A B C
# 1     Dog 1 2 3
# 2     Cat 4 5 6
# 3    Goat 7 8 9

Answer 3: (score: 0)

Here is a handy helper that works with lists:

seqList <- function(character, by = 1, res = list()) {
    ### recursively split `character` into a list of chunks of length `by`
    if (length(character) == 0) {
        res
    } else {
        seqList(character[-c(1:by)], by = by,
                res = c(res, list(character[1:by])))
    }
}

Once the character vector has been converted to a list, it is easier to manipulate. For example, you can do this:

options(stringsAsFactors=FALSE)

foo <-c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")
foo <- c("Animal",foo)

df <- data.frame(t(do.call("rbind",
    lapply(1:4,function(x) do.call("cbind",lapply(seqList(foo,4),"[[",x))))))

colnames(df) <- df[1,]

df <- df[-1,]

## > df
##   Animal A B C
## 2    Dog 1 2 3
## 3    Cat 4 5 6
## 4   Goat 7 8 9

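Note: I have not tested how efficient this function is. It may not perform well on a large character vector. A matrix is probably the better tool for this job.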
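
As a rough illustration of the matrix route, the reshaping can be wrapped in a small helper. This is only a sketch: vector_to_df and its arguments n_col and first_name are hypothetical names, and it assumes the scraped vector lacks a header for its first column and is otherwise complete.

```r
# Hypothetical helper: reshape a flat scraped vector into a data frame.
vector_to_df <- function(x, n_col, first_name = "Animal") {
  x <- c(first_name, x)                       # pad the header row to n_col cells
  stopifnot(length(x) %% n_col == 0)          # every row must be complete
  m <- matrix(x, ncol = n_col, byrow = TRUE)  # one table row per matrix row
  df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
  colnames(df) <- m[1, ]
  # Let R guess each column's type; as.is = TRUE keeps text as character
  df[] <- lapply(df, type.convert, as.is = TRUE)
  df
}

foo <- c("A","B","C","Dog","1","2","3","Cat","4","5","6","Goat","7","8","9")
df <- vector_to_df(foo, n_col = 4)
```

Because matrix() fills the whole table in one vectorised call, this avoids the recursion in seqList and should scale better to long vectors.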