R: Grouping/looping over a data frame by one column

Time: 2017-08-02 15:00:23

Tags: r loops dataframe

I have a data frame df in R; here are its first 6 rows:

df <- data.frame (npi_one = c('n1487','n1952','n1952','n1467','n1467','n1538'),
                  npi_two = c('n1467','n1467','n1487','n1508','n1538','n1508'),
                  weight = c(1,1,2,1,1,1),
                  hee_provn1=c(rep(015171,3),rep(015443,3)))

I want to group by hee_provn1 and then loop over the groups. The code for the first iteration is:

library(igraph)
library(dplyr)
library(data.table)

df2 <- filter(df, hee_provn1 == 015171)
df3 <- df2 [,c("npi_one","npi_two")]
l = c(apply(df3,1,c))
G <- graph(l,directed = FALSE)

d <- degree(G)
c <- closeness(G,weight = df2$weight)
b <- betweenness(G, weight = df2$weight)
e <- eigen_centrality(G,weight = df2$weight)$vector

cent_df = data.frame(d,c,b,e)
colnames(cent_df) <- c('degree', 'closeness','betweenness','eigen')
setDT(cent_df, keep.rownames = TRUE)[]
setnames(cent_df,1,"npi")
cbind(hee_provn1 = 015171,cent_df) 

The result table for the first iteration (hee_provn1 == 015171) is:

   hee_provn1   npi degree closeness betweenness     eigen
1:      15171 n1487      2 0.3333333         0.0 1.0000000
2:      15171 n1467      2 0.5000000         0.5 0.7320508
3:      15171 n1952      2 0.3333333         0.0 1.0000000

The result table for the second iteration (hee_provn1 == 015443) is:

   hee_provn1   npi degree closeness betweenness eigen
1:      15443 n1467      2       0.5           0     1
2:      15443 n1508      2       0.5           0     1
3:      15443 n1538      2       0.5           0     1

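I'm new to R and don't know how to group by one column of a data frame and loop over the groups.

In addition, I'd like the final result to be one big table that combines all the per-group tables, for example: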

   hee_provn1   npi degree closeness betweenness     eigen
1:      15171 n1487      2 0.3333333         0.0 1.0000000
2:      15171 n1467      2 0.5000000         0.5 0.7320508
3:      15171 n1952      2 0.3333333         0.0 1.0000000
4:      15443 n1467      2       0.5           0     1
5:      15443 n1508      2       0.5           0     1
6:      15443 n1538      2       0.5           0     1

For some reason I can't use the tidyverse package. Thanks.

I tried Balter's approach:

df <- data.frame (npi_one = c('n1487','n1952','n1952','n1467','n1467','n1538'),
                  npi_two = c('n1467','n1467','n1487','n1508','n1538','n1508'),
                  weight = c(1,1,2,1,1,1),
                  hee_provn1=c(rep(015171,3),rep(015443,3)))

library(igraph)
library(dplyr)
library(data.table)

final.df <- c()
for(x in unique(df$hee_provn1)){
  df2 <- subset(df, subset = hee_provn1 == x)

  df3 <- df2 [,c("npi_one","npi_two")]
  l = c(apply(df3,1,c))
  G <- graph(l,directed = FALSE)

  d <- degree(G)
  c <- closeness(G,weight = df2$weight)
  b <- betweenness(G, weight = df2$weight)
  e <- eigen_centrality(G,weight = df2$weight)$vector

  result <- data.frame(d,c,b,e)
  setDT(result, keep.rownames = TRUE)[]
  setnames(result,1,"npi")
  cbind(hee_provn1 = x,result)
  final.df <- rbind(final.df, result)
}
colnames(final.df) <- c('npi','degree', 'closeness','betweenness','eigen')

The result is:

     npi degree closeness betweenness     eigen
1: n1487      2 0.3333333         0.0 1.0000000
2: n1467      2 0.5000000         0.5 0.7320508
3: n1952      2 0.3333333         0.0 1.0000000
4: n1467      2 0.5000000         0.0 1.0000000
5: n1508      2 0.5000000         0.0 1.0000000
6: n1538      2 0.5000000         0.0 1.0000000

This differs from my desired result: the hee_provn1 column is missing. How can I keep track of which iteration produced each row?
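The hee_provn1 column goes missing because `cbind(hee_provn1 = x,result)` builds a new data frame and then discards it: its return value is never assigned, so `rbind()` only ever receives `result` without the group column. Below is a minimal base-R sketch of the corrected loop. To keep it runnable without igraph, the centrality calls are replaced by a hypothetical helper `node_degree()` (degree counted from the edge list, assuming no self-loops); in the real code the igraph calls would go in its place.

```r
df <- data.frame(npi_one = c('n1487','n1952','n1952','n1467','n1467','n1538'),
                 npi_two = c('n1467','n1467','n1487','n1508','n1538','n1508'),
                 weight = c(1, 1, 2, 1, 1, 1),
                 hee_provn1 = c(rep(15171, 3), rep(15443, 3)),
                 stringsAsFactors = FALSE)

# Hypothetical stand-in for the igraph calls: in an undirected graph without
# self-loops, a node's degree is how often it appears at either end of an edge
node_degree <- function(edges) {
  tab <- table(c(edges$npi_one, edges$npi_two))
  data.frame(npi = names(tab), degree = as.integer(tab), stringsAsFactors = FALSE)
}

pieces <- list()
for (x in unique(df$hee_provn1)) {
  df2 <- subset(df, hee_provn1 == x)
  result <- node_degree(df2)
  # The actual fix: keep cbind()'s return value instead of discarding it
  result <- cbind(hee_provn1 = x, result)
  pieces[[length(pieces) + 1]] <- result
}

# Bind all per-group pieces once, after the loop
final.df <- do.call(rbind, pieces)
print(final.df)
```

Collecting the pieces in a list and binding once avoids re-copying `final.df` on every iteration; `data.table::rbindlist(pieces)` is a common alternative when data.table is loaded anyway. Note that once the group column is kept, a later `colnames()` call would need six names instead of five.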

2 Answers

Answer 0 (score: 2)

Start over in a fresh R session without loading dplyr. Then...

library(data.table)
library(igraph)
setDT(df)

# clean bad formatting
df[, `:=`(npi_one = as.character(npi_one), npi_two = as.character(npi_two))]

df[, {
  G = graph_from_edgelist(cbind(npi_one, npi_two), directed = FALSE)
  .(
    v = V(G)$name,
    d = degree(G),
    c = closeness(G, weights = weight),
    b = betweenness(G, weights = weight),
    e = eigen_centrality(G, weights = weight)$vector
  )
}, by=hee_provn1]

which gives...

   hee_provn1     v d         c   b         e
1:      15171 n1487 2 0.3333333 0.0 1.0000000
2:      15171 n1467 2 0.5000000 0.5 0.7320508
3:      15171 n1952 2 0.3333333 0.0 1.0000000
4:      15443 n1467 2 0.5000000 0.0 1.0000000
5:      15443 n1508 2 0.5000000 0.0 1.0000000
6:      15443 n1538 2 0.5000000 0.0 1.0000000
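The `# clean bad formatting` step matters because, before R 4.0.0, `data.frame()` defaulted to `stringsAsFactors = TRUE`, so `npi_one` and `npi_two` in the question's data become factors. `cbind()` on two factors keeps only their integer codes, and `graph_from_edgelist()` would read such a numeric matrix as vertex ids rather than node names. A small base-R illustration using a few of the question's ids:

```r
# Two factor columns, as data.frame() produced them before R 4.0.0
npi_one <- factor(c("n1487", "n1952", "n1952"))
npi_two <- factor(c("n1467", "n1467", "n1487"))

# cbind() drops the factor labels and keeps the integer level codes
codes <- cbind(npi_one, npi_two)
print(codes)

# Converting to character first keeps the node labels
label_mat <- cbind(as.character(npi_one), as.character(npi_two))
print(label_mat)
```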

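How it works

The data.table syntax is DT[i, j, by=]: it subsets by i (not needed here), groups by by=, and then evaluates j. j should evaluate to a list, and list() can be abbreviated as .().

Why not load dplyr? It isn't needed, and igraph already has plenty of namespace conflicts.

If you really want to use dplyr, I strongly recommend not using data.table at the same time...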
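The `DT[i, j, by=]` pattern can be seen on a smaller, hypothetical edge table (the table and column names below are made up for illustration). `j` is a braced block whose last value is a `.()` list; because each group may return a different number of rows, data.table stacks the per-group results and repeats the `by` column, just as the igraph version above does.

```r
library(data.table)

# Hypothetical edge table: two groups, character node ids
dt <- data.table(grp  = c(1, 1, 2),
                 from = c("a", "b", "c"),
                 to   = c("b", "c", "d"))

# i is left empty (all rows); j runs once per value of grp
res <- dt[, {
  nodes <- unique(c(from, to))   # every node appearing in this group's edges
  .(node = nodes, n_edges = .N)  # .() is list(); the length-1 .N is recycled
}, by = grp]

print(res)
```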

library(dplyr)
library(magrittr)
library(igraph)

# fix bad formatting
df %<>% mutate(npi_one = as.character(npi_one), npi_two = as.character(npi_two))

df %>% group_by(hee_provn1) %>% do(with(., {
  G = graph_from_edgelist(cbind(npi_one, npi_two), directed = FALSE)
  data.frame(
    v = V(G)$name,
    d = degree(G),
    c = closeness(G, weights = weight),
    b = betweenness(G, weights = weight),
    e = eigen_centrality(G, weights = weight)$vector
  )
}))

# A tibble: 6 x 6
# Groups:   hee_provn1 [2]
  hee_provn1     v     d         c     b         e
       <dbl> <chr> <dbl>     <dbl> <dbl>     <dbl>
1      15171 n1487     2 0.3333333   0.0 1.0000000
2      15171 n1467     2 0.5000000   0.5 0.7320508
3      15171 n1952     2 0.3333333   0.0 1.0000000
4      15443 n1467     2 0.5000000   0.0 1.0000000
5      15443 n1508     2 0.5000000   0.0 1.0000000
6      15443 n1538     2 0.5000000   0.0 1.0000000

Answer 1 (score: 1)

The simplest way I can think of (without rewriting all of your code):


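So you need to subset the table for each unique value in hee_provn1, perform your operations, and then append each resulting data frame to the final result.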
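A minimal base-R sketch of that subset-compute-append idea, using `split()` to produce one data frame per hee_provn1 value. The hypothetical `node_degree()` helper stands in for the igraph calls (degree counted from the edge list, assuming no self-loops) so the example runs on its own; it illustrates the described pattern and would be replaced by the real per-group code.

```r
df <- data.frame(npi_one = c('n1487','n1952','n1952','n1467','n1467','n1538'),
                 npi_two = c('n1467','n1467','n1487','n1508','n1538','n1508'),
                 weight = c(1, 1, 2, 1, 1, 1),
                 hee_provn1 = c(rep(15171, 3), rep(15443, 3)),
                 stringsAsFactors = FALSE)

# Hypothetical stand-in for the igraph centrality calls
node_degree <- function(edges) {
  tab <- table(c(edges$npi_one, edges$npi_two))
  data.frame(npi = names(tab), degree = as.integer(tab), stringsAsFactors = FALSE)
}

# 1. Subset: one data frame per unique hee_provn1 value, named by that value
groups <- split(df, df$hee_provn1)

# 2. Compute: run the per-group code and tag each result with its group
pieces <- lapply(names(groups), function(key) {
  cbind(hee_provn1 = as.numeric(key), node_degree(groups[[key]]))
})

# 3. Append: stack the per-group results into one table
final.df <- do.call(rbind, pieces)
print(final.df)
```

`split()` names each piece after its group value, which is why the group id can be recovered from `names(groups)` and attached to each result before binding.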