R: summarise, drop columns, rename the data frames in a list, and save the results to the environment

Asked: 2017-09-02 12:15:58

Tags: r dataframe tidyr

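This question is a mix of this one and this one. My trouble is that I cannot apply a function or piece of code to every element of a set of tibbles. I know how to get the result I want step by step, but not in a single pass.

For the purposes of this question, let's take two elements that are structurally very similar to my real case.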

MyRes_tw <- structure(list(text = c("follow @SmartRE_Info and get your token in waves t.co/g3q4XelPaK #SmartRE", 
"RT @investFeed: Make your FEED work for you - check out this blog on the power of the FEED token: t.co/JOHSCeitGc", 
"RT @investFeed: WE HAVE NOW PASSED 8,000 $ETH IN OUR TOKEN SALE PURCHASED! t.co/bx7s1xWyXI #ICO #Tokensale t.co/ZFndFhUfVT"
), Tweet.id = c("889602043249254400", "889589518159945729", "889573909405679616"
), created.date = structure(c(17371, 17371, 17371), class = "Date"), 
    created.week = c(30, 30, 31), retweet = c(0, 0, 0), custom = c(0, 
    0, 0)), .Names = c("text", "Tweet.id", "created.date", "created.week", 
"retweet", "custom"), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame"))

MyRes1_tw <- structure(list(text = c("RT @AmbrosusAMB: We are on the front page of #NASDAQ / #Editorial Choice, Proud #Ethereum #Blockchain #ICO #TGE @Nasdaq @gavofyork @jutta_s…", 
"RT @MyBit_DApp: 10 minutes left in #mybit #tokensale over 10,000 #ethereum contributed! Check it out t.co/AgyRCcyyzD", 
"RT @MyBit_DApp: only 23 ETH left now", "RT @MyBit_DApp: #MyBit #tokensale ends in ~1 hour. 9k+ $ETH raised so far. Only 125 #ethereum left at 25% discount. t.co/AgyRCcyyzD", 
"RT @MyBit_DApp: ~12 hours left in the t.co/AgyRCcyyzD #TokenSale #ICO 25% Bonus activated for #ethereum $ether #bitcoin $BTC $xbt"
), Tweet.id = c("897499492219445252", "897487635442274305", "897487621714305024", 
"897487610494558208", "897487593117450244"), created.date = structure(c(17393, 
17393, 17393, 17393, 17393), class = "Date"), created.week = c(33, 
33, 34, 34, 34), retweet = c(0, 0, 0, 0, 0), custom = c(0, 0, 
0, 0, 0)), .Names = c("text", "Tweet.id", "created.date", "created.week", 
"retweet", "custom"), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

These two data frames hold data from Twitter. I want to tidy them so that I end up with these results:

MyRes <- structure(list(created.week = c(33, 34, 35), retweet = c(12, 
0, 8), custom = c(0, 0, 2), Twitter.name = c("MyRes", "MyRes", 
"MyRes")), .Names = c("created.week", "retweet", "custom", "Twitter.name"
), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"
))

MyRes1 <- structure(list(created.week = c(33, 34, 35), retweet = c(12, 
0, 8), custom = c(0, 0, 2), Twitter.name = c("MyRes1", "MyRes1", 
"MyRes1")), .Names = c("created.week", "retweet", "custom", "Twitter.name"
), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"
))

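Note that the names matter: each result's name is the original object name with the trailing _tw removed.

Note also that in the final result, the last column, $Twitter.name, should reflect the tibble's name.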

I list them from my environment with myUser.tw <- ls(,pattern = "_tw"), since they are the only objects whose names end in _tw.
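A side note on the listing step: the pattern "_tw" matches that substring anywhere in a name, so anchoring it with $ is a safer way to pick up only the objects that end in _tw. The following is a minimal sketch, assuming a throwaway environment (demo_env and the objects in it are hypothetical, not part of the question) so that nothing in the global workspace is touched:

```r
# Hypothetical scratch environment standing in for the global environment
demo_env <- new.env()
assign("MyRes_tw", data.frame(x = 1), envir = demo_env)
assign("MyRes1_tw", data.frame(x = 2), envir = demo_env)
assign("my_tw_notes", "not a tweet table", envir = demo_env)

# Unanchored pattern: also matches "my_tw_notes"
loose <- ls(demo_env, pattern = "_tw")

# Anchored pattern: keeps only names that end in "_tw"
strict <- ls(demo_env, pattern = "_tw$")

# Strip the suffix to obtain the target names
target_names <- sub("_tw$", "", strict)
print(target_names)
```

The same sub("_tw$", "", ...) call is what both answers use to derive the final names.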

I wrote this function to help:

MySummarize <- function(x){
  summarise(group_by(x, created.week, Retweet.count = sum(retweet), Custom.count = sum(custom)))
}
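One thing worth noting about MySummarize as written: the closing parenthesis of group_by() comes after the two sum() expressions, so they become extra grouping variables computed over the whole table rather than per-week totals. If the intent was one row per week with summed counts (this is an assumption about the intent), a sketch could look like the following. MySummarize2 and the toy data are hypothetical names used only for illustration:

```r
library(dplyr)

# Hypothetical variant: group by week only, then sum within each week
MySummarize2 <- function(x) {
  x %>%
    group_by(created.week) %>%
    summarise(Retweet.count = sum(retweet),
              Custom.count  = sum(custom)) %>%
    ungroup()
}

# Toy data with the same column names as in the question
toy <- data.frame(created.week = c(30, 30, 31),
                  retweet      = c(1, 2, 5),
                  custom       = c(0, 1, 0))

res <- MySummarize2(toy)
print(res)
```

With the sample data in the question every retweet and custom value is 0, so both versions return zeros there; the difference only shows up with non-zero counts.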

Now comes the pain! Here is my working code:

testLst <- mget(myUser.tw) %>% 
  lapply(function(x) MySummarize(x)) %>% 
  list2env(testLst, envir = .GlobalEnv)

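What I then cannot find a way to do:

  1. Change the names of the data frames so that they are called MyRes, MyRes1
  2. Add a column that contains that same text (MyRes, MyRes1) in every row
  3. Save the results in my environment.

Believe it or not, I have been at this for a long time. I would appreciate help completing my code. Thanks.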

2 answers:

Answer 0 (score: 2)

It is not clear what "df" refers to, but if the goal is to get a list of summaries with a source column attached:

library(dplyr)

myUser.tw %>% 
  mget(.GlobalEnv) %>%
  lapply(MySummarize) %>%
  bind_rows(.id = "source") %>%
  mutate(source = sub("_tw$", "", source)) %>%
  split(.$source)

which gives:

$MyRes
# A tibble: 2 x 4
# Groups:   created.week, Retweet.count [2]
  source created.week Retweet.count Custom.count
   <chr>        <dbl>         <dbl>        <dbl>
1  MyRes           30             0            0
2  MyRes           31             0            0

$MyRes1
# A tibble: 2 x 4
# Groups:   created.week, Retweet.count [2]
  source created.week Retweet.count Custom.count
   <chr>        <dbl>         <dbl>        <dbl>
1 MyRes1           33             0            0
2 MyRes1           34             0            0

Or omit the split() step if you want a single data frame.
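To cover the last step of the question, saving each summary under its cleaned name, a named list like the one produced by split() can be handed to list2env(), which creates one object per list element. This is a minimal sketch with stand-in data, assuming a scratch environment (out_env, a hypothetical name) in place of .GlobalEnv:

```r
library(dplyr)

# Stand-in for the combined summary produced by bind_rows(.id = "source")
combined <- data.frame(
  source       = c("MyRes_tw", "MyRes_tw", "MyRes1_tw"),
  created.week = c(30, 31, 33),
  retweet      = c(0, 0, 0),
  stringsAsFactors = FALSE
)

# Clean the names, then split into a named list of data frames
by_source <- combined %>%
  mutate(source = sub("_tw$", "", source)) %>%
  split(.$source)

# Hypothetical scratch environment; pass .GlobalEnv to write to the workspace
out_env <- new.env()
list2env(by_source, envir = out_env)

print(ls(out_env))
```

Writing to a separate environment first makes it easy to check the object names before anything in the workspace is overwritten.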

Answer 1 (score: 1)

One possible approach:

# list of tibbles with tw
myUser.tw.list <- mget(myUser.tw) 

# perform lapply over the sequence of positions rather than the list of elements
myUser <- lapply(seq_along(myUser.tw), 
       function(i){
         myUser.tw.list[i][[1]] %>% group_by(created.week) %>%
           summarise(retweet = sum(retweet), custom = sum(custom)) %>%
           ungroup() %>%
           mutate(Twitter.name = gsub("_tw$", "", names(myUser.tw.list[i])))
         })
names(myUser) <- gsub("_tw$", "", myUser.tw)

Result: a named list of tibbles:

> myUser
$MyRes
# A tibble: 2 x 4
  created.week retweet custom Twitter.name
         <dbl>   <dbl>  <dbl>        <chr>
1           30       0      0        MyRes
2           31       0      0        MyRes

$MyRes1
# A tibble: 2 x 4
  created.week retweet custom Twitter.name
         <dbl>   <dbl>  <dbl>        <chr>
1           33       0      0       MyRes1
2           34       0      0       MyRes1
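
The lapply() above iterates over positions so that each element's name is available inside the function. As an alternative sketch, Map() can walk the data frames and their cleaned names in parallel. The stand-in data and the names tw_list, clean_names and summaries are hypothetical, used only for illustration:

```r
library(dplyr)

# Hypothetical stand-ins for the tibbles fetched with mget()
tw_list <- list(
  MyRes_tw  = data.frame(created.week = c(30, 30, 31),
                         retweet = c(0, 1, 2), custom = c(0, 0, 1)),
  MyRes1_tw = data.frame(created.week = c(33, 34),
                         retweet = c(3, 4), custom = c(0, 0))
)

clean_names <- sub("_tw$", "", names(tw_list))

# Map() passes each data frame together with its cleaned name
summaries <- Map(function(df, nm) {
  df %>%
    group_by(created.week) %>%
    summarise(retweet = sum(retweet), custom = sum(custom)) %>%
    ungroup() %>%
    mutate(Twitter.name = nm)
}, tw_list, clean_names)

# Map() keeps the names of its first list argument, so rename explicitly
names(summaries) <- clean_names
print(names(summaries))
```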