Reversing spread with gather

Date: 2019-04-17 23:03:05

Tags: r dplyr

We have a rating matrix:

df <- data.frame(Customer.ID=c("c1",'c1','c1','c2','c2','c3'),
             Movie.ID=c("m1", "m3", "m5", "m1", "m5", "m7"),
             Rating=c(1,2,1,3,3,1))
df
  Customer.ID Movie.ID Rating
1          c1       m1      1
2          c1       m3      2
3          c1       m5      1
4          c2       m1      3
5          c2       m5      3
6          c3       m7      1

When I spread it and change the row names like this:

library(dplyr)
library(tidyr)
df1 <- df %>% spread(key = 'Movie.ID', value = 'Rating')
df1 <- data.frame(df1, row.names = 'Customer.ID')

I get:

> df1
   m1 m3 m5 m7
c1  1  2  1 NA
c2  3 NA  3 NA
c3 NA NA NA  1

I would like to make df1 look like df again.

I tried:

library(data.table)
df2 <-setDT(df1, keep.rownames = TRUE)[]
df2 <- gather(df2, Video.ID, Rating, 2:4)

But the result does not match df.

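1 answer:

Answer 0: (score: 0)

Although I'm not sure why you would want to do this (see @Jack Brookes' comment), it is easy to do with tidyverse functions (rownames_to_column from tibble, gather from tidyr, filter from dplyr):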

df1 %>% 
  rownames_to_column('Customer.ID') %>% 
  gather(m1:m7, key = 'Movie.ID', value = 'Rating') %>% 
  filter(!is.na(Rating))

Which gives us

  Customer.ID Movie.ID Rating
1          c1       m1      1
2          c2       m1      3
3          c1       m3      2
4          c1       m5      1
5          c2       m5      3
6          c3       m7      1
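
For what it's worth, the attempt in the question goes wrong for two reasons: after setDT(..., keep.rownames = TRUE) the first column is rn, so 2:4 only covers m1, m3 and m5 and leaves m7 behind as a column, and gather keeps the NA rows unless told otherwise. As a minimal sketch of the same round trip, assuming tidyr 1.0.0 or later is installed (where pivot_wider and pivot_longer are available as replacements for spread and gather), something like the following should work. The name df_back is just an illustrative choice, and stringsAsFactors = FALSE is added here so the ID columns stay character.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same data as in the question; IDs kept as character (assumption for this sketch)
df <- data.frame(
  Customer.ID = c("c1", "c1", "c1", "c2", "c2", "c3"),
  Movie.ID    = c("m1", "m3", "m5", "m1", "m5", "m7"),
  Rating      = c(1, 2, 1, 3, 3, 1),
  stringsAsFactors = FALSE
)

# Long -> wide, with Customer.ID moved into the row names
df1 <- df %>%
  pivot_wider(names_from = Movie.ID, values_from = Rating) %>%
  column_to_rownames("Customer.ID")

# Wide -> long: restore the ID column, gather every movie column,
# and drop the NA cells that the widening step introduced
df_back <- df1 %>%
  rownames_to_column("Customer.ID") %>%
  pivot_longer(-Customer.ID,
               names_to = "Movie.ID",
               values_to = "Rating",
               values_drop_na = TRUE) %>%
  as.data.frame()

print(df_back)
```

Selecting the columns with -Customer.ID rather than by position avoids the off-by-one problem in the question, since it picks up every movie column regardless of how many there are.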