We have a rating matrix:
df <- data.frame(Customer.ID = c("c1", "c1", "c1", "c2", "c2", "c3"),
                 Movie.ID = c("m1", "m3", "m5", "m1", "m5", "m7"),
                 Rating = c(1, 2, 1, 3, 3, 1))
df
  Customer.ID Movie.ID Rating
1          c1       m1      1
2          c1       m3      2
3          c1       m5      1
4          c2       m1      3
5          c2       m5      3
6          c3       m7      1
When I spread it and change the row names like this:
df1 <- df %>% spread(key = 'Movie.ID', value = 'Rating')
df1 <- data.frame(df1, row.names = 'Customer.ID')
I get:
> df1
   m1 m3 m5 m7
c1  1  2  1 NA
c2  3 NA  3 NA
c3 NA NA NA  1
I want to make df1 look like df again.
I tried:
df2 <- setDT(df1, keep.rownames = TRUE)[]
df2 <- gather(df2, Video.ID, Rating, 2:4)
But the result does not look like df.
Answer 0 (score: 0)
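Although I'm not sure why you would want to do this (see @Jack Brookes's comment), it is easy to do with tidyverse functions (rownames_to_column from tibble, gather from tidyr, filter from dplyr):
df1 %>%
  rownames_to_column('Customer.ID') %>%
  gather(m1:m7, key = 'Movie.ID', value = 'Rating') %>%
  filter(!is.na(Rating))
which gives us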
  Customer.ID Movie.ID Rating
1          c1       m1      1
2          c2       m1      3
3          c1       m3      2
4          c1       m5      1
5          c2       m5      3
6          c3       m7      1
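As a side note, in tidyr 1.0.0 and later gather and spread are superseded by pivot_longer and pivot_wider. Below is a minimal sketch of the same wide-to-long step with pivot_longer, assuming tidyr (>= 1.0.0) and tibble are installed. Here df1 is rebuilt by hand so the snippet is self-contained, and the name long is only illustrative.

```r
# Assumption: tidyr >= 1.0.0 and tibble are installed.
# Rebuild the wide matrix from the question by hand (customers as row names).
df1 <- data.frame(m1 = c(1, 3, NA),
                  m3 = c(2, NA, NA),
                  m5 = c(1, 3, NA),
                  m7 = c(NA, NA, 1),
                  row.names = c("c1", "c2", "c3"))

# Move the row names into a regular column so they survive the reshape.
wide <- tibble::rownames_to_column(df1, "Customer.ID")

# Stack the movie columns into Movie.ID / Rating pairs and drop the
# NA cells that spread() introduced for missing customer/movie pairs.
long <- tidyr::pivot_longer(wide,
                            cols = m1:m7,
                            names_to = "Movie.ID",
                            values_to = "Rating",
                            values_drop_na = TRUE)

print(long)
```

Unlike gather, pivot_longer walks the input row by row, so the result should come out grouped by customer rather than by movie, and it is returned as a tibble rather than a plain data.frame (wrap it in as.data.frame() if that matters).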