I want to build a logistic regression model from a data frame.
#'data.frame': 6532 obs. of 12 variables:
#$ NewsDesk : chr "Business" "Culture" "Business" "Business" ...
#$ SectionName : chr "Crosswords/Games" "Arts" "Business Day" "Business Day" ...
#$ SubsectionName: chr "" "" "Dealbook" "Dealbook" ...
#$ Headline : chr "More School Daze" "New 96-Page Murakami Work Coming in December" "Public Pension Funds Stay Mum on Corporate Expats" "Boot Camp for Bankers" ...
#$ Snippet : chr "A puzzle from Ethan Cooper that reminds me that a bill is due." "The Strange Library will arrive just three and a half months after Mr. Murakamis latest novel, Colorless Tsukuru Tazaki and His"| __truncated__ "Public pension funds have major stakes in American companies moving overseas to cut their tax bills. But they are saying little"| __truncated__ "As they struggle to find new business to bolster sluggish earnings, banks consider the nations 25 million veterans and service "| __truncated__ ...
#$ Abstract : chr "A puzzle from Ethan Cooper that reminds me that a bill is due." "The Strange Library will arrive just three and a half months after Mr. Murakamis latest novel, Colorless Tsukuru Tazaki and His"| __truncated__ "Public pension funds have major stakes in American companies moving overseas to cut their tax bills. But they are saying little"| __truncated__ "As they struggle to find new business to bolster sluggish earnings, banks consider the nations 25 million veterans and service "| __truncated__ ...
#$ WordCount : int 508 285 1211 1405 181 245 258 893 1077 188 ...
#$ PubDate : POSIXlt, format: "2014-09-01 22:00:09" "2014-09-01 21:14:07" ...
#$ Popular : int 1 0 0 1 1 1 0 1 1 0 ...
NewsDesk has 11 categories.
#          Business  Culture  Foreign Magazine    Metro National     OpEd  Science   Sports
#     1846     1548      676      375       31      198        4      521      194        2
#   Styles   Travel   TStyle
#      297      116      724
However, I only need OpEd, Business, Science, Culture, and TStyle to build the model, chosen for their importance. I don't know how to extract these categories from NewsDesk. Any ideas?
Answer 0 (score: 0)
I would do it like this.
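# Simulate a small data frame with a NewsDesk column and a 0/1 Popular column
set.seed(1237)
NewsDesk <- sample(c("OpEd", "Business", "Science", "Culture", "TStyle", "Foreign",
                     "Magazine", "Metro", "Sports", "Styles", "Travel"), 100, replace = TRUE)
df <- data.frame(Popular = sample(0:1, 100, replace = TRUE), NewsDesk = NewsDesk)
# Categories to keep (named `keep` rather than `filter` to avoid masking stats::filter)
keep <- c("OpEd", "Business", "Science", "Culture", "TStyle")
# Keep only the rows whose NewsDesk value is in `keep`
head(df[df$NewsDesk %in% keep, ])
# Output from the original post; the sampled rows can differ across R versions
#   Popular NewsDesk
#1        0  Culture
#3        0     OpEd
#4        0 Business
#5        1  Science
#8        1   TStyle
#11       1 Business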
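
To carry the filtered rows through to the model itself, here is a minimal sketch. The `news` data frame is a simulated stand-in for the real data, and the formula `Popular ~ NewsDesk + WordCount` is only an assumption, since the question does not say which predictors should go into the model.

```r
# Hypothetical stand-in for the asker's data: only the columns used below.
set.seed(42)
desks <- c("", "Business", "Culture", "Foreign", "OpEd", "Science", "TStyle")
news <- data.frame(
  NewsDesk  = sample(desks, 500, replace = TRUE),
  WordCount = sample(100:2000, 500, replace = TRUE),
  Popular   = sample(0:1, 500, replace = TRUE),
  stringsAsFactors = FALSE
)

keep <- c("OpEd", "Business", "Science", "Culture", "TStyle")

# Keep only the rows whose NewsDesk is one of the wanted categories.
news_sub <- news[news$NewsDesk %in% keep, ]

# Rebuild the column as a factor with exactly the wanted levels, so that
# no unused levels are carried into the model matrix.
news_sub$NewsDesk <- factor(news_sub$NewsDesk, levels = keep)

# Logistic regression: binomial family with the default logit link.
# The choice of predictors here is illustrative, not taken from the question.
fit <- glm(Popular ~ NewsDesk + WordCount, data = news_sub, family = binomial)
summary(fit)
```

If NewsDesk is already a factor in the real data, calling `droplevels()` on the subset should have a similar effect to the `factor(..., levels = keep)` step, although the level order would then follow the original factor rather than `keep`.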