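I am trying to create a word cloud from SQL queries. Underscores are standard in table and column names, and I want them to show up in the word cloud. However, my current code strips them out, even though I explicitly set the option not to remove punctuation.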
FILE1.TXT:
SQL_query_NEW
"SELECT 0 AS c1 , D1.c2 AS c2 , D1.c3 AS c3 , D1.c4 AS c4 , D1.c5 AS c5 , D1.c6 AS c6 , D1.c7 AS c7 , D1.c8 AS c8 , D1.c1 AS c9 FROM ( SELECT DISTINCT CASE WHEN T7267472.""PABC_DT"" > T7267432.""PEINSTL_DT"" THEN NULL ELSE T7267432.""XYZ_DT"" END AS c1 , T7267472.""ABC_DT"" AS c2 , T7267472.""SID"" AS c3 , T7267488.""CITY"" AS c4 , ( COALESCE( T7267563.""P_KEY"" , '' ) ) || '-' || ( COALESCE( T7267563.""PRD_LNG_DESC"" , '' ) ) AS c5 , T7267563.""P_KEY"" AS c6 , T7267589.""L6_DESC"" AS c7 , T7267589.""G_L3_DESC"" AS c8 FROM ""E_R_S"".""G_ADD_V"" T7267488 , ""E_R_S"".""S_G_AST_F_V"" T7267472 , ""E_R_S"".""G_G_E_S4_D1_V"" T7267589 , ""E_R_S"".""PD_MN_HR_D1_V"" T7267563 , ""E_R_S"".""S_G_AST_D_F_V"" T7267432 "
Code:
library(RODBC)
library(tm)
library(SnowballC)
library(wordcloud)
qryTxt <- read.table("C://File1.txt",sep="\t", header=TRUE)
vectorSQL = qryTxt$SQL_query_NEW
SQLCorpus <- Corpus(VectorSource(vectorSQL))
tdm <- TermDocumentMatrix(SQLCorpus,control = list(verbose = FALSE,
asPlain = FALSE,
stopwords = FALSE,
tolower = TRUE,
removeNumbers = FALSE,
stemWords = FALSE,
removePunctuation = FALSE,
removeSeparators = FALSE,
stem = FALSE,
stripWhitespace = FALSE))
matrix <- as.matrix(tdm)
v <- sort(rowSums(matrix),decreasing = TRUE)
d <- data.frame(word= names(v),freq=v)
wordcloud(d$word,v, scale = c(5,1),max.words = 10, random.order = FALSE,colors = brewer.pal(8, "Dark2"),rot.per = 0.35,use.r.layout = F)
As you can see, removePunctuation is set to FALSE, yet the underscores are still removed in the output:
d
word freq
t7267472 t7267472 4
t7267563 t7267563 4
desc desc 3
t7267432 t7267432 3
t7267589 t7267589 3
ast ast 2
coalesce coalesce 2
from from 2
key key 2
select select 2
Answer 0 (score: 1)
I am facing the same problem: I want to keep the underscores in my word cloud. Can we use the corpus below and plot the word cloud from it?
texts <- c("O_SALES", "M_SALES", "AMOUNT" , "TOTAL_SALES" , "COST_AMOUNT" )
corpus <- Corpus(VectorSource(texts))
wordcloud(corpus , random.order = FALSE)
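One possible way around the tokenizer for a small vector like this is to count the words in base R and hand the words and frequencies to wordcloud() directly. This is only a minimal sketch, not part of the original answer: the names tokens, freq_table and word_freq are made up for illustration, and min.freq = 1 is needed because every word here occurs only once.

```r
# Count whitespace-separated tokens in base R so underscores are never split.
texts <- c("O_SALES", "M_SALES", "AMOUNT", "TOTAL_SALES", "COST_AMOUNT")

tokens <- unlist(strsplit(tolower(texts), "[[:space:]]+"))
tokens <- tokens[nzchar(tokens)]                 # drop empty strings
freq_table <- sort(table(tokens), decreasing = TRUE)

# Hypothetical helper data frame: one row per distinct word.
word_freq <- data.frame(
  word = names(freq_table),
  freq = as.integer(freq_table),
  stringsAsFactors = FALSE
)
print(word_freq)

# Plot only if the wordcloud package is installed.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(word_freq$word, word_freq$freq,
                       min.freq = 1, random.order = FALSE)
}
```

Because the words are never passed through tm, nothing has a chance to split them on punctuation.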
Answer 1 (score: 1)
I ran into the same problem. The options in the control list do not fix it.
You have to use VCorpus() instead of Corpus().
In your example, change SQLCorpus <- Corpus(VectorSource(vectorSQL))
to:
SQLCorpus <- VCorpus(VectorSource(vectorSQL))
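After that, underscores, dashes, and every other punctuation character will be kept. You will then have to apply your own controls to get rid of the punctuation characters you do not want.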
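The clean-up step could look something like the sketch below. It assumes that dots and double quotes are the characters to drop and that underscores must survive. The helper keep_underscores and the shortened sample_sql string are invented for illustration. My understanding, not verified here, is that Corpus() in recent tm versions returns a SimpleCorpus whose terms are split on punctuation, which is why switching to VCorpus() helps.

```r
# Hypothetical helper: replace every character that is not a letter, digit,
# underscore or whitespace with a space, so dots and quotes disappear
# while underscores are kept.
keep_underscores <- function(x) {
  gsub("[^[:alnum:]_[:space:]]", " ", x)
}

# Shortened sample in the style of the query from the question.
sample_sql <- 'SELECT T7267472."ABC_DT" AS c2 FROM "E_R_S"."G_ADD_V" T7267488'
cleaned <- keep_underscores(sample_sql)
print(cleaned)

# Build the term-document matrix only if tm is installed.
if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::VCorpus(tm::VectorSource(sample_sql))
  corpus <- tm::tm_map(corpus, tm::content_transformer(keep_underscores))
  tdm <- tm::TermDocumentMatrix(
    corpus,
    control = list(tolower = TRUE,
                   removePunctuation = FALSE,
                   wordLengths = c(1, Inf))   # keep short terms such as c2
  )
  print(tm::Terms(tdm))
}
```

Doing the replacement before tokenization means removePunctuation can stay FALSE, so tm itself never touches the underscores.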