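I am currently looking at word and phrase frequencies in a large database of textual information (about 108 MB spread across 307 text files). My goal is a way to quickly see which files are most relevant, in a visually appealing format (although this project may well end up proving that a textual representation is always clearer).
Right now I have the following: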
SetDirectory["/MYMATHEMATICADIRECTORY/"];
filelist = FileNames[];
viewerCount1 = {0};
viewerCount2 = {0};
word1 = "freedom";
word2 = "liberty";
Do[
searchDB = StringSplit[Import[filename]];
AppendTo[viewerCount1, Count[searchDB, word1]];
AppendTo[viewerCount2, Count[searchDB, word2]];
, {filename, filelist}]
list3 = Take[viewerCount1, {2, -1}]
list4 = Take[viewerCount2, {2, -1}]
FileNames[] generates a list such as: {"001ABbenevolat.txt-packaged.txt", "002abnature.txt-cleared.txt", "003aboriginaldocs.txt-packaged.txt", "004ABpresse.txt-cleaning.txt", "005acadian.txt-packaged.txt", "006acadiedelile.txt-cleared.txt", "007acfa.txt-cleared.txt"} [except with 307 entries, all numbered].
list3 generates a list such as: {0,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,1,0,0,0,0,0,0,0,1,0,2,0,0,0,10,1,7,0,0,0,0,23,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,13,0,0,0,0,0,0,0,0,0,1,0,2,0,4,0,0,0,1,11,0,2,0,0,2,7,1,4,1,0,0,0,0,0,0,0,13,...} and so on.
The command:
BarChart3D[{list3, list4}, BarSpacing -> {0.5, 0}, ChartLayout -> "Grid"]
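produces something close to what I want (think of the bars as file folders sticking up). However, I would like to add meaningful tooltips. By default, the tooltip shows only the frequency. Is there a quick way to include the file name along with the frequency? That is, a tooltip that would bring up '007acfa.txt-cleared.txt - 32', where the word occurs 32 times in file 7?
Answer 0: (score: 6)
For example, suppose your data looks like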
list3 = RandomInteger[30, 30];
list4 = RandomInteger[30, 30];
filelist = Table["file " <> ToString[i], {i, 30}];
Then you can do something like
BarChart3D[{
MapThread[Tooltip[#2, Row[{#, " -- ", #2}]] &, {filelist, list3}],
MapThread[Tooltip[#2, Row[{#, " -- ", #2}]] &, {filelist, list4}]},
BarSpacing -> {0.5, 0}, ChartLayout -> "Grid"]
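The two MapThread calls above differ only in the list of counts, so they can be factored into a small helper. The following is a minimal sketch of that idea, not part of the original answer; the name withTips is hypothetical, and the random data simply stands in for the real counts and file names.

```mathematica
(* Toy data standing in for the real counts and file names *)
list3 = RandomInteger[30, 30];
list4 = RandomInteger[30, 30];
filelist = Table["file " <> ToString[i], {i, 30}];

(* withTips is a hypothetical helper name: it wraps each count in a
   Tooltip whose text is "file name -- count" *)
withTips[names_List, counts_List] :=
  MapThread[Tooltip[#2, Row[{#1, " -- ", #2}]] &, {names, counts}]

(* Apply the helper to each list of counts, then chart as before *)
BarChart3D[(withTips[filelist, #] &) /@ {list3, list4},
 BarSpacing -> {0.5, 0}, ChartLayout -> "Grid"]
```

With the helper in place, adding a third search word should only require appending another list of counts to {list3, list4}.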
Edit
Another approach is to use LabelingFunction:
BarChart3D[{list3, list4},
LabelingFunction ->
(Placed[Row[{filelist[[Last[#2]]], " -- ", #1}], Tooltip] &),
ChartLayout -> "Grid", BarSpacing -> {0.5, 0}]
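In the LabelingFunction above, #1 is the bar's value and #2 is its position in the data, which is why Last[#2] picks out the matching file name. As a sketch under the assumption that the position for nested data {list3, list4} has the form {dataset, bar}, the first element of #2 could be used in the same way to show which word the bar belongs to. The toy data and the word pair below are only there to make the snippet self-contained.

```mathematica
(* Toy data standing in for the real counts, file names, and words *)
list3 = RandomInteger[30, 30];
list4 = RandomInteger[30, 30];
filelist = Table["file " <> ToString[i], {i, 30}];
word1 = "freedom";
word2 = "liberty";

(* Assumption: #2 is {dataset, bar}, so First[#2] selects the word
   and Last[#2] selects the file name; #1 is the count itself *)
BarChart3D[{list3, list4},
 LabelingFunction ->
  (Placed[
     Row[{{word1, word2}[[First[#2]]], ": ",
       filelist[[Last[#2]]], " -- ", #1}],
     Tooltip] &),
 ChartLayout -> "Grid", BarSpacing -> {0.5, 0}]
```

If the position turns out to have a different shape in your version, Last[#2] alone (as in the original code) is the safer part to rely on.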
Answer 1: (score: 3)
This should work:
BarChart3D[{list3, list4},
ChartLabels -> Placed[filelist, Tooltip],
ChartLayout -> "Grid",
BarSpacing -> {0.5, 0}]
Edit
I forgot that you also want the height in the tooltip; for that you need LabelingFunction. Let's go a step further and include the word itself as well:
BarChart3D[{list3, list4},
ChartLabels -> {Placed[{word1, word2}, None], Placed[filelist, None]},
ChartLayout -> "Grid",
BarSpacing -> {0.5, 0},
LabelingFunction -> (Tooltip[Row[Flatten[{#3, #1}], " - "]] &)
]