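I need to extract all the text from some SWF files. I'm using Java because I have many modules developed in this language. So I searched the web for all the free Java libraries for handling SWF files, and in the end I found the one developed by StuartMacKay. The library, called transform-swf, is available on GitHub here.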
The problem is: once I have extracted the GlyphIndexes from a TextSpan, how do I convert the glyphs into characters?
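Please provide a complete, working and tested example. No theoretical answers will be accepted, nor answers such as "it can't be done", "it's impossible", and so on.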
What I know and what I have done
I know that GlyphIndexes are built using a TextTable, which is constructed from an integer representing the font size and the font description provided by a DefineFont2 object. However, when I decode all the DefineFont2 objects, every one of them has an empty advances list.
Here is what I did:
// Create a Movie object from an SWF file (out holds the path to the file).
Movie movie = new Movie();
movie.decodeFromFile(new File(out));
List<MovieTag> list = movie.getObjects();
// Save all the decoded DefineFont2 objects, keyed by their identifier.
Map<Integer, DefineFont2> fonts = new HashMap<>();
for (MovieTag object : list) {
    if (object instanceof DefineFont2) {
        DefineFont2 df2 = (DefineFont2) object;
        fonts.put(df2.getIdentifier(), df2);
    }
}
// Now retrieve all the texts.
for (MovieTag object : list) {
    if (object instanceof DefineText2) {
        DefineText2 dt2 = (DefineText2) object;
        for (TextSpan ts : dt2.getSpans()) {
            Integer fontIdentifier = ts.getIdentifier();
            if (fontIdentifier != null) {
                int fontSize = ts.getHeight();
                // Here I try to create an object that should
                // reverse the process done by a TextTable.
                ReverseTextTable rtt =
                        new ReverseTextTable(fonts.get(fontIdentifier), fontSize);
                System.out.println(rtt.charactersForText(ts.getCharacters()));
            }
        }
    }
}
The ReverseTextTable class is as follows:
public final class ReverseTextTable {
    // SWF DefineFont2 glyphs are defined on a 1024-unit EM square.
    private static final float EMSQUARE = 1024.0f;
    private final transient Map<Character, GlyphIndex> characters;
    private final transient Map<GlyphIndex, Character> glyphs;
    public ReverseTextTable(final DefineFont2 font, final int fontSize) {
        characters = new LinkedHashMap<>();
        glyphs = new LinkedHashMap<>();
        final List<Integer> codes = font.getCodes();
        final List<Integer> advances = font.getAdvances();
        final float scale = fontSize / EMSQUARE;
        final int count = codes.size();
        for (int i = 0; i < count; i++) {
            final char c = (char) codes.get(i).intValue();
            // advances.get(i) throws when advances has fewer entries than codes.
            final GlyphIndex index = new GlyphIndex(i, (int) (advances.get(i) * scale));
            characters.put(c, index);
            glyphs.put(index, c);
        }
    }
    // This method should turn a list of GlyphIndexes back into a String.
    public String charactersForText(final List<GlyphIndex> list) {
        final StringBuilder text = new StringBuilder();
        for (GlyphIndex gi : list) {
            text.append(glyphs.get(gi));
        }
        return text.toString();
    }
}
Unfortunately, the advances list of DefineFont2 is empty, so the constructor of ReverseTextTable throws an ArrayIndexOutOfBoundsException.
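Answer 0 (score: 1)
Honestly, I don't know how to do this in Java. I'm not claiming it's impossible, and I do believe there is a way. However, you said there are many libraries that can do this, and you also suggested one of them, swftools. So I suggest going back to that library to extract the text from Flash files. To do this, you can use Runtime.exec() to run the library's command-line tool.
Personally, I prefer Apache Commons Exec over the standard JDK facilities. Let me show you how to do it. The executable you should use is swfstrings.exe. Assume it is located in C:\ and that the same folder contains a Flash file, for example page.swf. I then tried the following code (it works fine):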
// Assumes swfstrings.exe and page.swf are both in C:\.
// Requires org.apache.commons.exec.*, java.io.ByteArrayOutputStream and java.nio.file.Path/Paths.
// execute() and stdout.toString(String) throw checked IOExceptions, so the
// enclosing method must declare or handle IOException.
Path pathToSwfFile = Paths.get("C:\\", "page.swf");
CommandLine commandLine = new CommandLine(Paths.get("C:\\", "swfstrings.exe").toString());
// addArgument() quotes the argument itself if it contains spaces.
commandLine.addArgument(pathToSwfFile.toString());
DefaultExecutor executor = new DefaultExecutor();
executor.setExitValues(new int[]{0, 1}); // Notice that swfstrings.exe returns 1 for success,
                                         // 0 for file not found, -1 for error
ByteArrayOutputStream stdout = new ByteArrayOutputStream();
PumpStreamHandler psh = new PumpStreamHandler(stdout);
executor.setStreamHandler(psh);
int exitValue;
try {
    exitValue = executor.execute(commandLine);
} catch (ExecuteException ex) {
    exitValue = ex.getExitValue(); // an exit value outside {0, 1}, e.g. -1 for error
}
if (!executor.isFailure(exitValue)) {
    String out = stdout.toString("UTF-8"); // here you have the extracted text
}
I know this isn't the answer you asked for, but it works fine.
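The answer mentions Runtime.exec() as the plain JDK route. For readers who would rather avoid the Commons Exec dependency, here is a minimal sketch using java.lang.ProcessBuilder, the newer JDK API for the same job. The paths C:\swfstrings.exe and C:\page.swf are the answer's example locations; the class name CommandRunner is hypothetical, and decoding the output as UTF-8 is an assumption about what swfstrings prints.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class CommandRunner {

    /** Exit code plus combined stdout/stderr of an external command. */
    public static final class Result {
        public final int exitValue;
        public final String output;

        Result(int exitValue, String output) {
            this.exitValue = exitValue;
            this.output = output;
        }
    }

    public static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout so one reader suffices
        Process process = pb.start();
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Drain the output before waitFor(), otherwise a full pipe can block the child.
        try (InputStream in = process.getInputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        }
        int exit = process.waitFor();
        return new Result(exit, new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        // Example locations from the answer; adjust to your installation.
        Result r = run(Arrays.asList("C:\\swfstrings.exe", "C:\\page.swf"));
        System.out.println("exit value: " + r.exitValue);
        System.out.println(r.output);
    }
}
```

The exit value is returned as-is, so the caller can apply the answer's rule that swfstrings.exe reports success with 1.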
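Answer 1 (score: 0)
It seems very hard to achieve what you are trying to do. You are trying to decompile the file, but I'm sorry to say it isn't possible. What I suggest is converting it into a bitmap (if possible), or by some other means, and then trying to read the characters with OCR.
There is software that can do this, and you can also look at some forums. Once an SWF file has been compiled, getting the text back is very difficult (as far as I know, it is impossible). If you like, you can look at this decompiler, or try another language, for example the project here.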
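Answer 2 (score: 0)
I ran into a similar problem with long strings while using the transform-swf library.
Get the source code and debug it.
I believe there is a small bug in the class com.flagstone.transform.coder.SWFDecoder.
At line 540 (for version 3.0.2), change
dest += length;
to
dest += count;
That should fix it for you (it concerns string extraction). I have also notified Stuart. The problem only appears when the strings are very large.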
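The answer does not show the surrounding SWFDecoder code, so the following is not that code. It is a sketch of the classic bug the one-word fix appears to target, assuming count holds the number of bytes actually read and length the number requested: when a read may return fewer bytes than asked for, which tends to show up only with large inputs, advancing the offset by the requested length skips data. The class ReadLoop and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadLoop {

    /** Fills dest[off .. off + len) from in, tolerating short reads. */
    public static void readFully(InputStream in, byte[] dest, int off, int len) throws IOException {
        int end = off + len;
        while (off < end) {
            int count = in.read(dest, off, end - off);
            if (count < 0) {
                throw new EOFException("Stream ended before " + len + " bytes were read");
            }
            off += count; // advance by what was actually read, not by the requested length
        }
    }

    /** Wraps data in a stream that never returns more than 3 bytes per read. */
    public static InputStream chunked(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] src = "a fairly long string".getBytes(StandardCharsets.US_ASCII);
        byte[] dest = new byte[src.length];
        readFully(chunked(src), dest, 0, dest.length);
        System.out.println(new String(dest, StandardCharsets.US_ASCII)); // prints "a fairly long string"
    }
}
```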
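Answer 3 (score: 0)
I'm currently trying to decompile SWF files in Java, and I ran into this question while figuring out how to reverse engineer the original text.
After looking at the source code, I realized it is very simple. Each font has an assigned sequence of characters, which can be retrieved by calling DefineFont2.getCodes(),
and the glyphIndex is the index of the matching character in DefineFont2.getCodes()
.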
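This observation means the glyph-to-character step does not need the advances at all. Here is a minimal sketch on plain values, assuming codes is the list returned by DefineFont2.getCodes() and the integers are the GlyphIndex.getGlyphIndex() values of one TextSpan. The class GlyphDecoder and its method are hypothetical, and emitting U+FFFD for an out-of-range index is an illustrative choice.

```java
import java.util.Arrays;
import java.util.List;

public class GlyphDecoder {

    /** Maps each glyph index to the character code stored at that position in the font. */
    public static String decode(List<Integer> codes, int[] glyphIndices) {
        StringBuilder sb = new StringBuilder();
        for (int glyph : glyphIndices) {
            if (glyph >= 0 && glyph < codes.size()) {
                sb.appendCodePoint(codes.get(glyph));
            } else {
                // Index outside this font's code table: probably the wrong font.
                sb.append('\uFFFD');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical font that embeds only the glyphs for 'H', 'e', 'l', 'o'.
        List<Integer> codes = Arrays.asList((int) 'H', (int) 'e', (int) 'l', (int) 'o');
        int[] glyphs = {0, 1, 2, 2, 3};
        System.out.println(decode(codes, glyphs)); // prints "Hello"
    }
}
```

With transform-swf objects, the integers come from span.getCharacters() and GlyphIndex.getGlyphIndex(), as in the FontLearner code of this answer.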
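However, when several fonts are used in a single SWF file, it is hard to match each DefineText with the corresponding DefineFont2, because no attribute identifies the DefineFont2 used by each DefineText.
To solve this, I came up with a self-learning algorithm that tries to guess the correct DefineFont2 for each DefineText, and from that derives the original text correctly.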
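The heart of that guess is a score: how many glyphs have an advance close to the one already learned for the same character. Here is a simplified, single-span sketch of that scoring step. It differs from FontLearner in two stated ways: an out-of-range glyph index rejects the whole text instead of one span, and it does not also require at least one previously learned character. The class FontMatchScore is hypothetical; the threshold values play the roles of ADVANCE_THRESHOLD and ACCURACY_THRESHOLD.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FontMatchScore {

    /**
     * Fraction of glyphs whose advance is within threshold of the advance already
     * learned for the same character. Unlearned characters do not count as misses.
     */
    public static double accuracy(List<Integer> codes, Map<Character, Integer> learned,
                                  int[] glyphIndices, int[] advances, int threshold) {
        if (glyphIndices.length == 0) {
            return 0.0;
        }
        int misses = 0;
        for (int i = 0; i < glyphIndices.length; i++) {
            int glyph = glyphIndices[i];
            if (glyph < 0 || glyph >= codes.size()) {
                return 0.0; // the text cannot use a font that lacks this glyph
            }
            char c = (char) (int) codes.get(glyph);
            Integer known = learned.get(c);
            if (known != null && Math.abs(advances[i] - known) > threshold) {
                misses++;
            }
        }
        return (glyphIndices.length - misses) / (double) glyphIndices.length;
    }

    public static void main(String[] args) {
        // Hypothetical font with two glyphs whose advances were learned earlier.
        List<Integer> codes = Arrays.asList((int) 'a', (int) 'b');
        Map<Character, Integer> learned = new HashMap<>();
        learned.put('a', 500);
        learned.put('b', 600);
        // "aba", where the advance of 'b' (700) is far from the learned 600.
        double acc = accuracy(codes, learned, new int[] {0, 1, 0}, new int[] {505, 700, 498}, 10);
        System.out.println(acc > 0.9 ? "same font" : "different font"); // prints "different font"
    }
}
```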
To reverse engineer the original text, I created a class called FontLearner:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import com.flagstone.transform.font.DefineFont2;
import com.flagstone.transform.text.DefineText;
import com.flagstone.transform.text.GlyphIndex;
import com.flagstone.transform.text.TextSpan;
public class FontLearner {
    private final ArrayList<DefineFont2> fonts = new ArrayList<DefineFont2>();
    private final HashMap<Integer, HashMap<Character, Integer>> advancesMap =
            new HashMap<Integer, HashMap<Character, Integer>>();
    /**
     * The same characters from the same font will have similar advance values.
     * This constant defines the maximum difference between two advance values
     * for them to still be treated as the same character.
     */
    private static final int ADVANCE_THRESHOLD = 10;
    /**
     * Some characters have outlier advance values even when compared
     * with the same character.
     * This constant defines the minimum accuracy a String must reach
     * before it is associated with the given font.
     */
    private static final double ACCURACY_THRESHOLD = 0.9;
    /**
     * Adds a DefineFont2 to the learner, together with a DefineText
     * that uses the font, to teach the learner about that font.
     *
     * @param font The font to add to the learner
     * @param text The text associated with the font
     */
    public void addFont(DefineFont2 font, DefineText text) {
        fonts.add(font);
        HashMap<Character, Integer> advances = new HashMap<Character, Integer>();
        advancesMap.put(font.getIdentifier(), advances);
        List<Integer> codes = font.getCodes();
        for (TextSpan span : text.getSpans()) {
            for (GlyphIndex character : span.getCharacters()) {
                int glyphIndex = character.getGlyphIndex();
                if (glyphIndex >= codes.size()) {
                    continue; // glyph not in this font's code table
                }
                char c = (char) (int) codes.get(glyphIndex);
                advances.put(c, character.getAdvance());
            }
        }
    }
    /**
     * @param text The DefineText to retrieve the original String from
     * @return The String retrieved from the given DefineText
     */
    public String getString(DefineText text) {
        StringBuilder sb = new StringBuilder();
        List<TextSpan> spans = text.getSpans();
        DefineFont2 font = null;
        for (DefineFont2 getFont : fonts) {
            List<Integer> codes = getFont.getCodes();
            HashMap<Character, Integer> advances = advancesMap.get(getFont.getIdentifier());
            if (advances == null) {
                advances = new HashMap<Character, Integer>();
                advancesMap.put(getFont.getIdentifier(), advances);
            }
            boolean notFound = true;
            int totalMisses = 0;
            int totalCount = 0;
            for (TextSpan span : spans) {
                List<GlyphIndex> characters = span.getCharacters();
                totalCount += characters.size();
                int misses = 0;
                for (GlyphIndex character : characters) {
                    int glyphIndex = character.getGlyphIndex();
                    if (codes.size() > glyphIndex) {
                        char c = (char) (int) codes.get(glyphIndex);
                        Integer getAdvance = advances.get(c);
                        if (getAdvance != null) {
                            notFound = false;
                            if (Math.abs(character.getAdvance() - getAdvance) > ADVANCE_THRESHOLD) {
                                misses += 1;
                            }
                        }
                    } else {
                        // Glyph index outside this font: count the whole span as missed.
                        notFound = false;
                        misses = characters.size();
                        break;
                    }
                }
                totalMisses += misses;
            }
            double accuracy = (totalCount - totalMisses) * 1.0 / totalCount;
            if (accuracy > ACCURACY_THRESHOLD && !notFound) {
                font = getFont;
                // Teach this DefineText to the FontLearner if it contains new characters.
                for (TextSpan span : spans) {
                    for (GlyphIndex character : span.getCharacters()) {
                        int glyphIndex = character.getGlyphIndex();
                        if (glyphIndex < codes.size()) {
                            char c = (char) (int) codes.get(glyphIndex);
                            if (advances.get(c) == null) {
                                advances.put(c, character.getAdvance());
                            }
                        }
                    }
                }
                break;
            }
        }
        if (font != null) {
            List<Integer> codes = font.getCodes();
            for (TextSpan span : spans) {
                for (GlyphIndex character : span.getCharacters()) {
                    int glyphIndex = character.getGlyphIndex();
                    if (glyphIndex < codes.size()) {
                        sb.append((char) (int) codes.get(glyphIndex));
                    }
                }
                // Separate consecutive spans with a single space.
                sb = new StringBuilder(sb.toString().trim());
                sb.append(" ");
            }
        }
        return sb.toString().trim();
    }
}
Usage:
// response is an Apache HttpClient HttpResponse for the SWF file.
Movie movie = new Movie();
movie.decodeFromStream(response.getEntity().getContent());
FontLearner learner = new FontLearner();
DefineFont2 font = null;
List<MovieTag> objects = movie.getObjects();
for (MovieTag object : objects) {
    if (object instanceof DefineFont2) {
        // Remember the latest font; the next DefineText is used to learn it.
        font = (DefineFont2) object;
    } else if (object instanceof DefineText) {
        DefineText text = (DefineText) object;
        if (font != null) {
            learner.addFont(font, text);
            font = null;
        }
        String line = learner.getString(text); // reverse engineers the line
        System.out.println(line);
    }
}
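I'm glad to say that this approach reverse engineered the original strings with 100% accuracy using StuartMacKay's transform-swf library.
Answer 4 (score: 0)
I know this isn't what you asked for, but I recently needed to extract text from SWF files in Java, and I found the ffdec library better than transform-swf.
Leave a comment if anyone needs sample code.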