Let's see what we have. The first file [interface class]:
list arrayList
list linkedList
The second file [class countOfInstanse]:
arrayList 120
linkedList 4
I want to join these two files on the key [class] and compute the total per interface:
list 124
And the code:
import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.FlowDef;
import cascading.flow.hadoop2.Hadoop2MR1FlowConnector;
import cascading.operation.aggregator.Count;
import cascading.operation.regex.RegexFilter;
import cascading.operation.regex.RegexSplitGenerator;
import cascading.pipe.CoGroup;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.pipe.assembly.Retain;
import cascading.pipe.joiner.InnerJoin;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;

public class Main
{
    public static void main( String[] args )
    {
        String docPath = args[ 0 ];
        String wcPath = args[ 1 ];
        String stopPath = args[ 2 ];

        Properties properties = new Properties();
        AppProps.setApplicationJarClass( properties, Main.class );
        AppProps.setApplicationName( properties, "Part 1" );
        AppProps.addApplicationTag( properties, "lets:do:it" );
        AppProps.addApplicationTag( properties, "technology:Cascading" );
        FlowConnector flowConnector = new Hadoop2MR1FlowConnector( properties );

        // create source and sink taps
        Tap docTap = new Hfs( new TextDelimited( true, "\t" ), docPath );
        Tap wcTap = new Hfs( new TextDelimited( true, "\t" ), wcPath );
        Fields stop = new Fields( "class" );
        Tap classTap = new Hfs( new TextDelimited( true, "\t" ), stopPath );

        // specify a regex operation to split the "document" text lines into a token stream
        Fields token = new Fields( "token" );
        Fields text = new Fields( "interface" );
        RegexSplitGenerator splitter = new RegexSplitGenerator( token, "[ \\[\\]\\(\\),.]" );
        Fields fieldSelector = new Fields( "interface", "class" );
        Pipe docPipe = new Each( "token", text, splitter, fieldSelector );

        // define "ScrubFunction" to clean up the token stream
        Fields scrubArguments = new Fields( "interface", "class" );
        docPipe = new Each( docPipe, scrubArguments, new ScrubFunction( scrubArguments ), Fields.RESULTS );

        Fields text1 = new Fields( "amount" );
        // RegexSplitGenerator splitter = new RegexSplitGenerator( token, "[ \\[\\]\\(\\),.]" );
        Fields fieldSelector1 = new Fields( "class", "amount" );
        Pipe stopPipe = new Each( "token1", text1, splitter, fieldSelector1 );

        Pipe tokenPipe = new CoGroup( docPipe, token, stopPipe, text, new InnerJoin() );
        tokenPipe = new Each( tokenPipe, text, new RegexFilter( "^$" ) );

        // determine the word counts
        Pipe wcPipe = new Pipe( "wc", tokenPipe );
        wcPipe = new Retain( wcPipe, token );
        wcPipe = new GroupBy( wcPipe, token );
        wcPipe = new Every( wcPipe, Fields.ALL, new Count(), Fields.ALL );

        // connect the taps, pipes, etc., into a flow
        FlowDef flowDef = FlowDef.flowDef()
            .setName( "wc" )
            .addSource( docPipe, docTap )
            .addSource( stopPipe, classTap )
            .addTailSink( wcPipe, wcTap );

        // write a DOT file and run the flow
        Flow wcFlow = flowConnector.connect( flowDef );
        wcFlow.writeDOT( "dot/wc.dot" );
        wcFlow.complete();
    }
}
[I decided to solve this step by step and leave the final result here for others. So, step one: couldn't join two files on one key via Cascading (not done yet).]
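As a side note, the stage the attempt above gets wrong is the `CoGroup`: both pipes should be grouped on their shared `class` field, not on `token`/`interface`. A rough, untested sketch of how that join might look (assuming Cascading's `CoGroup` with declared fields and the `cascading.pipe.assembly.SumBy` subassembly; the `class2` and `total` field names are mine, introduced only because the two sides cannot both declare a field named `class`):

    // untested sketch: join docPipe ("interface", "class") with
    // stopPipe ("class", "amount") on the shared "class" key;
    // declared fields rename the right-hand key to avoid a name clash
    Pipe joined = new CoGroup( docPipe, new Fields( "class" ),
                               stopPipe, new Fields( "class" ),
                               new Fields( "interface", "class", "class2", "amount" ),
                               new InnerJoin() );
    // then sum "amount" per "interface"
    Pipe totals = new SumBy( joined, new Fields( "interface" ),
                             new Fields( "amount" ), new Fields( "total" ), long.class );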
Answer 0 (score: 0)
I would convert the two files into two Map objects, iterate over the keys, and sum the numbers. Then you can write the result back to a file.
Map<String,String> nameToType = new HashMap<String,String>();
Map<String,Integer> nameToCount = new HashMap<String,Integer>();
// fill the Maps from the files here
Map<String,Integer> result = new HashMap<String,Integer>();
for (String name : nameToType.keySet())
{
    String type = nameToType.get(name);
    int count = nameToCount.get(name); // look up the count by class name, not by type
    if (!result.containsKey(type))
        result.put(type, 0);
    result.put(type, result.get(type) + count);
}
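Putting that idea together, here is a minimal self-contained version (the class and method names `JoinCounts`/`joinAndSum` are mine, and the maps are filled inline with the question's sample data rather than read from files):

```java
import java.util.HashMap;
import java.util.Map;

public class JoinCounts
{
    // joins nameToType (class -> interface) with nameToCount (class -> count)
    // and sums the counts per interface
    public static Map<String,Integer> joinAndSum( Map<String,String> nameToType,
                                                  Map<String,Integer> nameToCount )
    {
        Map<String,Integer> result = new HashMap<String,Integer>();
        for( String name : nameToType.keySet() )
        {
            String type = nameToType.get( name );
            int count = nameToCount.getOrDefault( name, 0 );
            result.merge( type, count, Integer::sum );
        }
        return result;
    }

    public static void main( String[] args )
    {
        // sample data from the two files in the question
        Map<String,String> nameToType = new HashMap<String,String>();
        nameToType.put( "arrayList", "list" );
        nameToType.put( "linkedList", "list" );

        Map<String,Integer> nameToCount = new HashMap<String,Integer>();
        nameToCount.put( "arrayList", 120 );
        nameToCount.put( "linkedList", 4 );

        System.out.println( joinAndSum( nameToType, nameToCount ) ); // prints {list=124}
    }
}
```

`Map.merge` handles the "initialize to 0, then add" pattern in one call, so the `containsKey` check from the answer is no longer needed.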