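This is an Excel sheet in which each row has only one column filled in. (Explanation: all CITY categories fall under V21, all handset categories fall under CITYJ, and so on.)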
V21
    CITYR
    CITYJ
        HandsetS
        HandsetHW
        HandsetHA
            LOWER_AGE<=20
            LOWER_AGE>20
                SMS_COUNT<=0
                    RECHARGE_MRP<=122
                    RECHARGE_MRP>122
                SMS_COUNT>0
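Because exactly one cell per row is filled, the column index of that cell is the row's depth in the hierarchy. A minimal sketch of extracting it, assuming the sheet has been exported as plain comma-separated CSV with no quoted fields (a hypothetical export format; the class and method names are illustrative):

```java
import java.util.List;

/**
 * Sketch: reduces one exported CSV row, in which exactly one cell is filled,
 * to its depth (the column index of that cell) and its text.
 * Assumes plain comma-separated values with no quoted fields.
 */
public class CsvRowLevel {

    /** Returns the column index of the first non-empty cell, or -1 for a blank row. */
    static int levelOf(String csvLine) {
        // limit -1 keeps trailing empty cells, so indices stay aligned with columns
        String[] cells = csvLine.split(",", -1);
        for (int i = 0; i < cells.length; i++) {
            if (!cells[i].trim().isEmpty()) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the trimmed text of the filled cell, or "" for a blank row. */
    static String cellText(String csvLine) {
        int level = levelOf(csvLine);
        return level < 0 ? "" : csvLine.split(",", -1)[level].trim();
    }

    public static void main(String[] args) {
        List<String> rows = List.of("V21,,", ",CITYR,", ",CITYJ,", ",,HandsetS");
        for (String row : rows) {
            System.out.println(levelOf(row) + " " + cellText(row));
        }
    }
}
```

Once every row is a (level, text) pair, the parent of a row is simply the nearest preceding row with a smaller level.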
I need to convert this into a two-column parent/child format, so the output table would be:
V21 CITYR
V21 CITYJ
CITYJ HandsetS
CITYJ HandsetHW
CITYJ HandsetHA
HandsetHA LOWER_AGE<=20
HandsetHA LOWER_AGE>20
LOWER_AGE>20 SMS_COUNT<=0
SMS_COUNT<=0 RECHARGE_MRP<=122
SMS_COUNT<=0 RECHARGE_MRP>122
LOWER_AGE>20 SMS_COUNT>0
The data is large, so I can't do this manually. How can I automate it?
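One way to automate it is a single pass with a stack of ancestors: before handling a row, pop every stacked row at the same depth or deeper; whatever remains on top is the parent. A minimal sketch, assuming each row has already been reduced to its depth (column index) and text; the class and method names are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Sketch of the parent/child conversion on (level, value) rows.
 * The stacks hold the chain of ancestors of the current row, so their size
 * never exceeds the depth of the tree.
 */
public class ParentChildPairs {

    static List<String[]> toPairs(int[] levels, String[] values) {
        List<String[]> pairs = new ArrayList<>();
        Deque<Integer> levelStack = new ArrayDeque<>();
        Deque<String> valueStack = new ArrayDeque<>();
        for (int i = 0; i < values.length; i++) {
            // Rows at the same depth or deeper are earlier siblings (or their
            // descendants), not ancestors of this row, so discard them.
            while (!levelStack.isEmpty() && levelStack.peek() >= levels[i]) {
                levelStack.pop();
                valueStack.pop();
            }
            // The remaining top is the nearest shallower row: the parent.
            // Top-level rows (empty stack) produce no pair.
            if (!valueStack.isEmpty()) {
                pairs.add(new String[] {valueStack.peek(), values[i]});
            }
            levelStack.push(levels[i]);
            valueStack.push(values[i]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        int[] levels = {0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 4};
        String[] values = {"V21", "CITYR", "CITYJ", "HandsetS", "HandsetHW", "HandsetHA",
                "LOWER_AGE<=20", "LOWER_AGE>20", "SMS_COUNT<=0",
                "RECHARGE_MRP<=122", "RECHARGE_MRP>122", "SMS_COUNT>0"};
        for (String[] pair : toPairs(levels, values)) {
            System.out.println(pair[0] + " " + pair[1]);
        }
    }
}
```

With the sample rows above this yields the eleven pairs from the expected output table, in the same order. The key detail is the `while` loop: a single `if` would mis-parent `SMS_COUNT>0`, which comes right after a row two levels deeper.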
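Answer 0 (score: 3)
There are 3 parts to this task, so I'm wondering which part you need help with.
You've said the data sheet is large, so it can't be loaded into memory as a whole. May I ask how many top-level elements you have? That is, how many V21s do you have? If there is only one, how many CITYR/CITYJ entries do you have?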
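If the file really is too large for memory, the conversion does not have to hold it: each pair can be written out as soon as its row is read, and only the current chain of ancestors is kept. A streaming sketch, assuming the sheet is exported as text with one leading tab per Excel column (a hypothetical format; the class and method names are illustrative):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Streaming sketch: reads one indented row per line and immediately writes
 * "parent<TAB>child" lines. Memory use is bounded by the tree depth, not the
 * file size. Assumes one leading tab per Excel column.
 */
public class StreamingPairs {

    static int leadingTabs(String line) {
        int n = 0;
        while (n < line.length() && line.charAt(n) == '\t') {
            n++;
        }
        return n;
    }

    /** Converts all rows from in to pairs on out; the caller owns and closes both. */
    static void convert(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        Deque<Integer> levels = new ArrayDeque<>();
        Deque<String> values = new ArrayDeque<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String value = line.trim();
            if (value.isEmpty()) {
                continue; // skip blank rows
            }
            int level = leadingTabs(line);
            // Drop earlier rows that cannot be ancestors of this one.
            while (!levels.isEmpty() && levels.peek() >= level) {
                levels.pop();
                values.pop();
            }
            if (!values.isEmpty()) {
                out.write(values.peek() + "\t" + value + "\n");
            }
            levels.push(level);
            values.push(value);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        String sample = "V21\n\tCITYR\n\tCITYJ\n\t\tHandsetS\n";
        StringWriter out = new StringWriter();
        convert(new StringReader(sample), out);
        System.out.print(out);
    }
}
```

For real files, the same `convert` method can be given a `Files.newBufferedReader(path)` and a `Files.newBufferedWriter(path)`, so the number of top-level elements no longer matters.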
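Update: following up on my earlier answer, here is some source code showing how to manipulate the data. I fed it an indented input file (4 spaces equal one column in your Excel sheet), and the code below prints the parent/child pairs neatly. Note that the `level == 1` branch is intentionally empty: if you think your JVM is holding too many objects, you can write out and clear the collected entries at that point :)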
package com.ekanathk;

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Stack;
import java.util.logging.Logger;

import org.junit.Test;

/** One input row: its text and its indentation level (Excel column). */
class Entry {
    private final String input;
    private final int level;

    public Entry(String input, int level) {
        this.input = input;
        this.level = level;
    }

    public String getInput() {
        return input;
    }

    public int getLevel() {
        return level;
    }

    @Override
    public String toString() {
        return "Entry [input=" + input + ", level=" + level + "]";
    }
}

public class Tester {
    private static final Logger logger = Logger.getLogger(Tester.class.getName());

    @Test
    public void testSomething() throws Exception {
        InputStream is = Thread.currentThread().getContextClassLoader().getResourceAsStream("samplecsv.txt");
        if (is == null) {
            throw new IllegalStateException("samplecsv.txt not found on the classpath");
        }
        List<String[]> entries = new ArrayList<>();
        // The stack holds the ancestors of the current row; ROOT is a sentinel at level -1.
        Stack<Entry> stack = new Stack<>();
        stack.push(new Entry("ROOT", -1));
        try (BufferedReader b = new BufferedReader(new InputStreamReader(is))) {
            String input;
            while ((input = b.readLine()) != null) {
                if (input.trim().isEmpty()) {
                    continue; // skip blank rows
                }
                int level = whatIsTheLevel(input);
                input = input.trim();
                logger.info("input = " + input + " at level " + level);
                Entry entry = new Entry(input, level);
                if (level == 1) {
                    // periodically write out the collected entries to another Excel sheet and clear them
                }
                // Pop siblings and their descendants until the top is shallower than this row
                // (a single pop is not enough when the indentation drops by several levels).
                while (stack.peek().getLevel() >= entry.getLevel()) {
                    stack.pop();
                }
                Entry parent = stack.peek();
                logger.info("parent = " + parent);
                if (parent.getLevel() >= 0) { // top-level rows have no parent pair
                    entries.add(new String[] {parent.getInput(), entry.getInput()});
                }
                stack.push(entry);
            }
        }
        for (String[] entry : entries) {
            System.out.println(Arrays.toString(entry));
        }
    }

    /** Counts leading spaces; every 4 spaces equal one Excel column. */
    private int whatIsTheLevel(String input) {
        int numberOfSpaces = 0;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) != ' ') {
                return numberOfSpaces / 4;
            }
            numberOfSpaces++;
        }
        return numberOfSpaces / 4;
    }
}
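`whatIsTheLevel` only counts spaces, so a file exported with tab indentation would put every row at level 0. A sketch of a more forgiving counter, assuming a tab counts as one level and every 4 spaces count as one level (the class name and the mixing rule are illustrative assumptions):

```java
/**
 * Sketch: indentation counter that accepts tabs and/or spaces.
 * One tab = one level; every SPACES_PER_LEVEL spaces = one level.
 */
public class IndentLevel {

    static final int SPACES_PER_LEVEL = 4;

    static int levelOf(String line) {
        int tabs = 0;
        int spaces = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\t') {
                tabs++;
            } else if (c == ' ') {
                spaces++;
            } else {
                break; // first non-whitespace character: indentation is over
            }
        }
        return tabs + spaces / SPACES_PER_LEVEL;
    }

    public static void main(String[] args) {
        System.out.println(levelOf("V21"));               // 0
        System.out.println(levelOf("    CITYR"));         // 1
        System.out.println(levelOf("\t\tHandsetS"));      // 2
        System.out.println(levelOf("        HandsetHA")); // 2
    }
}
```

It can replace `whatIsTheLevel` directly, since both take the raw line and return an `int`.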
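Answer 1 (score: 1)
This assumes your file is small enough to fit in your computer's memory. Even a 10 MB file should be fine.
It has two parts:
DataTransformer does all the work needed to transform the data
TreeNode is a simple custom tree data structure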
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class DataTransformer {
    public static void main(String[] args) throws IOException {
        InputStream in = DataTransformer.class.getResourceAsStream("source_data.tab");
        if (in == null) {
            throw new IOException("source_data.tab not found next to DataTransformer.class");
        }
        TreeNode root = new TreeNode("ROOT", Integer.MIN_VALUE);
        TreeNode currentNode = root;
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = br.readLine()) != null) {
                String value = line.trim();
                if (value.isEmpty()) {
                    continue; // skip blank rows
                }
                TreeNode nextNode = new TreeNode(value, getLevel(line));
                relateNextNode(currentNode, nextNode);
                currentNode = nextNode;
            }
        }
        printAll(root);
    }

    /** Counts leading tabs; each tab is one Excel column. */
    public static int getLevel(String line) {
        final char TAB = '\t';
        int numberOfTabs = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) != TAB) {
                break;
            }
            numberOfTabs++;
        }
        return numberOfTabs;
    }

    /** Walks up from currentNode until it finds a shallower node, which becomes the parent. */
    public static void relateNextNode(TreeNode currentNode, TreeNode nextNode) {
        if (currentNode.getLevel() < nextNode.getLevel()) {
            currentNode.addChild(nextNode);
        } else {
            relateNextNode(currentNode.getParent(), nextNode);
        }
    }

    /** Prints "parent<TAB>child" for every node except ROOT and its direct children. */
    public static void printAll(TreeNode node) {
        if (!node.isRoot() && !node.getParent().isRoot()) {
            System.out.println(node);
        }
        for (TreeNode childNode : node.getChildren()) {
            printAll(childNode);
        }
    }
}

class TreeNode implements Serializable {
    private static final long serialVersionUID = 1L;
    private TreeNode parent;
    private final List<TreeNode> children = new ArrayList<>();
    private final String value;
    private final int level;

    public TreeNode(String value, int level) {
        this.value = value;
        this.level = level;
    }

    public void addChild(TreeNode child) {
        child.parent = this;
        this.children.add(child);
    }

    public void addSibling(TreeNode sibling) {
        this.parent.addChild(sibling);
    }

    public TreeNode getParent() {
        return parent;
    }

    public List<TreeNode> getChildren() {
        return children;
    }

    public String getValue() {
        return value;
    }

    public int getLevel() {
        return level;
    }

    public boolean isRoot() {
        return this.parent == null;
    }

    @Override
    public String toString() {
        if (this.parent != null) {
            return this.parent.value + '\t' + this.value;
        }
        return this.value;
    }
}
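Both answers print the pairs to the console. To get the two-column table back into Excel, the pairs can be written as CSV. A sketch, assuming RFC 4180-style quoting (fields containing a comma, double quote or line break are quoted, inner quotes doubled); the class name and the `Parent,Child` header row are illustrative choices:

```java
import java.util.List;

/** Sketch: formats parent/child pairs as CSV that Excel can open. */
public class PairsToCsv {

    /** Quotes a field only when it contains a comma, a double quote or a line break. */
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    static String toCsv(List<String[]> pairs) {
        StringBuilder sb = new StringBuilder("Parent,Child\r\n"); // header row
        for (String[] pair : pairs) {
            sb.append(escape(pair[0])).append(',').append(escape(pair[1])).append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> pairs = List.of(
                new String[] {"V21", "CITYR"},
                new String[] {"HandsetHA", "LOWER_AGE<=20"});
        System.out.print(toCsv(pairs));
    }
}
```

The values in the question contain no commas or quotes, so they pass through unchanged; the quoting only matters if such characters appear elsewhere in the real data.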