Append data to another column in a CSV when a duplicate is found in the first column

Date: 2017-03-09 13:59:14

Tags: bash csv

I have a CSV containing the following data:

somename1,value1
somename1,value2
somename1,value3
anothername1,anothervalue1
anothername1,anothervalue2
anothername1,anothervalue3

I want to rewrite the CSV so that when a duplicate is found in column 1, its data is appended as a new column to the first entry.

For example, the desired output is:

somename1,value1,value2,value3
anothername1,anothervalue1,anothervalue2,anothervalue3

How can I do this in a shell script?

TIA

2 answers:

Answer 0: (score: 1)

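With Awk, removing the duplicate lines is not enough; you need logic like the following, which builds an array with one element for each unique entry in $1.

The solution creates a hash map indexed by the unique values of $1, where each element holds the matching $2 values joined with commas. It compares each $1 with the one on the previous line, so it assumes rows with the same $1 are adjacent, as in your data. The for (i in unique) loop visits keys in an unspecified order, so the output lines may not follow the input order.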

awk 'BEGIN{FS=OFS=","; prev="";}{ if (prev != $1) {unique[$1]=$2;} else {unique[$1]=(unique[$1]","$2)} prev=$1; }END{for (i in unique) print i,unique[i]}' file
anothername1,anothervalue1,anothervalue2,anothervalue3
somename1,value1,value2,value3

A more readable version:

BEGIN {
    # Set the input and output field separators to ',' and initialize
    # the variable holding the previous line's $1 to empty
    FS=OFS=","
    prev=""
}
{
    # Start a new entry when $1 differs from the previous line;
    # otherwise append $2 to the existing entry
    if (prev != $1) {
        unique[$1]=$2
    }
    else {
        unique[$1]=(unique[$1]","$2)
    }

    # Remember this line's $1 for the next comparison
    prev=$1
}
END {
    # Print each unique $1 followed by its collected values
    for (i in unique) {
        print i,unique[i]
    }
}
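If rows with the same first column are not guaranteed to be adjacent, or the output has to keep the input order, a variation along these lines may help. It is only a sketch: the function name merge_csv and the helper array order are made up for illustration, and it assumes a plain two-column CSV with no quoted or embedded commas.

```shell
# Hypothetical helper: merge rows that share the same first column,
# keeping keys in first-seen order even when duplicates are not adjacent.
# Assumes exactly two comma-separated columns and no quoted fields.
merge_csv() {
    awk '
        BEGIN { FS = OFS = "," }
        # First time this key is seen: record its position and start its entry
        !($1 in merged) { order[++n] = $1; merged[$1] = $2; next }
        # Key already seen: append the value to the existing entry
        { merged[$1] = merged[$1] OFS $2 }
        # Print the keys in the order they first appeared
        END { for (i = 1; i <= n; i++) print order[i], merged[order[i]] }
    ' "$@"
}
```

Called as merge_csv file, it reads the named file; with no argument it reads standard input. Testing membership with ($1 in merged) instead of comparing against the previous line is what removes the adjacency requirement.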

Answer 1: (score: 1)

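    # Path of the CSV file, passed as the first argument
    FILE=$1

    # Unique names from column 1
    NAMES=$(cut -d',' -f1 "$FILE" | sort -u)

    for NAME in $NAMES; do
       printf '%s' "$NAME"
       # Anchor the match to column 1 so that "name1" does not also match "name10"
       VALUES=$(grep "^$NAME," "$FILE" | cut -d',' -f2)
       for VAL in $VALUES; do
           printf ',%s' "$VAL"
       done
       echo ""
    done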

Running it on your data produces:

>bash script.sh data1.txt
anothername1,anothervalue1,anothervalue2,anothervalue3
somename1,value1,value2,value3

The name of your data file must be passed as an argument. The output can be written to a new file by redirection.

>bash script.sh data1.txt > data_new.txt
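Since the question asks to rewrite the CSV itself, note that redirecting straight back to the input file (script.sh data1.txt > data1.txt) would truncate it before it is read. One way around that, sketched below, is to write to a temporary file first. The function name replace_with_output is hypothetical, and the sketch assumes mktemp is available on the system.

```shell
# Hypothetical helper: run a command and replace TARGET with its output,
# but only if the command succeeds.
# Usage: replace_with_output TARGET COMMAND [ARGS...]
replace_with_output() {
    target=$1
    shift
    # Collect the output in a temporary file so TARGET can still be read
    tmp=$(mktemp) || return 1
    if "$@" > "$tmp"; then
        # Copy the contents back; this keeps TARGET's permissions intact
        cat "$tmp" > "$target"
        rm -f "$tmp"
    else
        # Command failed: leave TARGET untouched
        rm -f "$tmp"
        return 1
    fi
}
```

With the script from this answer, that would be called as replace_with_output data1.txt bash script.sh data1.txt.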