Combining JSON files with a CSV, similar to VLOOKUP

Date: 2016-09-26 14:07:16

Tags: python json bash csv grep

Simply put, I'm trying to merge two sets of data. I'm open to using grep/bash or Python.

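  1. Read the directory /mediaid

  2. Read the names of the .json files

  3. If a .json file name matches a row in the .csv, copy the contents of that JSON file into the row (if there is no match, just skip it)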

Input data:

    File1.csv

    testentry, 1234
    testentry1, 6789
    

    INPUT DATA (the file name is the MEDIAID to check)

    1234.json

    [
    {"id":"1", "text":"Nice man!"},
    {"id":"2", "text":"Good job"}
    ]
    

    6789.json

    [
    {"id":"1", "text":"Test1"},
    {"id":"2", "text":"Test2"}
    ]
    

    Desired output data.csv

    testentry, 1234, Nice man!, Good job
    testentry1, 6789, Test1, Test2
    

    I'm attempting this with grep, but I can't work out how to check the JSON file names and pull their data into the rows.

    #!/usr/bin/env bash
    
    indir="$HOME/indir"
    outdir="$HOME/outdir"
    
    cd "$indir" || exit
    mkdir -p "$outdir" || exit
    for f in *.csv; do
        [[ -f $f ]] || continue
        lines=()
        while IFS=, read -ra cols; do
            if (( ${#cols[@]} != 2 )); then
                echo "Sorry buddy, you'll have to use a real CSV parser to handle: $f" >&2
                exit 1
            fi
            # Does the basename match the contents of the first column?
            if [[ ${cols[0]} == "${f%.*}" ]]; then
                echo "Match found in $f"
            fi
            lines+=("${cols[0]},${cols[1]}")
        done <"$f"
        # something with JQ to read the json filename, and pass its data into the row
        printf '%s\n' "${lines[@]}" > "$outdir/$f" || exit
    done
    

    A failed, but slightly better, attempt in Python:

    import csv
    import json
    import os  # needed for os.listdir below
    
    path_to_json = 'somedir/'
    
    json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]
    
    print(json_files)
    
    with open(json_files) as lookuplist:
        # IT NEEDS to match the mediaID from the json FILENAME
        with open('file1.csv', "r") as csvinput:
            with open('VlookupOut','w') as output:
    
                reader = csv.reader(lookuplist)
                reader2 = csv.reader(csvinput)
                writer = csv.writer(output)
    
                d = {}
                for xl in reader2:
                    d[xl[2]] = xl[3:]
    
                for i in reader:
                    if i[4] in d:
                        i.append(d[i[4]])
                    writer.writerow(i)
    

1 answer:

Answer 0 (score: 1)

This produces the output you want:

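    # Run from the directory that holds the .json files: $jsonfile is a relative path.
    for file in /mediaid/*; do
        # read splits on whitespace, so for "testentry, 1234"
        # entry is "testentry," (comma included) and fileid is "1234".
        while read -r entry fileid; do
            jsonfile="$fileid.json"
            if [[ -f "$jsonfile" ]]; then
                # Collect every "text" value and join them with ", ".
                text=$(jq -r 'map(.text) | join(", ")' "$jsonfile")
                echo "$entry $fileid, $text"
            fi
        done < "$file"
    done > output.csv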

This uses jq to parse the JSON files.
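
Since the question also allows Python, the same lookup can be sketched there. This is a minimal sketch and not part of the original answer: the function names `merge_rows` and `format_rows` are hypothetical, and it assumes that each JSON file is named `<id>.json`, sits in one directory, and contains a list of objects with a `"text"` key, as in the sample data.

```python
import csv
import json
from pathlib import Path


def merge_rows(csv_path, json_dir):
    """Return CSV rows that have a matching <id>.json, with its "text" values appended."""
    merged = []
    with open(csv_path, newline="") as handle:
        # skipinitialspace drops the blank after the comma in "testentry, 1234"
        for row in csv.reader(handle, skipinitialspace=True):
            if len(row) < 2:
                continue  # blank or malformed line
            json_path = Path(json_dir) / f"{row[1].strip()}.json"
            if not json_path.is_file():
                continue  # no JSON file for this ID: skip the row
            with open(json_path, encoding="utf-8") as json_handle:
                items = json.load(json_handle)
            # Assumption: every item is an object with a "text" key
            merged.append(row[:2] + [item["text"] for item in items])
    return merged


def format_rows(rows):
    """Join fields with ", " to mirror the desired output shown in the question."""
    return [", ".join(fields) for fields in rows]
```

Calling `format_rows(merge_rows("File1.csv", "."))` on the sample files from the question gives the two desired lines. If the text values can themselves contain commas, writing the rows with `csv.writer` instead of `format_rows` would quote them properly.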