Question

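The other existing solutions haven't worked for me. I want to compare a CSV file with a JSON file to see whether the JSON file contains any of the strings in the CSV file.

I tried this (adapted from other Stack Overflow posts):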

jsoned = json.loads(x)

with open("test.csv", "wb+") as csv_file:
    csv_writer = csv.writer(csv_file)
    for i in jsoned:
        csv_writer.writerow([i[u'tag'],
                             i[u'newtag']])

but it doesn't work. Would I be better off going the other way and converting the CSV to JSON?

Edit

JSON file:

{"tag":["security architecture","systems security engineering","architecture","program protection planning (ppp)","system security engineering","security engineering"],"newtag":["security","architecture engineering & policy","certified ethical hacker","security policy and risk management","sse","enterprise transition plan","plan","tax","capacity analysis"]}

CSV:

id  tag
88  systems engineering
88  project management
88  program management
88  strategic planning
88  requirements analysis
88  acquisition
88  enterprise architecture
134 java
134 software engineering
134 software development
134 xml
134 c++
134 sql
134 web services
134 javascript
134 linux
134 html
134 python
134 c
134 c#
134 software architecture
134 eclipse
134 jquery
134 oracle
134 perl
161 project management
161 systems engineering
161 requirements engineering
161 requirements management

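I want to find out which id best matches the JSON file (so I want to know how many tags match for each id), but I don't know how to go about comparing the JSON with the CSV.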

Answer 1

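I may have misunderstood your question, but hopefully this will at least get you started.

I'm sure there must be a better way to do this, but here is one approach.

First, load the data: put the CSV data into a nested list and the JSON data into a dict. Then get all the unique IDs in the CSV file.

Go through the CSV rows for each unique ID and count how many of that ID's tags are present in the JSON tags.

If the count is greater than the current maximum, store that ID as the best ID.

Once the loop has finished, you should have the ID with the most tags contained in the JSON tags.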

import csv
import json

# load csv data, skipping the "id, tag" header row
# (csv.reader assumes comma-separated values; pass delimiter="\t" if the file is tab-separated)
with open("csvdata.csv", newline="") as csvFile:
    reader = csv.reader(csvFile)
    next(reader, None)
    loadedCSV = [row for row in reader if row]

# load json data and get list of tags (the sample JSON uses the key "tag")
with open("jsonFile.json") as jsonFile:
    jsonTags = json.load(jsonFile)["tag"]

# create a unique list of ids from csv file
uniqueIDs = list(set(row[0] for row in loadedCSV))

# best match so far
selectedID = None

# keep track of best count
maxCount = 0

# go through ids
for currentID in uniqueIDs:

    # count for specific ID
    idCount = 0

    # go through tags in csv and add one to count if in json tags
    for row in loadedCSV:
        if row[0] == currentID:
            if row[1] in jsonTags:
                idCount += 1

    # compare count to current max and remember the new best
    if idCount > maxCount:
        maxCount = idCount
        selectedID = currentID
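
The same per-ID counting could also be written more compactly with collections.Counter. This is only a sketch under assumptions that are not in the question: the function name count_matches is hypothetical, the rows are (id, tag) pairs already read from the CSV, the sample rows below are made up for illustration, and tags are compared by exact string equality.

```python
from collections import Counter


def count_matches(rows, json_tags):
    """Count, for each id, how many of its tags appear in json_tags."""
    tag_set = set(json_tags)  # set membership avoids rescanning a list
    counts = Counter()
    for row_id, tag in rows:
        # int(True) is 1 and int(False) is 0, so ids with no match still get an entry
        counts[row_id] += int(tag in tag_set)
    return counts


# Hypothetical sample data, not the asker's actual files
rows = [
    ("88", "architecture"),
    ("88", "project management"),
    ("134", "security engineering"),
    ("134", "architecture"),
    ("134", "java"),
]
json_tags = ["security architecture", "architecture", "security engineering"]

counts = count_matches(rows, json_tags)
best_id, best_count = counts.most_common(1)[0]  # id with the highest match count
```

Keeping the full Counter rather than only the maximum also gives the number of matching tags for every id, which is what the question asks for.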
