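None of the existing solutions worked for me. I want to compare a CSV file against a JSON file to see whether the JSON file contains any of the strings in the CSV file.
I tried the following (adapted from other Stack Overflow posts):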
```python
jsoned = json.loads(x)
with open("test.csv", "wb+") as csv_file:
    csv_writer = csv.writer(csv_file)
    for i in jsoned:
        csv_writer.writerow([i[u'tag'],
                             i[u'newtag']])
```
But it doesn't work. Would I be better off going the other way and turning the CSV into JSON?
EDIT
JSON file:
{"tag":["security architecture","systems security engineering","architecture","program protection planning (ppp)","system security engineering","security engineering"],"newtag":["security","architecture engineering & policy","certified ethical hacker","security policy and risk management","sse","enterprise transition plan","plan","tax","capacity analysis"]}
CSV:
id tag
88 systems engineering
88 project management
88 program management
88 strategic planning
88 requirements analysis
88 acquisition
88 enterprise architecture
134 java
134 software engineering
134 software development
134 xml
134 c++
134 sql
134 web services
134 javascript
134 linux
134 html
134 python
134 c
134 c#
134 software architecture
134 eclipse
134 jquery
134 oracle
134 perl
161 project management
161 systems engineering
161 requirements engineering
161 requirements management
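I want to see which id matches the JSON file best (that is, I want to know how many tags match for each id), but I don't know how to go about comparing the JSON with the CSV.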
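The CSV sample above shows a space between the id and the tag; the original delimiter may have been lost when pasting. If the file really looks like this, `csv.reader` with its default comma delimiter returns each line as a single field, and `delimiter=" "` would also split multi-word tags such as "systems engineering". A minimal sketch, assuming every line is an id, whitespace, then the tag (the variable names are illustrative), splits only on the first run of whitespace:

```python
import csv

# First lines of the CSV sample exactly as pasted: id, a space, then the tag.
sample_lines = [
    "id tag",
    "88 systems engineering",
    "88 project management",
    "134 c++",
    "161 requirements engineering",
]

# With its default comma delimiter, csv.reader returns each such line as ONE field,
# so code that reads row[1] would raise IndexError.
first = next(csv.reader(sample_lines[1:2]))
print(first)  # ['88 systems engineering']

# Splitting only on the first run of whitespace keeps multi-word tags intact.
rows = [line.split(maxsplit=1) for line in sample_lines[1:]]  # [1:] skips the header
print(rows[0])  # ['88', 'systems engineering']
```

If the real file is comma-separated (for example `88,systems engineering`), a plain `csv.reader` is the right tool and this workaround is unnecessary.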
Answer 0 (score: 1)
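I may have misunderstood your question, but hopefully this will at least get you started.
I'm sure there is a better way to do this, but here is one approach.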
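First, load the data: put the CSV data into a nested list and the JSON data into a dict. Then collect all the unique IDs in the CSV file.
For each unique ID, go through the CSV rows and count how many of that ID's tags appear in the JSON tags.
If the count is greater than the current maximum, store that ID as the best ID and the count as the new maximum.
When the loop finishes, you have the ID with the most tags that also appear in the JSON tags.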
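```python
import csv
import json

# load csv data (assumes comma-separated id,tag rows with a header row)
with open("csvdata.csv", newline="") as csvFile:
    reader = csv.reader(csvFile)
    next(reader)  # skip the "id,tag" header
    loadedCSV = [row for row in reader]

# load json data and get the list of tags; json.load needs a file object, not a
# filename, and the key in the question's file is "tag" (add "newtag" if needed)
with open("jsonFile.json") as jsonFile:
    jsonTags = set(json.load(jsonFile)["tag"])

# create a unique list of ids from the csv file
uniqueIDs = list(set(row[0] for row in loadedCSV))

# best match so far
selectedID = None
# keep track of the best count
maxCount = 0

# go through the ids
for csvID in uniqueIDs:
    # count for this specific ID
    idCount = 0
    # go through the tags in the csv and add one to the count if the tag is in the json tags
    for row in loadedCSV:
        if row[0] == csvID and row[1] in jsonTags:
            idCount += 1
    # compare the count to the current max and update both when it is beaten
    if idCount > maxCount:
        selectedID = csvID
        maxCount = idCount

print(selectedID, maxCount)
```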
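The loop above rescans every CSV row once per id and only reports the winner. Since the question asks how many tags match for each id, a `collections.Counter` can collect every id's count in a single pass instead. This is a swapped-in alternative, not the answer's method: `count_matches` is a hypothetical helper, `rows` stands for the parsed CSV without its header (like `loadedCSV`), and the JSON tag set is invented for illustration.

```python
from collections import Counter


def count_matches(rows, json_tags):
    """Count, for every id, how many of its tags appear in json_tags.

    rows: iterable of (id, tag) pairs, e.g. the parsed CSV without its header.
    json_tags: a set of tag strings taken from the JSON file.
    """
    counts = Counter()
    for csv_id, tag in rows:
        # Adds 1 for a match and 0 otherwise, so ids with no matches still appear.
        counts[csv_id] += int(tag in json_tags)
    return counts


# A subset of the CSV sample from the question.
rows = [
    ("88", "systems engineering"), ("88", "project management"), ("88", "program management"),
    ("134", "java"), ("134", "python"), ("134", "sql"),
    ("161", "project management"), ("161", "systems engineering"),
    ("161", "requirements engineering"),
]
# Hypothetical JSON tags, chosen for illustration only.
json_tags = {"systems engineering", "project management", "program management", "python"}

counts = count_matches(rows, json_tags)
for csv_id, n in counts.most_common():  # highest count first
    print(csv_id, n)

best_id, best_count = counts.most_common(1)[0]
print("best:", best_id, best_count)  # best: 88 3
```

With the real sample data, exact string matching finds no overlap: none of the CSV tags (for example "systems engineering") appears verbatim among the JSON tags (which contain "systems security engineering"), so every id would score 0. Counting partial or word-level matches would be a different criterion and a separate design decision.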