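I wrote a program that stores a list of ids, like this:
For my purposes, however, the data should be in sequential form, so that in the first pair of ids, "889926212541448192" becomes 1 and "889919950248448000" becomes 2. That is, the file should look something like:
The first id connects to 2, 3 and 6, and id 4 connects to 5, forming a network. I have no experience in this area and I can't find a way to read the data like this. I tried writing a few programs, but they only read whole rows rather than mapping column by column, id to id. The data is saved by the following program: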
import json
arq = open('ids.csv', 'w')
arq.write('Source' + ',' + 'Target')
arq.write("\n")
lista_rede = []  # list to store all ids
with open('dados_twitter.json', 'r') as f:
    for line in f:
        lista = []
        tweet = json.loads(line)  # parse the line into a Python dictionary
        lista = list(tweet.keys())  # list of the tweet's top-level keys
        try:
            if 'retweeted_status' in lista:
                id_rt = json.dumps(tweet['retweeted_status']['id_str'])
                id_status = json.dumps(tweet['id_str'])
                lista_rede.append(tweet['id_str'])
                lista_rede.append(tweet['retweeted_status']['id_str'])
                arq.write(id_status + ',' + id_rt)
                arq.write("\n")
            if 'quoted_status' in lista:  # test for the key name, not its value
                id_rt = json.dumps(tweet['quoted_status']['id_str'])
                id_status = json.dumps(tweet['id_str'])
                lista_rede.append(tweet['id_str'])
                lista_rede.append(tweet['quoted_status']['id_str'])
                arq.write(id_status + ',' + id_rt)
                arq.write("\n")
        except (KeyError, TypeError):
            # skip tweets that lack the expected fields
            continue
arq.close()
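So I end up with a file of pairwise interactions. How can I rearrange this data when reading it, or write it out in that form in the first place? In Python or any other language?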
Answer 0 (score: 0)
The following snippet groups the pairs by source id and writes them out in sorted order:
import re
header = ''
id_dict = {}
# read the ids
with open('ids.csv') as fr:
    header = fr.readline()
    for line in fr:
        # pull the numeric ids out of the quoted fields
        ids = [int(s) for s in re.findall(r'\d+', line)]
        try:
            id_dict[ids[0]].append(ids[1])
        except KeyError:
            # first time this source id is seen
            id_dict[ids[0]] = [ids[1]]
# sort the ids
for key in id_dict:
    id_dict[key].sort()
# save the sorted ids in a new file
with open('ids_sorted.txt', 'w') as fw:
    # fw.write(header)
    for key in sorted(id_dict):
        for value in id_dict[key]:
            fw.write("{0} {1}\n".format(key, value))
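If the ids also need to be renumbered as 1, 2, 3, ... the way the question describes, a minimal sketch could look like the one below. It assumes the ids.csv layout produced by the question's program (a Source,Target header followed by one quoted pair per line). The helper name relabel_edges is hypothetical, and every sample row except the first pair is made up for illustration.

```python
import re


def relabel_edges(lines):
    """Map each distinct id to 1, 2, 3, ... in order of first appearance.

    Returns (mapping, edges): mapping goes from the original id string to
    its new integer label, edges is a list of (source, target) label pairs.
    """
    mapping = {}
    edges = []
    for line in lines:
        ids = re.findall(r'\d+', line)
        if len(ids) < 2:
            continue  # skips the header and any blank or malformed line
        pair = []
        for raw in ids[:2]:
            if raw not in mapping:
                mapping[raw] = len(mapping) + 1
            pair.append(mapping[raw])
        edges.append(tuple(pair))
    return mapping, edges


# Illustrative rows only: the first pair comes from the question,
# the remaining ids are invented for this example.
sample = [
    'Source,Target',
    '"889926212541448192","889919950248448000"',
    '"889926212541448192","889900000000000003"',
    '"889900000000000004","889900000000000005"',
]

mapping, edges = relabel_edges(sample)
for source, target in edges:
    print(source, target)
```

To run it on the real data, pass the open file object instead of sample (for example relabel_edges(open('ids.csv'))) and write each pair to the output file. Keeping mapping around lets you translate the small labels back to the original tweet ids later.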