Sorting and ordering data in a file

Date: 2017-11-24 13:48:03

Tags: python json python-3.x twitter tweepy

I wrote a program that stores a list of ids, like this:

(image of the stored id pairs, not available)

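Ideally, though, the data should be in sequential form, so that the ids in the first pair are renumbered: "889926212541448192" becomes 1, "889919950248448000" becomes 2, and so on. That is, the file should look something like:

(image of the renumbered id pairs, not available)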

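The first id connects to 2, 3 and 6, and id 4 connects to 5, forming a network. I have no experience in this area and could not find a way to read the file like this. I tried writing a few programs, but they only read whole rows rather than the id-to-id columns. The data is saved by the following program: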

import json

lista_rede = []  # list to store all ids

with open('dados_twitter.json', 'r') as f, open('ids.csv', 'w') as arq:
    arq.write('Source,Target\n')

    for line in f:
        tweet = json.loads(line)  # parse the line into a Python dictionary

        try:
            # a tweet may be a retweet, a quote, or both; write one pair for each
            for key in ('retweeted_status', 'quoted_status'):
                if key in tweet:
                    # json.dumps keeps the double quotes around each id
                    id_status = json.dumps(tweet['id_str'])
                    id_rt = json.dumps(tweet[key]['id_str'])

                    lista_rede.append(tweet['id_str'])
                    lista_rede.append(tweet[key]['id_str'])

                    arq.write(id_status + ',' + id_rt + '\n')
        except (KeyError, TypeError):
            continue  # skip tweets with missing or malformed fields

So I end up with a file of pairwise interaction data. How can I rearrange this data when reading it, or even when writing it? In Python or another language?

1 answer:

Answer 0: (score: 0)

The following snippet should do the job:

import re
from collections import defaultdict

header = ''
id_dict = defaultdict(list)  # source id -> list of target ids

# read the ids
with open('ids.csv') as fr:
    header = fr.readline()
    for line in fr:
        # each data line holds two quoted ids; extract the runs of digits
        ids = [int(s) for s in re.findall(r'\d+', line)]
        if len(ids) < 2:
            continue  # skip blank or malformed lines
        id_dict[ids[0]].append(ids[1])

# sort the ids
for key in id_dict:
    id_dict[key].sort()

# save the sorted ids in a new file
with open('ids_sorted.txt', 'w') as fw:
    # fw.write(header)
    for key in sorted(id_dict):
        for value in id_dict[key]:
            fw.write("{0} {1}\n".format(key, value))
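The snippet above sorts the pairs but keeps the original tweet ids. If the goal is the sequential numbering described in the question (first id seen becomes 1, the next new id becomes 2, and so on), one possible approach is a dictionary that assigns the next free integer to each id the first time it appears. The following is only a sketch under assumptions: the function name `renumber_edges` and the output file name `ids_seq.csv` are made up for illustration, and the input is assumed to be the `ids.csv` file with a `Source,Target` header written by the question's program.

```python
import csv


def renumber_edges(in_path, out_path):
    """Rewrite an edge list so ids become 1, 2, 3, ... in order of first appearance.

    Hypothetical helper for illustration; returns the id -> number mapping.
    """
    mapping = {}  # original id (string) -> sequential integer

    with open(in_path, newline='') as fr, open(out_path, 'w', newline='') as fw:
        reader = csv.reader(fr)   # csv.reader strips the double quotes around ids
        writer = csv.writer(fw)

        header = next(reader, None)  # 'Source,Target' line, if the file is not empty
        if header is not None:
            writer.writerow(header)

        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            new_row = []
            for original_id in row[:2]:
                if original_id not in mapping:
                    # first time this id is seen: give it the next number
                    mapping[original_id] = len(mapping) + 1
                new_row.append(mapping[original_id])
            writer.writerow(new_row)

    return mapping


if __name__ == '__main__':
    import os
    if os.path.exists('ids.csv'):
        id_map = renumber_edges('ids.csv', 'ids_seq.csv')
        print('{0} distinct ids renumbered'.format(len(id_map)))
```

Returning the mapping makes it possible to translate the small numbers back to the original tweet ids later. Because numbers are handed out in order of first appearance, the result depends on the order of lines in the input file.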