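I am collecting follower IDs from 50 Twitter accounts (each account has between 1,000 and 25,000 followers), and I have been able to store those follower IDs in JSON in a format similar to the following: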
[
36146779,
[
170742628,
3597763396,
13453212,
24763726,
19087188,
19605181,
37374972
],
22971125,
[
1114702974,
1145981566365130758,
1118409958561685504,
822439041312423941,
1110524937788424197,
807718095460581376,
24763726,
3181477874,
1076870147980300288,
307465302
],
24763726,
[........
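What I am trying to find are the follower IDs that the accounts have in common, e.g. person 24763726 follows both account 36146779 and account 22971125. Any suggestions on how to approach this? I am fairly new to Python and to programming in general, so any help or advice is much appreciated!
So far I have been able to convert the saved data (in JSON format) into a pandas DataFrame, but it is not in the shape I want.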
import json
import pandas as pd
# Import the data
with open("2019_07_02_eco copy.json", "r", encoding="utf-8") as f:
    data_list = json.load(f)
# Create a pandas DataFrame with the follower ids
df = pd.DataFrame(data_list)
print(df.head)
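What I expected was a pandas DataFrame with the account IDs (of the 50 accounts) as column headers and the follower IDs in the rows beneath each header.
What I get instead is this: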
[194 rows x 1 columns]
<bound method NDFrame.head of 0
0 36146779
1 [170742628, 3597763396, 247113642, 11306966070...
2 22971125
3 [1114702974, 1145981566365130758, 111840995856...
4 [295929695, 1024767030065606657, 1007033735013...
5 [984651561518252032, 982678444088541184, 98696...
6 [227843834, 23838268, 43140516, 2790255573, 33...
7 111125168
8 [1144607601914720258, 70032358, 18055487, 1127...
9 [947686805809266688, 9701692, 1096088766, 3337...
10 [2967527466, 2269464956, 249752699, 7556396244...
11 [321553655, 3546285436, 126038375, 71595951158...
12 71280747
13 [2955657113, 192354019, 1641657258, 375061682,...
14 [900344203955367937, 221726613, 1358476824, 14...
15 [2304150619, 14436400, 4833507964, 4883671481,...
16 [274049948, 2796219727, 185657334, 993542912, ...
17 72665016
18 [4892138044, 19982260, 3150202778, 73071487944...
19 [20389458, 386293346, 590031373, 576342755, 52...
20 1289611591
21 [3252647829, 16817453, 56003694, 1039493295318...
22 25088527
23 [436396700, 993251142263099392, 11435552424428...
24 [329428581, 20537025, 1724220128, 1682340361, ...
25 [15005765, 15678953, 54576200, 7521632, 121736...
26 19954039
27 [1033101308935462912, 1145323862969790464, 866...
Answer 0 (score: 1)
Try this:
data = [
36146779,
[
170742628,
3597763396,
13453212,
24763726,
19087188,
19605181,
37374972
],
22971125,
[
1114702974,
1145981566365130758,
1118409958561685504,
822439041312423941,
1110524937788424197,
807718095460581376,
24763726,
3181477874,
1076870147980300288,
307465302,
],
24763726,
[
1145981566365130758,
1118409958561685504,
822439041312423941,
1110524937788424197,
22971125
]
]
d = {}
# Convert the flat list to a dictionary: account ID (as a string) -> follower list.
for i in range(0, len(data) - 1, 2):
    d[str(data[i])] = data[i + 1]

def getKeys(dictOfElements, valueToFind):
    """Return the keys whose follower list contains valueToFind."""
    listOfKeys = []
    for key, followers in dictOfElements.items():
        if valueToFind in followers:
            listOfKeys.append(key)
    return listOfKeys

# For each collected account, list the other collected accounts it follows.
for key in d:
    keys = ",".join(getKeys(d, int(key)))
    print("person: {}, follows accounts: {}".format(key, keys))
Output:
person: 36146779, follows accounts:
person: 22971125, follows accounts: 24763726
person: 24763726, follows accounts: 36146779,22971125
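Note that this loop only checks whether the collected accounts follow one another. To find every follower ID shared by two or more accounts, one option is to invert the mapping so that each follower points to the set of accounts it follows. The following is a minimal sketch under that assumption; `shared_followers`, `min_accounts`, and the sample dictionary are illustrative names and data, not part of the answer above.

```python
from collections import defaultdict

# Hypothetical sample in the same shape as a dict of
# account ID -> follower IDs.
followers_by_account = {
    "36146779": [170742628, 3597763396, 13453212, 24763726],
    "22971125": [1114702974, 24763726, 3181477874, 170742628],
    "24763726": [1145981566365130758, 22971125],
}

def shared_followers(followers_by_account, min_accounts=2):
    """Map each follower ID to the accounts it follows, keeping only
    followers that appear under at least `min_accounts` accounts."""
    accounts_by_follower = defaultdict(set)
    for account, followers in followers_by_account.items():
        for follower in followers:
            accounts_by_follower[follower].add(account)
    return {
        follower: sorted(accounts)
        for follower, accounts in accounts_by_follower.items()
        if len(accounts) >= min_accounts
    }

shared = shared_followers(followers_by_account)
for follower, accounts in shared.items():
    print("follower {} follows accounts: {}".format(follower, ", ".join(accounts)))
```

Each follower list is scanned once, which should scale better to 50 accounts with thousands of followers each than a membership test against every list for every ID.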
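Answer 1 (score: 0)
Your data is a single flat sequence: a user's ID, then a list of the IDs connected to it, and so on. It is converted into a DataFrame with a single column, where each cell holds either one integer or a list of integers.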
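The frame printed in the question shows several list rows back to back (rows 3 through 6, for example), so the file may hold more than one follower list per account. If so, strict ID/list pairing would mismatch IDs and lists. Below is a minimal sketch, assuming every bare integer is an account ID and every list after it belongs to that account; `group_followers` and the sample data are hypothetical.

```python
def group_followers(flat):
    """Group a flat [id, [followers], [followers], id, ...] list into
    {account_id: [all followers of that account]}."""
    grouped = {}
    current = None
    for item in flat:
        if isinstance(item, list):
            if current is None:
                raise ValueError("found a follower list before any account ID")
            grouped[current].extend(item)
        else:
            # A bare integer starts a new account.
            current = item
            grouped.setdefault(current, [])
    return grouped

# Hypothetical sample: account 22971125 has two follower lists in a row.
sample = [
    36146779, [170742628, 24763726],
    22971125, [1114702974], [24763726, 307465302],
]
grouped = group_followers(sample)
print(grouped)
```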
Try splitting your elements into pairs, turning those two-element pairs into a dictionary, and then converting that dictionary into a DataFrame.
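One way this suggestion could look in code is sketched below. It assumes the data strictly alternates between an account ID and a single follower list, and the sample values are made up for illustration. Because the follower lists differ in length, each list is wrapped in a `pd.Series`. The nullable `"Int64"` dtype is used so that padding the shorter columns does not turn the large IDs into floats, which would lose precision.

```python
import pandas as pd

# Hypothetical sample in the question's alternating shape.
data = [
    36146779, [170742628, 3597763396, 24763726],
    22971125, [1114702974, 1145981566365130758, 24763726, 307465302],
]

# Pair each account ID with the list that follows it.
followers_by_account = dict(zip(data[::2], data[1::2]))

# One column per account; shorter columns are padded with <NA>.
df = pd.DataFrame(
    {
        account: pd.Series(ids, dtype="Int64")
        for account, ids in followers_by_account.items()
    }
)
print(df)
```

This gives the layout the question asks for, with account IDs as column headers and follower IDs beneath them. For finding shared followers, though, the dictionary itself is often easier to work with than the padded frame.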