I am trying to load the file from the following link into a DataFrame: https://ads.twitter.com/transparency
This is what the data looks like:
{
"archives" : [ {
"ads_account" : {
"account_name" : "@BradleyByrne - U.S. Political Campaigning",
"user_name" : "BradleyByrne",
"bio_url" : "https://twitter.com/ZpdrcK6Met",
"billing_information" : {
"insertion_order" : [ ],
"credit_card" : [ {
"city" : "Arlington",
"spend" : 3.5845999999999995E-4,
"postal_code" : "22209",
"region" : "va",
"credit_card_full_name" : "Targeted Victory"
} ]
}
},
"tweets" : [ {
"impressions" : 0,
"spend" : 0.0,
"ad_campaigns" : [ {
"targeting" : [ {
"target" : "Montgomery AL- US",
"target_type" : "GEO",
"impressions" : 895
}, {
"target" : "13-54",
"target_type" : "AGE_BUCKET",
"impressions" : 5721
}, {
"target" : "Dothan AL- US",
"target_type" : "GEO",
"impressions" : 189
}, {
"target" : "13-29",
"target_type" : "AGE_BUCKET",
"impressions" : 3009
}, {
"target" : "Chattanooga TN- US",
"target_type" : "GEO",
"impressions" : 2
}, {
"target" : "English",
"target_type" : "LANGUAGE",
"impressions" : 8568
}, {
"target" : "Orlando-Daytona Beach-Melbourne FL- US",
"target_type" : "GEO",
"impressions" : 13
}, {
"target" : "21-54",
"target_type" : "AGE_BUCKET",
"impressions" : 4297
}, {
"target" : "Thai",
"target_type" : "LANGUAGE",
"impressions" : 1
}, {
"target" : "20 and up",
"target_type" : "AGE_BUCKET",
"impressions" : 6598
},
"ads_account" : {
"account_name" : "@club4growth - U.S. Political Campaigning - Bask Digital Media",
"user_name" : "club4growth",
"bio_url" : "http://twitter.com/wEF8OWW5zn",
"billing_information" : {
"insertion_order" : [ ],
"credit_card" : [ ]
}
},
"tweets" : [ {
"impressions" : 466501,
"spend" : 2993.5,
"ad_campaigns" : [ {
"targeting" : [ {
"target" : "13 and up",
"target_type" : "AGE_BUCKET",
"impressions" : 144460
}, {
"target" : "20-34",
"target_type" : "AGE_BUCKET",
"impressions" : 78242
}, {
"target" : "Korean",
"target_type" : "LANGUAGE",
"impressions" : 160
}, {
"target" : "13-54",
"target_type" : "AGE_BUCKET",
"impressions" : 131703
}, {
"target" : "30-39",
"target_type" : "AGE_BUCKET",
"impressions" : 42685
}, {
"target" : "Pennsylvania- US",
"target_type" : "GEO",
"impressions" : 2
}, {
"target" : "25-54",
"target_type" : "AGE_BUCKET",
"impressions" : 86998
}, {
"target" : "South Dakota- US",
"target_type" : "GEO",
"impressions" : 1
}, {
"target" : "20-29",
"target_type" : "AGE_BUCKET",
"impressions" : 61090
}, {
"target" : "Dutch",
"target_type" : "LANGUAGE",
"impressions" : 41
}, {
"target" : "Unknown",
"target_type" : "GENDER",
"impressions" : 214
}, {
"target" : "Washington DC- US",
"target_type" : "GEO",
"impressions" : 144356
}, {
"target" : "French",
"target_type" : "LANGUAGE",
"impressions" : 420
}, {
"target" : "German",
"target_type" : "LANGUAGE",
"impressions" : 71
}, {
"target" : "New Jersey- US",
"target_type" : "GEO",
"impressions" : 1
}, {
"target" : "Female",
"target_type" : "GENDER",
"impressions" : 57736
},
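Each advertiser seems to have its own nested dictionary, and I have not found a way to convert them into a DataFrame. I tried the following code, but it only splits them into separate columns.
Is there a solution? Thanks.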
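import json
import pandas as pd  # was missing; pd is used below

file = 'issue.txt'
with open(file) as train_file:
    dict_train = json.load(train_file)
# orient='index' turns each top-level key into a row, so the list under
# "archives" is spread across columns, one nested dict per cell
train = pd.DataFrame.from_dict(dict_train, orient='index')
train.reset_index(level=0, inplace=True)
train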
Answer 0 (score: 1)
You can try json_normalize. You need to create a separate DataFrame for each JSON path, and then either merge them together or keep them separate:
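import pandas as pd
# `data` is the parsed JSON dict, e.g. the result of json.load() (dict_train in the question)
# one row per tweet
df1 = pd.json_normalize(data['archives'], record_path=['tweets'])
# one row per insertion order, tagged with the owning account
df2 = pd.json_normalize(data['archives'],
                        record_path=['ads_account', 'billing_information', 'insertion_order'],
                        meta=[['ads_account', 'account_name'], ['ads_account', 'user_name']])
df1
df2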
Output:
df1:
impressions spend ... tweet_text tweet_url
0 132072 2071.81 ... There’s nothing controversial about something ... https://twitter.com/transparency/status/106532...
1 8779581 100000.00 ... Let’s #endgunviolencetogether - go to https://... https://twitter.com/transparency/status/106473...
2 1021063 15601.68 ... There’s nothing controversial about something ... https://twitter.com/transparency/status/106532...
3 5935913 113991.45 ... Send a postcard to your representative in less... https://twitter.com/transparency/status/106504...
4 40233 287.31 ... Care for Pennsylvania seniors is in jeopardy. ... https://twitter.com/transparency/status/113887...
... ... ... ... ... ...
2855 115744 760.68 ... Dear New York politicians: Abortion is health ... https://twitter.com/transparency/status/108388...
2856 514286 2566.19 ... In 2019, states have passed more laws than eve... https://twitter.com/transparency/status/114830...
2857 8247 180.71 ... Spread the word about Trump's real agenda so t... https://twitter.com/transparency/status/109297...
2858 4629 24.36 ... Illinois’ new law, the Reproductive Health Act... https://twitter.com/transparency/status/113485...
2859 1795 6.38 ... Congratulations to our #WebbyAwards nominated ... https://twitter.com/transparency/status/111318...
df2:
advertising_agency_name company_name ... ads_account.account_name ads_account.user_name
0 Resolution Media Toms Shoes Inc. ... @TOMS - U.S. Issue Ads - OMD TOMS
1 Precision Strategies Humana ... @humana - Issue - Precision Strategies Humana
2 NaN Federation for American Immigration Reform ... @FAIRImmigration - U.S. Issue Ads FAIRImmigration
3 NaN VH1 ... @VH1 - U.S. Issue Ads VH1
4 NaN VH1 ... @VH1 - U.S. Issue Ads VH1
.. ... ... ... ... ...
118 Cavalry LLC American Hospital Association ... @AHAAdvocacy - U.S. Issue Ads - Cavalry AHAAdvocacy
119 NaN FWD.us ... @FWDus - U.S. Issue Ads FWDus
120 NaN FWD.us ... @FWDus - U.S. Issue Ads FWDus
121 NaN California Secretary of State ... @CASOSVote - U.S. Issue Ads CASOSvote
122 NaN California Secretary of State ... @CASOSVote - U.S. Issue Ads CASOSvote
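For the "merge them together" step, here is a minimal sketch rather than a definitive recipe. It assumes a hand-trimmed sample that mirrors the structure in the question (the `data` literal, the shortened second account name, and the variable names `cards`, `targeting`, `merged` are all illustrative). The billing table uses the same `json_normalize` pattern as `df2`, with `credit_card` in place of `insertion_order`. The deeply nested `targeting` entries are flattened with a plain loop, because as far as I can tell `meta` paths are resolved along the `record_path`, so a sibling branch such as `ads_account` may not be reachable from `tweets -> ad_campaigns -> targeting`.

```python
import pandas as pd

# Illustrative sample shaped like the question's data (values trimmed by hand).
data = {
    "archives": [
        {
            "ads_account": {
                "account_name": "@BradleyByrne - U.S. Political Campaigning",
                "user_name": "BradleyByrne",
                "billing_information": {
                    "insertion_order": [],
                    "credit_card": [
                        {"city": "Arlington", "region": "va",
                         "credit_card_full_name": "Targeted Victory"}
                    ],
                },
            },
            "tweets": [{
                "impressions": 0,
                "spend": 0.0,
                "ad_campaigns": [{"targeting": [
                    {"target": "Montgomery AL- US", "target_type": "GEO", "impressions": 895},
                    {"target": "13-54", "target_type": "AGE_BUCKET", "impressions": 5721},
                ]}],
            }],
        },
        {
            "ads_account": {
                "account_name": "@club4growth (name shortened for the sketch)",
                "user_name": "club4growth",
                "billing_information": {"insertion_order": [], "credit_card": []},
            },
            "tweets": [{
                "impressions": 466501,
                "spend": 2993.5,
                "ad_campaigns": [{"targeting": [
                    {"target": "Korean", "target_type": "LANGUAGE", "impressions": 160},
                ]}],
            }],
        },
    ]
}

# One row per credit card, tagged with the owning account.
cards = pd.json_normalize(
    data["archives"],
    record_path=["ads_account", "billing_information", "credit_card"],
    meta=[["ads_account", "user_name"]],
)

# One row per targeting entry, built with an explicit loop.
rows = []
for archive in data["archives"]:
    user = archive["ads_account"]["user_name"]
    for tweet in archive.get("tweets", []):
        for campaign in tweet.get("ad_campaigns", []):
            for entry in campaign.get("targeting", []):
                rows.append({"ads_account.user_name": user, **entry})
targeting = pd.DataFrame(rows)

# Left join keeps accounts that have no credit card on file (their card columns become NaN).
merged = targeting.merge(cards, on="ads_account.user_name", how="left")
print(merged)
```

A left join is used so that targeting rows survive even when an account has an empty `credit_card` list, as the second account does in the question's sample.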
Answer 1 (score: 0)
Try pandas.read_json()
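As a rough illustration of what `pandas.read_json()` does with this layout: it should yield a single `archives` column whose cells are still nested dicts, so a flattening step is likely still needed afterwards. The sketch below assumes a tiny made-up sample (the `sample` dict and the names `raw`, `df`, `flat` are illustrative, not from the original file).

```python
import io
import json

import pandas as pd

# Illustrative sample with the same top-level shape as the question's file.
sample = {
    "archives": [
        {"ads_account": {"user_name": "BradleyByrne"},
         "tweets": [{"impressions": 0, "spend": 0.0}]},
        {"ads_account": {"user_name": "club4growth"},
         "tweets": [{"impressions": 466501, "spend": 2993.5}]},
    ]
}
raw = json.dumps(sample)

# read_json accepts a file path or a file-like object; StringIO stands in for the file here.
df = pd.read_json(io.StringIO(raw))
print(df.shape)  # a single "archives" column, one row per archive

# Each cell is still a dict, so flatten the column in a second step.
flat = pd.json_normalize(df["archives"].tolist())
print(flat.columns.tolist())
```

After the second step, nested dict keys become dotted column names such as `ads_account.user_name`, while list-valued fields such as `tweets` stay as lists and would need `record_path` (as in Answer 0) to be expanded into rows.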