Extracting a nested JSON field from a Pandas DataFrame

Time: 2018-03-28 19:28:36

Tags: python json pandas

I obtained the JSON file with the following code:

import jsonpickle
import tweepy
import pandas as pd


consumer_key = "xxxx" 
consumer_secret = "xxxx"
access_key = "xxxx" 
access_secret = "xxxx"

#Pass our consumer key and consumer secret to Tweepy's user authentication handler
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
#Pass our access token and access secret to Tweepy's user authentication handler
auth.set_access_token(access_key, access_secret)
#Creating a twitter API wrapper using tweepy
api = tweepy.API(auth)

#This is what we are searching for
searchQuery = '#machinelearning OR "machine learning"'  

#Maximum number of tweets we want to collect 
maxTweets = 1

#The twitter Search API allows up to 100 tweets per query
tweetsPerQry = 100

#new_tweets = api.search(q=query,count=1)
tweetCount = 0

#Open a text file to save the tweets to
with open('test_ml1.json', 'w') as f:

    #Tell the Cursor method that we want to use the Search API (api.search)
    #Also tell Cursor our query, and the maximum number of tweets to return
    for tweet in tweepy.Cursor(api.search, q=searchQuery, tweet_mode='extended').items(maxTweets):

        #Write the JSON format to the text file, and add one to the number of tweets we've collected
        f.write(jsonpickle.encode(tweet._json, unpicklable=False) + '\n')
        tweetCount += 1

    #Display how many tweets we have collected
    print("Downloaded {0} tweets".format(tweetCount))

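I have a JSON file of the kind you get when downloading tweets through the Twitter API. Because the JSON contains nested dictionaries, I load these tweets into a DataFrame with the code below (taken from "Loading a file with more than one line of JSON into Python's Pandas"):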

import io

import pandas as pd

# read the entire file into a python list
# (text mode, so each line is a str rather than bytes)
with open('your.json', 'r', encoding='utf-8') as f:
    data = f.readlines()

# remove the trailing "\n" from each line
data = [line.rstrip() for line in data]

# each element of 'data' is an individual JSON object.
# i want to convert it into an *array* of JSON objects
# which, in and of itself, is one large JSON object
# basically... add square brackets to the beginning
# and end, and have all the individual JSON objects
# separated by a comma
data_json_str = "[" + ','.join(data) + "]"

# now, load it into pandas
# (wrapped in StringIO because newer pandas versions deprecate raw JSON strings here)
data_df = pd.read_json(io.StringIO(data_json_str))
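
As a possible shortcut for the loading step above, pandas can read a file with one JSON object per line directly via the `lines=True` option of `pd.read_json`. The sketch below uses two made-up records in an in-memory buffer instead of the real tweet file; the field values and the name `lines_df` are placeholders, not data from the question.

```python
import io

import pandas as pd

# Two hypothetical tweet-like records, one JSON object per line
# (stand-ins for the lines written to the tweet file)
sample = io.StringIO(
    '{"id": 1, "full_text": "hello"}\n'
    '{"id": 2, "full_text": "world"}\n'
)

# lines=True tells pandas to parse each line as a separate JSON object
lines_df = pd.read_json(sample, lines=True)
print(lines_df.shape)
```

With a real file, the path (for example 'your.json') would be passed in place of the buffer.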

Here is the code that writes the DataFrame to CSV:

data_df.to_csv("output_batch.csv",encoding = 'utf-8')

Here is a screenshot of the columns in the CSV: [screenshot]

My question

I want to extract the "screen_name" from the "entities" column. In the example shown, it would be "NIRAV_88".
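
One way this might be approached, sketched under assumptions: in the standard tweet JSON, `entities` is a dictionary, and screen names inside it appear under the `user_mentions` list. The snippet assumes that is where "NIRAV_88" sits in the screenshot. The one-row `data_df` is a made-up stand-in for the real DataFrame, and `mentioned_screen_names` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical stand-in for data_df: one row whose "entities" cell is a nested dict
# (assumed shape; the real cell has more keys such as "urls" and "symbols")
data_df = pd.DataFrame({
    "entities": [
        {"hashtags": [], "user_mentions": [{"screen_name": "NIRAV_88", "id": 1}]},
    ]
})


def mentioned_screen_names(entities):
    """Return the screen_name of every user mention in one entities dict."""
    if not isinstance(entities, dict):
        return []  # missing or malformed cell
    return [m["screen_name"] for m in entities.get("user_mentions", [])]


# Apply the helper to each row; the result is a list per tweet,
# since a tweet can mention several users
data_df["screen_names"] = data_df["entities"].apply(mentioned_screen_names)
print(data_df["screen_names"].tolist())
```

This works on the DataFrame before it is written out. After `to_csv`, the nested dictionaries are stored as plain strings and would need to be parsed again before a lookup like this.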

0 answers:

No answers yet