I am trying to create a DataFrame from data stored in MongoDB. Here is my code:
# -*- coding: utf-8 -*-
import pandas
from pymongo import MongoClient  # Connection was removed in PyMongo 3.0; MongoClient replaces it

connection = MongoClient()
db = connection.tech
input_data = db.comments
# Request four columns from the raw documents returned by find()
data = pandas.DataFrame(list(input_data.find()), columns=['url', 'title', 'comment', 'name'])
datac = pandas.DataFrame(data=data, columns=["url", "title", "comment", "name"])
print(datac)
The data has the following form:
/* 0 */
{
"_id" : ObjectId("551d04e6ed8c780ec8f1b137"),
"url" : "http://www.theguardian.com/technology/2015/mar/31/will-the-internet-of-things-mean-the-death-of-queuing",
"comments" : {
"comment" : ["Yey, it's the stupid bloody fridge again, ordering stuff that for me that I wanted to replace before it ran out (as opposed to after), and re-ordering one off purchases that I don't need to replace. Genius idea. Makes me wonder why we haven't had these for years... for about a second.", "How dare you rant and rave about TFL!", "Silly me... They are not a recognised cult, as you were.", "To put RFID chips in product packaging, and have them read by a", "shopping cart, is ethically acceptable. It tracks products, not", "people. Clever!", "However, using a card associated with your name to pay for these", "products would expose you to surveillance. That is as foolish as", "allowing a refrigerator or oyster card report about you.", "See http://gnu.org/philosophy/surveillance-vs-democracy.html.", "To put RFID chips in product packaging, and have them read by a", "shopping cart, is ethically acceptable. It tracks the products, but", "not people. Clever!", "However, the RFIDs must be in packaging you can easily remove, so that", "they can't be used to track you after you have bought the products.", "Of course, to protect yourself from surveillance, you should always", "pay cash. Using a card associated with your name to pay for these", "products would be as foolish as letting an oyster card that you use be", "associated with your name.", "Ah yes the delivery side of things of internet shopping, the screams of my young office companion at the total inability of the delivery firms to deliver to his address a package ordered nearly a month before, finally listed as 'left in gas box' (but not his!) on the third attempt at delivery (30th perhaps contact - most of which had been one way). His talking to a robot unable to understand the phrase of 'I'd like to talk to a human' or indeed 'sentient being'. 
Convinced me to pop along to my local store and get similar parts to his order on the way home, that day, not nearly a month later after being turned into a gibbering idiot hating humanity and robot kind for it's utter stupidity. All that after a lecture on how his way was the future,... we're set to tear each other apart any day now using those methods. The theory is utopian, the reality is so poorly implimented as to not even make joke grading.", "Is there ever a day where you don't rant and rave about Apple?", "Do yourself a favour and change the record once in a while. Getting some fresh air (and a fresh perspective) wouldn't hurt, either.", "Well it will mean we may be able to avoid another disastrous US Import, namely Black Friday Chaos.", "Yep, you win today's \"shoehorn Apple into an article\" award.", "My plumber doesn't have a till either, he emails me receipts. Is he as totes mazeballs as Apple as well?", "My Ostercard seems to be automatically set to overcharge me.", "I do wonder how much higher the rate of overcharging (for not swiping out \"properly\") is for people that use the auto top up, compared to those that interact with the ticket machines to top up (given that with the later you can see the penalty charges)."],
"name" : ["MyCupRunnethOver", "pxr4t2", "rmstallman", "rmstallman", "JumpedUpElectrician", "wiakywbfatw", "tr1ck5t3r", "75drayton", "75drayton"]
},
"title" : "Will the Internet of Things mean the death of queuing?"
}
/* 1 */
{
"_id" : ObjectId("551d15beed8c780ec8f1b150"),
"url" : "http://www.theguardian.com/technology/2015/mar/26/periscope-review-twitter-live-streaming-service-meerkat",
"comments" : {
"comment" : ["or your data plan ..", "A tremendous idea in the hands of the right people, at the right time. However, this morning I was treated to someone having their breakfast, someone showing us their classroom, someone showing their journey to work and someone willing to show his cock (and I don't think he meant his prize Marsh Daisy) if he got enough likes.", "I've uninstalled it again.", "Would be good to flag that doing it during a performance has copyright issues and disrespects performers. And would be very annoying for those behind.", "Cheerio Meerkat?", "Would be good to know the battery hit while streaming"],
"name" : ["zerozero31", "zerozero31", "Alexander Edwards", "DavidBowiesGhost", "guardianistaleeds"]
},
"title" : "Periscope review: does Twitter's live-streaming service beat Meerkat?"
}
/* 2 */
{
"_id" : ObjectId("551d167ced8c780ec8f1b160"),
"url" : "http://www.theguardian.com/technology/2014/dec/05/sony-pictures-hack-north-korea-cyber-army",
"comments" : {
"comment" : ["So? I bet the film is terrible anyway.", "Im OK, because north korea don't see me as threat. Afterall they will not attack people that laugh at them just people that make joke about them."],
"name" : ["threegenrev", "kornetbeef"]
},
"title" : "Sony Pictures hack: how much damage can North Korea's cyber army do?"
}
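I can retrieve the title and URL, but I get NaN for the comment and name values.
Can anyone help me access the values in the nested dictionary?
This is the result: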
title comment name
0 Will the Internet of Things mean the death of ... NaN NaN
1 Periscope review: does Twitter's live-streamin... NaN NaN
2 Sony Pictures hack: how much damage can North ... NaN NaN
3 HTC will stay out of low-end market, manager i... NaN NaN
4 Is Periscope the lunch-sharing social network ... NaN NaN
5 Obama targets foreign hackers and state-owned ... NaN NaN
6 Motorola Moto G review – the best budget smart... NaN NaN
7 iPhone 5s review: Apple shows its touch NaN NaN
8 iPhone 5S fingerprint sensor hacked by Germany... NaN NaN
9 Pacemaker launches iPad DJ app using Spotify a... NaN NaN
10 Online fraudsters 'offered services through Fa... NaN NaN
11 Apple rounds up top iOS app developers for Aid... NaN NaN
Answer 0 (score: 0)
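This may be a type issue in the pd.DataFrame() call, since the title entries (strings) load correctly while the comment and name entries (lists) come back as NaN. Note also that in the sample documents both lists sit under the top-level "comments" key, so there is no top-level "comment" or "name" field for the requested columns to match, and a column that is absent from the records is filled with NaN.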
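To illustrate the nesting, here is a minimal sketch using a plain dict shaped like the sample documents. The `doc` literal is an abbreviated, illustrative copy of document 2 from the question, not real query output, and the variable names are made up for this example. Looking up `comment` at the top level finds nothing, while going through `comments` first returns the list.

```python
# Abbreviated stand-in for one document returned by input_data.find()
doc = {
    "url": "http://www.theguardian.com/technology/2014/dec/05/sony-pictures-hack-north-korea-cyber-army",
    "comments": {
        "comment": ["So? I bet the film is terrible anyway."],
        "name": ["threegenrev"],
    },
    "title": "Sony Pictures hack: how much damage can North Korea's cyber army do?",
}

# There is no top-level "comment" key, so this lookup yields None
top_level_comment = doc.get("comment")

# The lists are reachable only through the nested "comments" dict
nested_comment = doc["comments"]["comment"]
nested_names = doc["comments"]["name"]

print(top_level_comment)
print(nested_comment)
print(nested_names)
```

A DataFrame built from such records with `columns=['url', 'title', 'comment', 'name']` would find `url` and `title` but have nothing to put in the other two columns.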
Could you share the raw output of the following:
list(input_data.find())
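I cannot test this without database access, but consider converting the query result to plain JSON-compatible data first. json.loads expects a string rather than a cursor, so the result of find() has to be serialized beforehand, for example with bson.json_util.dumps (bson ships with PyMongo):
data = json.loads(json_util.dumps(list(input_data.find())))  # requires: import json; from bson import json_util
Followed by: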
pd.DataFrame.from_records(data)
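Since the comment and name lists are nested, the records probably need flattening before they are handed to pandas. The following is a rough sketch under that assumption, not a tested solution against the real database: `flatten_comments` is a hypothetical helper, and `docs` is an abbreviated stand-in for `list(input_data.find())`. It keeps one row per document with the lists stored whole, because in the sample data the two lists do not always have the same length (document 1 has six comments but five names), so pairing them one-to-one would be unreliable.

```python
def flatten_comments(doc):
    """Return a flat copy of one document with 'comment' and 'name' at the top level."""
    nested = doc.get("comments") or {}  # tolerate documents without a "comments" field
    return {
        "url": doc.get("url"),
        "title": doc.get("title"),
        "comment": nested.get("comment", []),
        "name": nested.get("name", []),
    }


# Abbreviated, illustrative stand-in for list(input_data.find())
docs = [
    {
        "url": "http://www.theguardian.com/technology/2014/dec/05/sony-pictures-hack-north-korea-cyber-army",
        "comments": {
            "comment": ["So? I bet the film is terrible anyway."],
            "name": ["threegenrev"],
        },
        "title": "Sony Pictures hack: how much damage can North Korea's cyber army do?",
    },
    {
        # Hypothetical document with no comments at all
        "url": "http://example.com/no-comments",
        "title": "An article without comments",
    },
]

rows = [flatten_comments(d) for d in docs]

# With pandas installed, the flat rows can then be loaded directly, e.g.:
#   pandas.DataFrame.from_records(rows, columns=["url", "title", "comment", "name"])
for row in rows:
    print(row["title"], len(row["comment"]), len(row["name"]))
```

Recent pandas versions also provide `pandas.json_normalize`, which expands nested dicts into dotted column names such as `comments.comment`; whether that layout is preferable depends on how the comments will be analysed.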
You could also try the Monary package.