In Python

Time: 2018-02-21 06:35:02

Tags: python pandas matplotlib

I have a JSON data file (2 GB) in which each line is a record of the form (threadTitle, numberOfSubscribers):

thread1 1092
thread2 44481
thread3 12105
thread4 2835
thread5 4292
...

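I want to plot a histogram in Python:

Y: the number of threads

X: 5 ranges of subscribers: [0 to 100], [101 to 1000], [1001 to 5000], [5001 to 10000] and [10001 and more]

Is there an efficient way to do this with any Python library?

At the moment I have to preprocess the data as follows and then plot it, which seems a bit cumbersome.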

import json

# `file` must hold the path to the line-delimited JSON dataset
# (it is not defined in the original snippet).

# 5 counters, one for each bin
subreddit_count = {
    "below_100": 0,
    "below_1000": 0,
    "below_5000": 0,
    "below_10000": 0,
    "more_10000": 0,
}
with open(file, "r") as f:
    for line in f:
        subreddit = json.loads(line)
        subscribers = subreddit["subscribers"]
        # Increment the counter according to the number of subscribers
        if subscribers <= 100:
            subreddit_count["below_100"] += 1
        elif subscribers <= 1000:
            subreddit_count["below_1000"] += 1
        elif subscribers <= 5000:
            subreddit_count["below_5000"] += 1
        elif subscribers <= 10000:
            subreddit_count["below_10000"] += 1
        else:
            subreddit_count["more_10000"] += 1

# Write one "label<TAB>count" line per bin
with open("subreddit_subscriber.txt", "w") as out:
    for key, count in subreddit_count.items():
        out.write(key + "\t" + str(count) + "\n")

**The first line of the original dataset is this one:**

{"header_img":"...","submit_link_label":"Submit a new post","name":"t5_2qgzg","description":"/r/business brings you the best of your business section. From tips for running a business, to pitfalls to avoid, /r/business teaches you the smart moves and helps you dodge the foolish.\n\n/r/business is not the place for stories about the government's economic policies or corporate corruption. \n#### Rules:\n\n1. **This is not the place to promote your business.**  \nAny and every post promoting a business in any capacity will be removed.  \n  \n6. **Spamming will result in an instant ban.**  \nNo mercy for spammers.   \n  \n2. By posting here, __you agree that you have no connections to the site of the articles you submit__, If you do, we will instantly ban you.   \n\n2. We do not allow __['blogspam'](http://www.urbandictionary.com/define.php?term=blogspam)__, any post that looks like blogspam will be removed.\n\n2. Political submission are not allowed and will be removed. Use /r/politics.\n\n3. Examples of Corporations behaving badly? That goes in /r/greed.  \n\n4. This place should have a \"business casual\" feel. Like you're at a networking party...**make jokes, not offensive comments.**  \n\n5. Please follow [Reddiquette.](http://www.reddit.com/wiki/reddiquette)\n\n6. 
If you link directly to video content you need to leave a comment, or state\nclearly in the title, what the video is about\n\n*Helpful Subreddits:*\n\n* /r/Accounting\n\n* /r/Banking\n\n* /r/BusinessHub\n\n* /r/BusinessInsiders\n\n* /r/BusinessSchool\n\n* /r/Consulting\n\n* /r/Corruption\n\n* /r/Economics\n\n* /r/Economy\n\n* /r/Finance\n\n* /r/InternationalBusiness\n\n* /r/Investing\n\n* /r/InvestmentClub\n\n* /r/MBA\n\n* /r/RealEstate\n\n* /r/Sales\n\n\n\n*Supply chain and logistics:*\n\n* /r/mailroom\n\n*Small Businesses:*\n\n* /r/Entrepreneur \n\n* /r/SmallBusiness \n\n* /r/StartUps \n  \n  \n^[Photography](http://www.flickr.com/photos/reactionphotography) ^of ^the ^header ^licensed ^under ^[CC](http://creativecommons.org/licenses/by-sa/2.0/deed.en)","suggested_comment_sort":null,**"subscribers":201926**,"header_title":"/r/business brings you the best of your business section.","header_size":[1,1],"public_traffic":false,"description_html":"&lt;!-- SC_OFF --&gt;&lt;div class=\"md\"&gt;&lt;p&gt;&lt;a href=\"/r/business\"&gt;/r/business&lt;/a&gt; brings you the best of your business section. From tips for running a business, to pitfalls to avoid, &lt;a href=\"/r/business\"&gt;/r/business&lt;/a&gt; teaches you the smart moves and helps you dodge the foolish.&lt;/p&gt;\n\n&lt;p&gt;&lt;a href=\"/r/business\"&gt;/r/business&lt;/a&gt; is not the place for stories about the government&amp;#39;s economic policies or corporate corruption. &lt;/p&gt;\n\n&lt;h4&gt;Rules:&lt;/h4&gt;\n\n&lt;ol&gt;\n&lt;li&gt;&lt;p&gt;&lt;strong&gt;This is not the place to promote your business.&lt;/strong&gt;&lt;br/&gt;\nAny and every post promoting a business in any capacity will be removed.  &lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;strong&gt;Spamming will result in an instant ban.&lt;/strong&gt;&lt;br/&gt;\nNo mercy for spammers.   
&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;By posting here, &lt;strong&gt;you agree that you have no connections to the site of the articles you submit&lt;/strong&gt;, If you do, we will instantly ban you.   &lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;We do not allow &lt;strong&gt;&lt;a href=\"http://www.urbandictionary.com/define.php?term=blogspam\"&gt;&amp;#39;blogspam&amp;#39;&lt;/a&gt;&lt;/strong&gt;, any post that looks like blogspam will be removed.&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;Political submission are not allowed and will be removed. Use &lt;a href=\"/r/politics\"&gt;/r/politics&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;Examples of Corporations behaving badly? That goes in &lt;a href=\"/r/greed\"&gt;/r/greed&lt;/a&gt;.  &lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;This place should have a &amp;quot;business casual&amp;quot; feel. Like you&amp;#39;re at a networking party...&lt;strong&gt;make jokes, not offensive comments.&lt;/strong&gt;  &lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;Please follow &lt;a href=\"http://www.reddit.com/wiki/reddiquette\"&gt;Reddiquette.&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;If you link directly to video content you need to leave a comment, or state\nclearly in the title, what the video is about&lt;/p&gt;&lt;/li&gt;\n&lt;/ol&gt;\n\n&lt;p&gt;&lt;em&gt;Helpful Subreddits:&lt;/em&gt;&lt;/p&gt;\n\n&lt;ul&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/Accounting\"&gt;/r/Accounting&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/Banking\"&gt;/r/Banking&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/BusinessHub\"&gt;/r/BusinessHub&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/BusinessInsiders\"&gt;/r/BusinessInsiders&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/BusinessSchool\"&gt;/r/BusinessSchool&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;a 
href=\"/r/Consulting\"&gt;/r/Consulting&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/Corruption\"&gt;/r/Corruption&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/Economics\"&gt;/r/Economics&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/Economy\"&gt;/r/Economy&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/Finance\"&gt;/r/Finance&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/InternationalBusiness\"&gt;/r/InternationalBusiness&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/Investing\"&gt;/r/Investing&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/InvestmentClub\"&gt;/r/InvestmentClub&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/MBA\"&gt;/r/MBA&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/RealEstate\"&gt;/r/RealEstate&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/Sales\"&gt;/r/Sales&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;\n&lt;/ul&gt;\n\n&lt;p&gt;&lt;em&gt;Supply chain and logistics:&lt;/em&gt;&lt;/p&gt;\n\n&lt;ul&gt;\n&lt;li&gt;&lt;a href=\"/r/mailroom\"&gt;/r/mailroom&lt;/a&gt;&lt;/li&gt;\n&lt;/ul&gt;\n\n&lt;p&gt;&lt;em&gt;Small Businesses:&lt;/em&gt;&lt;/p&gt;\n\n&lt;ul&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/Entrepreneur\"&gt;/r/Entrepreneur&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/SmallBusiness\"&gt;/r/SmallBusiness&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;&lt;a href=\"/r/StartUps\"&gt;/r/StartUps&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;\n&lt;/ul&gt;\n\n&lt;p&gt;&lt;sup&gt;&lt;a href=\"http://www.flickr.com/photos/reactionphotography\"&gt;Photography&lt;/a&gt;&lt;/sup&gt; &lt;sup&gt;of&lt;/sup&gt; &lt;sup&gt;the&lt;/sup&gt; &lt;sup&gt;header&lt;/sup&gt; &lt;sup&gt;licensed&lt;/sup&gt; &lt;sup&gt;under&lt;/sup&gt; &lt;sup&gt;&lt;a 
href=\"http://creativecommons.org/licenses/by-sa/2.0/deed.en\"&gt;CC&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;\n&lt;/div&gt;&lt;!-- SC_ON --&gt;","hide_ads":false,"icon_img":"","public_description":"/r/business brings you the best of your business section. From tips for running a business, to pitfalls to avoid, /r/business teaches you the smart moves and helps you dodge the foolish.","created_utc":1190054517,"submit_text":"Please remember our sub rules:\n\n1. This is not the place to promote\nyour business.\nAny and every post promoting a\nbusiness in any capacity will be\nremoved.\n\n\n2. Spamming will result in an instant\nban.\nNo mercy for spammers.\n\n\n3. By posting here, you agree that you\nhave no connections to the site of\nthe articles you submit, If you do,\nwe will instantly ban you.\n\n\n4. We do not allow 'blogspam' , any\npost that looks like blogspam will be\nremoved.\n\n\n5. Political submission are not allowed\nand will be removed. Use /r/politics .\n\n\n6. Examples of Corporations behaving\nbadly? That goes in /r/greed .\n\n\n7. This place should have a \"business\ncasual\" feel. Like you're at a\nnetworking party... make jokes, not\noffensive comments.\n\n\n8. Please follow Reddiquette.","title":"business","subreddit_type":"public","url":"/r/business/","wiki_enabled":false,"submission_type":"any","public_description_html":"&lt;!-- SC_OFF --&gt;&lt;div class=\"md\"&gt;&lt;p&gt;&lt;a href=\"/r/business\"&gt;/r/business&lt;/a&gt; brings you the best of your business section. 
From tips for running a business, to pitfalls to avoid, &lt;a href=\"/r/business\"&gt;/r/business&lt;/a&gt; teaches you the smart moves and helps you dodge the foolish.&lt;/p&gt;\n&lt;/div&gt;&lt;!-- SC_ON --&gt;","banner_size":null,"accounts_active":null,"lang":"en","key_color":"","id":"2qgzg","icon_size":null,"submit_text_label":null,"submit_text_html":"&lt;!-- SC_OFF --&gt;&lt;div class=\"md\"&gt;&lt;p&gt;Please remember our sub rules:&lt;/p&gt;\n\n&lt;ol&gt;\n&lt;li&gt;&lt;p&gt;This is not the place to promote\nyour business.\nAny and every post promoting a\nbusiness in any capacity will be\nremoved.&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;Spamming will result in an instant\nban.\nNo mercy for spammers.&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;By posting here, you agree that you\nhave no connections to the site of\nthe articles you submit, If you do,\nwe will instantly ban you.&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;We do not allow &amp;#39;blogspam&amp;#39; , any\npost that looks like blogspam will be\nremoved.&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;Political submission are not allowed\nand will be removed. Use &lt;a href=\"/r/politics\"&gt;/r/politics&lt;/a&gt; .&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;Examples of Corporations behaving\nbadly? That goes in &lt;a href=\"/r/greed\"&gt;/r/greed&lt;/a&gt; .&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;This place should have a &amp;quot;business\ncasual&amp;quot; feel. Like you&amp;#39;re at a\nnetworking party... make jokes, not\noffensive comments.&lt;/p&gt;&lt;/li&gt;\n&lt;li&gt;&lt;p&gt;Please follow Reddiquette.&lt;/p&gt;&lt;/li&gt;\n&lt;/ol&gt;\n&lt;/div&gt;&lt;!-- SC_ON --&gt;","comment_score_hide_mins":0,"quarantine":false,**"display_name":"business"**,"collapse_deleted_comments":false,"banner_img":"","over18":false}

**Edit:** After plotting Jezrael's code

1 answer:

Answer 0 (score: 0)

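I think you can read only the subscribers values into a list, then create a Series, bin it with cut, count the bins with value_counts, and finally draw the result with plot.bar.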
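As a small illustration of the binning step, here is a sketch that applies cut to the five sample subscriber counts listed in the question. It assumes pandas and numpy are installed; the variable names are only illustrative. Note that cut builds right-closed intervals by default, so a value of exactly 100 lands in the first bin.

```python
import numpy as np
import pandas as pd

# Sample subscriber counts copied from the example rows in the question.
subscribers = pd.Series([1092, 44481, 12105, 2835, 4292])

labels = ['below_100', 'below_1000', 'below_5000', 'below_10000', 'more_10000']

# Right-closed bins: (-1, 100], (100, 1000], (1000, 5000], (5000, 10000], (10000, inf]
binned = pd.cut(subscribers, bins=[-1, 100, 1000, 5000, 10000, np.inf], labels=labels)

# value_counts() orders by frequency; sort_index() restores the bin order.
counts = binned.value_counts().sort_index()
print(counts)
```

Empty bins are still reported with a count of 0, because the binned values are categorical.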

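import json

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Collect only the subscriber counts; `file` must hold the path to the dataset
L = []
with open(file, "r") as f:
    for line in f:
        L.append(json.loads(line)["subscribers"])

s = pd.Series(L)
#print (s)

# Right-closed bins: (-1, 100], (100, 1000], (1000, 5000], (5000, 10000], (10000, inf]
l = ['below_100','below_1000','below_5000','below_10000','more_10000']
# sort_index() keeps the bars in bin order instead of ordering them by count
a = pd.cut(s, bins=[-1,100,1000,5000,10000,np.inf], labels=l).value_counts().sort_index()
#print (a)

a.plot.bar()
plt.show()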
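
If holding every value in a list is a concern for a 2 GB file, the counting can also be done while streaming, using only the standard library. The following is a sketch rather than part of the answer above: the function name count_bins and the constants EDGES and LABELS are hypothetical, and the in-memory sample stands in for the real file (the value 100 is added only to show the edge behaviour).

```python
import bisect
import io
import json
from collections import Counter

# Upper edges of the first four ranges; anything above the last edge
# falls into the final "more_10000" bin.
EDGES = [100, 1000, 5000, 10000]
LABELS = ['below_100', 'below_1000', 'below_5000', 'below_10000', 'more_10000']


def count_bins(lines):
    """Count records per subscriber range, reading one JSON line at a time."""
    counts = Counter({label: 0 for label in LABELS})
    for line in lines:
        subscribers = json.loads(line)["subscribers"]
        # bisect_left returns the index of the first edge >= subscribers,
        # so a value equal to an edge (e.g. 100) stays in the lower bin.
        counts[LABELS[bisect.bisect_left(EDGES, subscribers)]] += 1
    return counts


# Hypothetical stand-in for the real file: JSON lines built from the
# sample values in the question, plus 100 as an edge case.
sample = io.StringIO(
    '{"subscribers": 1092}\n{"subscribers": 44481}\n{"subscribers": 12105}\n'
    '{"subscribers": 2835}\n{"subscribers": 4292}\n{"subscribers": 100}\n'
)
counts = count_bins(sample)
print([counts[label] for label in LABELS])
```

With the real data, an open file object can be passed directly to count_bins, and the five resulting counts can then be drawn as a bar chart. Memory use stays constant regardless of file size, at the cost of not having the raw values available for other analyses.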