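I am trying to generate word-frequency statistics for each user from the text of their reviews, like this:
User 1: word frequencies
User 2: word frequencies, and so on.
How can I do this?
Here I am trying to access each user's review, but it gives me an error.
Please suggest an approach and pseudocode.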
import json
from pprint import pprint
file = open('/Users/mack/Downloads/WKA/task/reviews.json','r')
content = file.read()
file = json.loads(content)
for eid, txt in file["id"]["text"]:
    print(eid, txt)
The large JSON file looks like this:
[
{
"id": 1,
"text": "Bought this over a month ago and everything came like advertise. I got the purple cover and it looks wonderful. The outlet works just fine and charges my kindle without a problem. I also bought it on sale so it was $20 cheaper. Best. Deal. Ever. Love my kindle paperwhite (love being able to read in the dark too!) Also makes reading at work much easier than a traditional book. Thanks Amazon.",
},
{
"id": 2,
"text": "Why three stars? Skip the next two paragraphs. Purchased the bundle on Black Friday - great price. The device works as advertised and I'm enjoying it. However, the lighting (even on max) is underwhelming. The features are handy and easy to use (i.e. dictionary, highlighting, bookmark, etc.) The case is attractive and sturdy enough, but the magnetic closure is rather weak. I suspect the case would open easily if the device were dropped.. In retrospect, I probably would have been dollars ahead to purchase a less expensive case separately rather than bundling. The reason for the three (3) stars? The promoted $15 credit towards purchase of ebook(s). After two unsuccessful attempts to redeem the credit and visiting with an Amazon rep, it appears the credit only works for Amazon digital/published books and is NOT applicable to third party publisher/sellers such as HarperCollins, Random House, Simon and Schuster, Penguin, Tyndale, Scholastic, Thomas Nelson, etc. etc. After respectfully telling the rep that this promotion seems very misleading and asking where I could find a list of authors and/or books for which the credit is applicable, he could offer no such list or database. He suggested finding an author on the Amazon ebook list, clicking on a title, putting the book into the order box and then noting the publisher in the order box. If it didn't say Amazon, I would know the credit could not be applied. I have since located several of my favorite writers and pulled up many of their ebooks. As I expected, NONE were available for purchase with the credit. ALL were published by major publishing houses. NONE were published by Amazon digital. I cannot imagine any prolific author of note not being affiliated with major publishing houses - which leaves the enticing ebook credit pretty much useless to me. The language in the Terms and Conditions seems vague at best regarding this restriction. 
This lack of clarity gives the consumer little, if any, pause regarding the use of the credit. After trying to use it, I felt like I had been scammed. I would NOT recommend purchasing the bundle - even on special pricing days like Black Friday. I feel like I simply gave $15 to Amazon and got virtually nothing in return. If I had it to do over again, I definitely would purchase the Paperwhite. I also would buy the Amazon charger and probably a less expensive case. (Even though I suspect a 5watt iPhone charger would work perfectly, I would still purchase the Amazon charger. In the event the device became problematic, the charger would be on the invoice thereby suggesting the device had been properly charged and disallowing refusal to repair or replace due to improper charging.) The device has been wonderful to use, the case is okay, haven't had to use the charger yet (impressive), but the $15 ebook credit seems virtually worthless.",
}
]
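Input: ids and their corresponding text, as in the JSON above
Output: each id with the counts of the words that appear in its text
Answer 0 (score: 0)
Say you have: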
file = \
[
{
"id": 1,
"text": "Bought this over a month ago and everything came like advertise. I got the purple cover and it looks wonderful. The outlet works just fine and charges my kindle without a problem. I also bought it on sale so it was $20 cheaper. Best. Deal. Ever. Love my kindle paperwhite (love being able to read in the dark too!) Also makes reading at work much easier than a traditional book. Thanks Amazon.",
},
{
"id": 2,
"text": "Why three stars? Skip the next two paragraphs. Purchased the bundle on Black Friday - great price. The device works as advertised and I'm enjoying it. However, the lighting (even on max) is underwhelming. The features are handy and easy to use (i.e. dictionary, highlighting, bookmark, etc.) The case is attractive and sturdy enough, but the magnetic closure is rather weak. I suspect the case would open easily if the device were dropped.. In retrospect, I probably would have been dollars ahead to purchase a less expensive case separately rather than bundling. The reason for the three (3) stars? The promoted $15 credit towards purchase of ebook(s). After two unsuccessful attempts to redeem the credit and visiting with an Amazon rep, it appears the credit only works for Amazon digital/published books and is NOT applicable to third party publisher/sellers such as HarperCollins, Random House, Simon and Schuster, Penguin, Tyndale, Scholastic, Thomas Nelson, etc. etc. After respectfully telling the rep that this promotion seems very misleading and asking where I could find a list of authors and/or books for which the credit is applicable, he could offer no such list or database. He suggested finding an author on the Amazon ebook list, clicking on a title, putting the book into the order box and then noting the publisher in the order box. If it didn't say Amazon, I would know the credit could not be applied. I have since located several of my favorite writers and pulled up many of their ebooks. As I expected, NONE were available for purchase with the credit. ALL were published by major publishing houses. NONE were published by Amazon digital. I cannot imagine any prolific author of note not being affiliated with major publishing houses - which leaves the enticing ebook credit pretty much useless to me. The language in the Terms and Conditions seems vague at best regarding this restriction. 
This lack of clarity gives the consumer little, if any, pause regarding the use of the credit. After trying to use it, I felt like I had been scammed. I would NOT recommend purchasing the bundle - even on special pricing days like Black Friday. I feel like I simply gave $15 to Amazon and got virtually nothing in return. If I had it to do over again, I definitely would purchase the Paperwhite. I also would buy the Amazon charger and probably a less expensive case. (Even though I suspect a 5watt iPhone charger would work perfectly, I would still purchase the Amazon charger. In the event the device became problematic, the charger would be on the invoice thereby suggesting the device had been properly charged and disallowing refusal to repair or replace due to improper charging.) The device has been wonderful to use, the case is okay, haven't had to use the charger yet (impressive), but the $15 ebook credit seems virtually worthless.",
}
]
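# Map each user id to a dict of {word: number of occurrences}
count = {}
for user in file:
    count[user['id']] = {}
    # split() breaks on whitespace only, so punctuation stays attached to words
    for word in user['text'].split():
        count[user['id']][word] = count[user['id']].get(word, 0) + 1
Output: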
{1: {'work': 1, 'so': 1, 'like': 1, 'came': 1, 'and': 3, 'problem.': 1, 'over': 1, 'dark': 1, 'the': 2, 'just': 1, 'than': 1, 'Deal.': 1, 'being': 1, 'purple': 1, 'wonderful.': 1, 'reading': 1, 'my': 2, 'Also': 1, 'makes': 1, 'on': 1, 'Love': 1, '(love': 1, 'fine': 1, 'Ever.': 1, 'paperwhite': 1, 'Thanks': 1, 'to': 1, '$20': 1, 'bought': 1, 'book.': 1, 'at': 1, 'traditional': 1, 'read': 1, 'looks': 1, 'in': 1, 'cover': 1, 'kindle': 2, 'cheaper.': 1, 'too!)': 1, 'Best.': 1, 'works': 1, 'Amazon.': 1, 'The': 1, 'it': 3, 'easier': 1, 'this': 1, 'got': 1, 'sale': 1, 'outlet': 1, 'without': 1, 'also': 1, 'advertise.': 1, 'Bought': 1, 'much': 1, 'able': 1, 'everything': 1, 'I': 2, 'ago': 1, 'was': 1, 'a': 3, 'charges': 1, 'month': 1}, 2: {'repair': 1, 'many': 1, 'applied.': 1, 'noting': 1, 'respectfully': 1, 'expected,': 1, 'days': 1, 'several': 1, 'then': 1, 'best': 1, 'very': 1, 'being': 1, 'telling': 1, 'weak.': 1, 'clicking': 1, 'okay,': 1, 'any,': 1, 'got': 1, 'improper': 1, 'to': 12, 'trying': 1, 'use,': 1, 'if': 2, 'became': 1, 'closure': 1, 'is': 6, 'sturdy': 1, 'buy': 1, 'Nelson,': 1, 'features': 1, 'lighting': 1, 'After': 3, '(3)': 1, 'finding': 1, 'putting': 1, 'of': 7, 'unsuccessful': 1, 'say': 1, 'simply': 1, 'which': 2, 'device': 5, 'only': 1, 'attractive': 1, 'max)': 1, 'offer': 1, 'nothing': 1, 'lack': 1, 'Random': 1, 'pulled': 1, 'Paperwhite.': 1, 'this': 2, 'felt': 1, 'visiting': 1, 'appears': 1, 'publisher/sellers': 1, 'two': 2, 'ebooks.': 1, 'are': 1, 'major': 2, 'Tyndale,': 1, 'pretty': 1, 'clarity': 1, 'dollars': 1, 'Penguin,': 1, 'even': 1, 'enticing': 1, '(impressive),': 1, 'price.': 1, 'and': 13, 'over': 1, 'seems': 3, "didn't": 1, 'also': 1, 'order': 2, 'little,': 1, 'Amazon,': 1, 'reason': 1, 'have': 2, 'suggested': 1, 'digital.': 1, '(even': 1, 'redeem': 1, 'no': 1, 'pricing': 1, 'Simon': 1, 'pause': 1, 'cannot': 1, 'on': 6, 'publisher': 1, 'HarperCollins,': 1, 'yet': 1, 'Purchased': 1, 'consumer': 1, 'note': 1, 'attempts': 1, 'imagine': 1, 
'box': 1, 'suspect': 2, 'case.': 1, 'an': 2, 'author': 2, 'Skip': 1, 'much': 1, 'published': 2, 'charging.)': 1, 'be': 2, 'affiliated': 1, 'list,': 1, 'expensive': 2, 'digital/published': 1, 'leaves': 1, 'purchasing': 1, 'Why': 1, 'return.': 1, 'Conditions': 1, '5watt': 1, 'vague': 1, 'title,': 1, 'This': 1, 'If': 2, 'know': 1, 'do': 1, 'favorite': 1, 'invoice': 1, 'than': 1, 'Terms': 1, 'House,': 1, 'handy': 1, 'since': 1, 'In': 2, 'up': 1, 'charged': 1, 'definitely': 1, 'purchase': 5, 'like': 3, 'replace': 1, 'rep': 1, 'wonderful': 1, 'the': 35, 'enough,': 1, 'Friday': 1, 'find': 1, 'problematic,': 1, 'been': 4, 'applicable': 1, 'probably': 2, 'bundle': 2, 'open': 1, 'credit': 7, 'However,': 1, 'could': 3, 'paragraphs.': 1, 'As': 1, 'still': 1, 'but': 2, 'restriction.': 1, 'ahead': 1, 'NONE': 2, 'gave': 1, 'charger.': 1, 'language': 1, 'advertised': 1, 'database.': 1, 'again,': 1, 'bundling.': 1, 'dropped..': 1, 'work': 1, 'houses.': 1, 'and/or': 1, 'credit.': 2, 'authors': 1, 'great': 1, 'third': 1, 'he': 1, 'by': 2, 'has': 1, 'promotion': 1, 'dictionary,': 1, 'at': 1, 'works': 2, 'book': 1, 'though': 1, 'it': 3, 'useless': 1, 'it.': 1, 'writers': 1, 'refusal': 1, 'NOT': 2, 'as': 2, 'Schuster,': 1, 'less': 2, 'would': 9, 'I': 17, 'a': 5, 'their': 1, '(i.e.': 1, 'box.': 1, 'enjoying': 1, 'Amazon': 7, '$15': 3, 'separately': 1, 'it,': 1, 'promoted': 1, 'publishing': 2, 'with': 3, "haven't": 1, 'easy': 1, 'magnetic': 1, 'retrospect,': 1, 'ebook(s).': 1, 'Black': 2, 'special': 1, 'list': 2, 'scammed.': 1, 'charger': 4, 'rather': 2, 'located': 1, 'misleading': 1, 'asking': 1, '(Even': 1, 'feel': 1, 'Scholastic,': 1, 'such': 2, 'ebook': 3, 'into': 1, 'recommend': 1, 'Friday.': 1, 'towards': 1, 'Thomas': 1, 'easily': 1, 'gives': 1, 'properly': 1, 'case': 4, 'me.': 1, 'three': 2, 'etc.': 2, 'rep,': 1, 'next': 1, 'bookmark,': 1, 'etc.)': 1, 'my': 1, 'not': 2, 'were': 4, 'in': 3, 'suggesting': 1, 'disallowing': 1, 'iPhone': 1, 'party': 1, 'any': 1, 'where': 1, 
'perfectly,': 1, 'regarding': 2, 'applicable,': 1, 'underwhelming.': 1, '-': 3, 'virtually': 2, 'worthless.': 1, 'or': 2, 'had': 4, 'use': 4, 'highlighting,': 1, 'event': 1, 'He': 1, 'houses': 1, 'that': 1, 'for': 4, "I'm": 1, 'The': 7, 'available': 1, 'prolific': 1, 'stars?': 2, 'ALL': 1, 'thereby': 1, 'due': 1, 'books': 2}}
310 µs ± 272 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
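The loop above assumes `file` is already a list of dicts. A minimal sketch of getting there from JSON text follows; the inline `raw` string and the `count_words` helper are made up for illustration and stand in for the real reviews.json. Note that the sample as pasted in the question has a trailing comma after each "text" value, which `json.loads` rejects, so the sketch assumes the real file is valid JSON.

```python
import json

# Hypothetical inline sample standing in for reviews.json (valid JSON, no trailing commas).
raw = '[{"id": 1, "text": "good good deal"}, {"id": 2, "text": "bad deal"}]'


def count_words(records):
    """Map each record's id to a {word: count} dict, splitting on whitespace."""
    counts = {}
    for record in records:  # the top level is a list, so iterate over its items
        per_user = counts.setdefault(record["id"], {})
        for word in record["text"].split():
            per_user[word] = per_user.get(word, 0) + 1
    return counts


# For a file on disk, `with open(path) as f: reviews = json.load(f)` does the same job.
reviews = json.loads(raw)
result = count_words(reviews)
print(result)  # {1: {'good': 2, 'deal': 1}, 2: {'bad': 1, 'deal': 1}}
```

The error in the question most likely comes from indexing the parsed list with `["id"]`: each review is one element of the list, so the keys have to be read per element.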
collections.Counter
from collections import Counter

count = {}
for user in file:
    # A Counter returns 0 for missing keys, so no .get() fallback is needed
    count[user['id']] = Counter()
    for word in user['text'].split():
        count[user['id']][word] += 1
Output:
{1: Counter({'and': 3, 'it': 3, 'a': 3, 'my': 2, 'kindle': 2, 'I': 2, 'the': 2, 'charges': 1, 'dark': 1, 'reading': 1, 'purple': 1, 'being': 1, 'works': 1, 'outlet': 1, 'read': 1, 'too!)': 1, 'like': 1, 'wonderful.': 1, 'also': 1, 'The': 1, 'much': 1, 'sale': 1, 'paperwhite': 1, 'cover': 1, 'Thanks': 1, 'Best.': 1, 'came': 1, 'Deal.': 1, 'so': 1, 'Ever.': 1, 'ago': 1, 'advertise.': 1, '$20': 1, 'Amazon.': 1, 'bought': 1, 'problem.': 1, 'cheaper.': 1, 'got': 1, 'month': 1, 'work': 1, 'makes': 1, 'just': 1, 'than': 1, 'everything': 1, 'Also': 1, 'this': 1, 'fine': 1, 'able': 1, 'to': 1, 'without': 1, 'was': 1, 'in': 1, 'book.': 1, 'at': 1, 'Bought': 1, 'Love': 1, 'on': 1, 'over': 1, 'looks': 1, '(love': 1, 'traditional': 1, 'easier': 1}), 2: Counter({'the': 35, 'I': 17, 'and': 13, 'to': 12, 'would': 9, 'Amazon': 7, 'credit': 7, 'The': 7, 'of': 7, 'on': 6, 'is': 6, 'a': 5, 'device': 5, 'purchase': 5, 'use': 4, 'been': 4, 'charger': 4, 'case': 4, 'were': 4, 'for': 4, 'had': 4, 'like': 3, 'in': 3, 'it': 3, '-': 3, '$15': 3, 'ebook': 3, 'could': 3, 'seems': 3, 'with': 3, 'After': 3, 'published': 2, 'works': 2, 'two': 2, 'by': 2, 'books': 2, 'In': 2, 'rather': 2, 'or': 2, 'such': 2, 'not': 2, 'probably': 2, 'less': 2, 'be': 2, 'major': 2, 'author': 2, 'NOT': 2, 'which': 2, 'publishing': 2, 'etc.': 2, 'expensive': 2, 'NONE': 2, 'if': 2, 'bundle': 2, 'as': 2, 'have': 2, 'credit.': 2, 'virtually': 2, 'list': 2, 'three': 2, 'Black': 2, 'this': 2, 'an': 2, 'regarding': 2, 'stars?': 2, 'order': 2, 'If': 2, 'suspect': 2, 'but': 2, 'properly': 1, 'charging.)': 1, 'dollars': 1, 'underwhelming.': 1, 'located': 1, 'dropped..': 1, 'suggesting': 1, 'return.': 1, 'much': 1, 'Conditions': 1, 'charger.': 1, 'Scholastic,': 1, 'list,': 1, 'attempts': 1, 'note': 1, 'pause': 1, 'applicable,': 1, 'repair': 1, 'replace': 1, 'and/or': 1, 'box.': 1, 'He': 1, 'invoice': 1, 'clarity': 1, 'Thomas': 1, 'title,': 1, "I'm": 1, 'it,': 1, 'enticing': 1, 'separately': 1, 'event': 1, 'pulled': 1, 
'though': 1, 'Tyndale,': 1, 'several': 1, 'use,': 1, 'has': 1, 'noting': 1, 'promotion': 1, 'pretty': 1, 'suggested': 1, 'vague': 1, 'lack': 1, 'bundling.': 1, "haven't": 1, 'houses': 1, 'retrospect,': 1, 'clicking': 1, 'easy': 1, 'Amazon,': 1, 'Schuster,': 1, 'favorite': 1, 'reason': 1, 'many': 1, '(even': 1, 'applicable': 1, 'special': 1, 'iPhone': 1, 'prolific': 1, 'definitely': 1, 'my': 1, 'up': 1, 'wonderful': 1, 'are': 1, 'attractive': 1, 'case.': 1, 'it.': 1, 'redeem': 1, 'know': 1, 'digital/published': 1, 'great': 1, 'no': 1, 'any,': 1, 'As': 1, 'promoted': 1, 'respectfully': 1, 'rep': 1, 'telling': 1, 'ebooks.': 1, "didn't": 1, 'handy': 1, 'However,': 1, 'publisher/sellers': 1, 'disallowing': 1, 'price.': 1, 'perfectly,': 1, 'very': 1, 'worthless.': 1, 'into': 1, 'restriction.': 1, 'magnetic': 1, 'buy': 1, 'next': 1, 'HarperCollins,': 1, 'unsuccessful': 1, 'their': 1, 'find': 1, 'pricing': 1, 'Why': 1, 'language': 1, 'asking': 1, '(Even': 1, 'any': 1, 'imagine': 1, 'trying': 1, 'offer': 1, 'ebook(s).': 1, 'towards': 1, 'Random': 1, 'thereby': 1, 'Paperwhite.': 1, 'Simon': 1, 'third': 1, 'rep,': 1, 'Skip': 1, 'consumer': 1, 'finding': 1, 'affiliated': 1, 'cannot': 1, 'House,': 1, 'houses.': 1, 'say': 1, 'gave': 1, 'enjoying': 1, 'due': 1, 'etc.)': 1, '(impressive),': 1, 'publisher': 1, 'ALL': 1, 'became': 1, 'scammed.': 1, 'gives': 1, 'appears': 1, 'recommend': 1, 'improper': 1, 'problematic,': 1, 'Friday': 1, 'sturdy': 1, 'again,': 1, 'open': 1, 'expected,': 1, 'got': 1, 'dictionary,': 1, 'max)': 1, 'lighting': 1, 'Nelson,': 1, 'feel': 1, 'applied.': 1, 'yet': 1, 'party': 1, 'book': 1, 'enough,': 1, 'available': 1, 'purchasing': 1, 'okay,': 1, 'days': 1, 'bookmark,': 1, 'misleading': 1, 'where': 1, 'putting': 1, 'box': 1, '5watt': 1, 'Friday.': 1, 'felt': 1, 'ahead': 1, 'even': 1, 'authors': 1, 'leaves': 1, 'advertised': 1, 'easily': 1, 'visiting': 1, 'refusal': 1, 'me.': 1, 'Terms': 1, 'only': 1, 'digital.': 1, 'also': 1, 'he': 1, 'useless': 1, 'This': 1, 
'still': 1, 'then': 1, 'highlighting,': 1, 'do': 1, 'features': 1, 'Purchased': 1, 'closure': 1, 'database.': 1, 'Penguin,': 1, 'work': 1, 'best': 1, 'than': 1, 'paragraphs.': 1, 'since': 1, 'being': 1, 'that': 1, 'over': 1, 'charged': 1, 'nothing': 1, 'writers': 1, '(i.e.': 1, 'weak.': 1, 'at': 1, '(3)': 1, 'simply': 1, 'little,': 1})}
536 µs ± 858 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
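Both outputs above count 'The' and 'the', or 'credit' and 'credit.', as different words, because `split()` only breaks on whitespace. If that is not what you want, one possible variation is to lowercase the text and pull out the word characters with a regular expression before counting. This is only a sketch: the `count_words_normalized` name, the pattern, and the tiny `sample` list are my own assumptions, and the pattern drops symbols such as `$`.

```python
import re
from collections import Counter


def count_words_normalized(records):
    """Per-id word counts after lowercasing and stripping punctuation."""
    counts = {}
    for record in records:
        # Keep runs of letters, digits and apostrophes; everything else separates words.
        words = re.findall(r"[a-z0-9']+", record["text"].lower())
        counts[record["id"]] = Counter(words)
    return counts


# Hypothetical one-review sample, shaped like the data in the question.
sample = [{"id": 1, "text": "Best. Deal. Ever. Love my kindle (love it!)"}]
counts = count_words_normalized(sample)
print(counts[1].most_common(1))  # [('love', 2)]
```

`Counter.most_common(n)` is also a convenient way to report only the top words per user instead of printing the full dictionary.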