我很抱歉提出这个问题,但我已经看了但是找不到答案。我老实说是新手。我正在尝试从json csv文件生成一个完整单词列表。我已经创建了一个行列表,但是我不能使用split()来生成包含单独单词的新列表(稍后我需要计算单词出现次数)。 我的输入文件包含twitter信息: twitter data 我试着写简单的代码:
myfile=open('fileName','r')
words=[]
for line in myfile:
words.append(line.split())
len(words)=82
我也尝试过reader = csv.reader(myFile)和reader = csv.DictReader(myFile)
但总的来说我可以得到每一行,但是如何进一步将字符串/行拆分成独立的单词。对不起,谢谢你。
我的数据#I更改为另一个示例,因为最后一个可能格式不正确:
id,flags,expiration,cas,value
493926581610364928,0,0,2635740904247446,"{""contributors"":null,""truncated"":false,""text"":""@xaaronh @blueredandgold If Namco Bandai's One Piece Unlimited World is anything to go by, no local retail release means no eShop either =\\"",""in_reply_to_status_id"":493925918998425600,""id"":493926581610364928,""favorite_count"":0,""source"":""<a href=\""hp://twitter.com\"" rel=\""nofollow\"">Twitter Web Client</a>"",""retweeted"":false,""coordinates"":null,""entities"":{""symbols"":[],""user_mentions"":[{""id"":139852376,""indices"":[0,8],""id_str"":""139852376"",""screen_name"":""xaaronh"",""name"":""Aaron""},{""id"":74393990,""indices"":[9,24],""id_str"":""74393990"",""screen_name"":""blueredandgold"",""name"":""Leigh""}],""hashtags"":[],""urls"":[]},""in_reply_to_screen_name"":""xaaronh"",""in_reply_to_user_id"":139852376,""retweet_count"":0,""id_str"":""493926581610364928"",""favorited"":false,""user"":{""follow_request_sent"":false,""profile_use_background_image"":true,""default_profile_image"":false,""id"":42302246,""profile_background_image_url_hp"":""hp://pbs.twimg.com/profile_background_images/464279459932020736/v1xnMcrV.jpeg"",""verified"":false,""profile_text_color"":""333333"",""profile_image_url_https"":""hp://pbs.twimg.com/profile_images/490791031487463424/udSldTQ3_normal.png"",""profile_sidebar_fill_color"":""DDEEF6"",""entities"":{""description"":{""urls"":[{""url"":""hp:tttt"",""indices"":[67,89],""expanded_url"":""hp://infernalmonkey.com"",""display_url"":""infernalmonkey.com""}]}},""followers_count"":506,""profile_sidebar_border_color"":""000000"",""id_str"":""42302246"",""profile_background_color"":""1A1B1F"",""listed_count"":22,""is_translation_enabled"":false,""utc_offset"":36000,""statuses_count"":8676,""description"":""I probably tweet about video games and onaholes. Let's be friends! (NSFW)"",""friends_count"":261,""location"":""Sydney, Australia"",""profile_link_color"":""2FC2EF"",""profile_image_url"":""hp://pbs.twimg.com/profile_images/490791031487463424/udSldTQ3_normal.png"",""following"":false,""geo_enabled"":false,""profile_banner_url"":""hp://pbs.twimg.com/profile_banners/42302246/1406105444"",""profile_background_image_url"":""hp://pbs.twimg.com/profile_background_images/464279459932020736/v1xnMcrV.jpeg"",""screen_name"":""infernal_monkey"",""lang"":""en"",""profile_background_tile"":false,""favourites_count"":2018,""name"":""Lance McGill"",""notifications"":false,""url"":null,""created_at"":""Sun May 24 23:20:25 +0000 2009"",""contributors_enabled"":false,""time_zone"":""Sydney"",""protected"":false,""default_profile"":false,""is_translator"":false},""geo"":null,""in_reply_to_user_id_str"":""139852376"",""lang"":""en"",""_id"":""493926581610364928"",""created_at"":""Tue Jul 29 01:10:48 +0000 2014"",""in_reply_to_status_id_str"":""493925918998425600"",""place"":null,""metadata"":{""iso_language_code"":""en"",""result_type"":""recent""}}"
答案 0 :(得分:1)
这不是最好的解决方案,只是来自菜鸟(我)的努力,肯定需要进一步编辑以获得更好的输出。我正在使用Windows操作系统。
import csv
import json
abc=[]
myList=[]
myDict={}
myFile=open('fileName.csv','r',encoding='utf-8')
myReader=csv.reader(myFile)
header=next(myReader)
for line in myReader:
abc=json.loads(line[4])
myDict=abc
myList.append(myDict['text'])
dct={}
for eachLine in myList:
item=eachLine.split()
for one in item:
if one in dct:
dct[one]+=1
else:
dct[one]=1
finalList=list(dct.items())
finalList.sort()