I have a .json file in the following format:
{"contributors": null, "truncated": false, "text": "“cool jeans,” i tell a cute boy\nlittle did he know that im talking about his genes bc those chromosomes have combined beautifully ay papi", "is_quote_status": false, "in_reply_to_status_id": null, "id": 786650297116532736, "favorite_count": 631, "source": "<a href=\"http://bufferapp.com\" rel=\"nofollow\">Buffer</a>", "retweeted": false, "coordinates": null, "entities": {"symbols": [], "user_mentions": [], "hashtags": [], "urls": []}, "in_reply_to_screen_name": null, "in_reply_to_user_id": null, "retweet_count": 233, "id_str": "786650297116532736", "favorited": false, "user": {"follow_request_sent": false, "has_extended_profile": false, "profile_use_background_image": true, "default_profile_image": false, "id": 321445166, "profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/378800000024702436/c56435230dd53432a3aa7685fdd98cf7.jpeg", "verified": false, "profile_text_color": "333333", "profile_image_url_https": "https://pbs.twimg.com/profile_images/782124224198631424/zuUAjl5o_normal.jpg", "profile_sidebar_fill_color": "EFEFEF", "entities": {"url": {"urls": [{"indices": [0, 23], "expanded_url": "https://youtu.be/1qjR-p_o3BE", "display_url": "youtu.be/1qjR-p_o3BE"}]}, "description": {"urls": []}}, "followers_count": 4992012, "profile_sidebar_border_color": "FFFFFF", "id_str": "321445166", "profile_background_color": "182D66", "listed_count": 5284, "is_translation_enabled": false, "utc_offset": 7200, "statuses_count": 28488, "description": "There's a fine line between being sassy and being an asshole and I cross it everyday. Youtuber and student. 
Link in bio thatssarcasmposts@gmail.com", "friends_count": 55420, "location": "Cape Town", "profile_link_color": "536BA7", "profile_image_url": "http://pbs.twimg.com/profile_images/782124224198631424/zuUAjl5o_normal.jpg", "following": false, "geo_enabled": false, "profile_banner_url": "https://pbs.twimg.com/profile_banners/321445166/1475082140", "profile_background_image_url": "http://pbs.twimg.com/profile_background_images/378800000024702436/c56435230dd53432a3aa7685fdd98cf7.jpeg", "screen_name": "ThatsSarcasm", "lang": "en", "profile_background_tile": false, "favourites_count": 3931, "name": "joke", "notifications": false, "created_at": "Tue Jun 21 15:52:41 +0000 2011", "contributors_enabled": false, "time_zone": "Pretoria", "protected": false, "default_profile": false, "is_translator": false}, "geo": null, "in_reply_to_user_id_str": null, "lang": "en", "created_at": "Thu Oct 13 19:30:20 +0000 2016", "in_reply_to_status_id_str": null, "place": null}
{"contributors": null, "truncated": false, "text": "If you're not following @relatabIe for the most relatable tweets ever, then what are you doing?I love their posts", "is_quote_status": false, "in_reply_to_status_id": null, "id": 786649227002781696, "favorite_count": 29, "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>", "retweeted": false, "coordinates": null, "entities": {"symbols": [], "user_mentions": [{"id": 485854360, "indices": [24, 34], "id_str": "485854360", "screen_name": "relatabIe", "name": "Relatable Tweets!"}], "hashtags": [], "urls": []}, "in_reply_to_screen_name": null, "in_reply_to_user_id": null, "retweet_count": 6, "id_str": "786649227002781696", "favorited": false, "user": {"follow_request_sent": false, "has_extended_profile": false, "profile_use_background_image": true, "default_profile_image": false, "id": 321445166, "profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/378800000024702436/c56435230dd53432a3aa7685fdd98cf7.jpeg", "verified": false, "profile_text_color": "333333", "profile_image_url_https": "https://pbs.twimg.com/profile_images/782124224198631424/zuUAjl5o_normal.jpg", "profile_sidebar_fill_color": "EFEFEF", "entities": {"url": {"urls": [{"indices": [0, 23], "expanded_url": "https://youtu.be/1qjR-p_o3BE", "display_url": "youtu.be/1qjR-p_o3BE"}]}, "description": {"urls": []}}, "followers_count": 4992012, "profile_sidebar_border_color": "FFFFFF", "id_str": "321445166", "profile_background_color": "182D66", "listed_count": 5284, "is_translation_enabled": false, "utc_offset": 7200, "statuses_count": 28488, "description": "There's a fine line between being sassy and being an asshole and I cross it everyday. Youtuber and student. 
Link in bio thatssarcasmposts@gmail.com", "friends_count": 55420, "location": "Cape Town", "profile_link_color": "536BA7", "profile_image_url": "http://pbs.twimg.com/profile_images/782124224198631424/zuUAjl5o_normal.jpg", "following": false, "geo_enabled": false, "profile_banner_url": "https://pbs.twimg.com/profile_banners/321445166/1475082140", "profile_background_image_url": "http://pbs.twimg.com/profile_background_images/378800000024702436/c56435230dd53432a3aa7685fdd98cf7.jpeg", "screen_name": "ThatsSarcasm", "lang": "en", "profile_background_tile": false, "favourites_count": 3931, "name": "joke", "notifications": false, "created_at": "Tue Jun 21 15:52:41 +0000 2011", "contributors_enabled": false, "time_zone": "Pretoria", "protected": false, "default_profile": false, "is_translator": false}, "geo": null, "in_reply_to_user_id_str": null, "lang": "en", "created_at": "Thu Oct 13 19:26:05 +0000 2016", "in_reply_to_status_id_str": null, "place": null}
The Python code I am using for the conversion is:
import json
import csv
x=open('MyFile.json')
data=json.load(x)
x.close()
f=csv.writer(open('data.csv','wb+'))
for item in data:
    f.writerow(item["text"])
f.close()
My goal is to write only the text attribute (the third one) to the csv file, but it raises the following error:
File "to_csv.py", line 10, in <module>
f.writerow(item["text"])
UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001f60f' in position 0: ordinal not in range(128)
I searched and found a hint. I replaced line 10 with:
f.writerow(unicode(item["text"]).encode("utf-8"))
But to no avail. I could not find a way to keep the non-ASCII characters intact.
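Edit: Thanks to mx0's solution, the code runs fine now. One thing I noticed is that there is a newline character, because a single-line text attribute gets printed on two lines; for example, look at the text attribute in the first line of the json file. There is a newline between 'boy' and 'little'. I want to avoid printing those, so I wrote this code:
f = csv.writer(open('data.csv', 'wb+'))
for item in data:
    if item["text"].encode('utf-8') == '\n':
        f.writerow(" ")
    else:
        f.writerow([item["text"].encode('utf-8')])
But it does not seem to work. What is wrong?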
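Answer 0 (score: 0)
Update: The question did not specify a Python version, so the first answer was written for Python 3.
There are a few problems in your code.
Your json file is not valid (or you pasted it incorrectly here). It should look like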
[{"first":"object"}, {"second":"object"}]
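The sample pasted in the question looks like one JSON object per line rather than a single array. If the file on disk really is laid out that way, an alternative to rewriting it as an array is to parse each line separately with json.loads. A minimal Python 3 sketch, assuming every object sits on exactly one line; the helper name read_json_lines and the two sample records are made up for illustration:

```python
import json


def read_json_lines(lines):
    """Parse an iterable of text lines, one JSON object per non-empty line."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records


# Stand-in records; with a real file you could pass the object returned by
# open('MyFile.json', encoding='utf-8') instead of this list.
sample = [
    '{"text": "first tweet", "retweet_count": 233}',
    '{"text": "second tweet", "retweet_count": 6}',
]
data = read_json_lines(sample)
texts = [item["text"] for item in data]
```

This will not help if a single object is broken across several lines, in which case the file has to be repaired first.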
When reading the json, you must specify the encoding
x = open('MyFile.json', encoding='utf-8')
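When writing, do not use binary mode, and specify the encoding. Also add the newline='' parameter, or else the file will have doubled line endings in the output.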
f = csv.writer(open('data.csv', 'w+', encoding='utf-8', newline=''))
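To see what those arguments do, here is a small Python 3 sketch that writes one row to a throwaway temporary file and reads the raw bytes back. The temporary path and the sample text are placeholders, not part of the original program:

```python
import csv
import os
import tempfile

# Throwaway location for the demo file
path = os.path.join(tempfile.mkdtemp(), 'data.csv')

# Text mode, explicit UTF-8, and newline='' so the csv module controls line endings
with open(path, 'w', encoding='utf-8', newline='') as out:
    csv.writer(out).writerow(['\U0001f60f cool jeans'])

# Read the file back as bytes to inspect exactly what was written
with open(path, 'rb') as raw:
    written = raw.read()
```

The row ends with a single \r\n, which is the csv module's default line terminator. Without newline='', platforms that translate \n to \r\n on write would turn that into \r\r\n.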
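The writerow function takes a list and writes it as a row. If you do not wrap the string in another list, each character is treated as a separate column and you end up with
I,f, ,y,o,u,',r,e, ,n,o,t
Wrap it in a list instead: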
f.writerow([item["text"]])
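The difference is easy to reproduce in memory. A short Python 3 sketch using io.StringIO in place of a real file (the sample string is arbitrary):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)

writer.writerow("If you")    # bare string: every character becomes its own column
writer.writerow(["If you"])  # one-element list: the whole string is a single column

lines = buf.getvalue().splitlines()
# lines[0] is 'I,f, ,y,o,u' and lines[1] is 'If you'
```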
Complete working example
Python 3.5
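import json
import csv
# Read the json file as UTF-8 text
x = open('MyFile.json', encoding='utf-8')
data = json.load(x)
x.close()
# Text mode, UTF-8 output, newline='' to avoid doubled line endings
f = csv.writer(open('data.csv', 'w+', encoding='utf-8', newline=''))
for item in data:
    # Wrap the string in a list so it is written as a single column
    f.writerow([item["text"]])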
Python 2.7
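For Python 2, you were very close to a working program. In py2, csv.writerow writes to the file in bytes mode, but your item["text"] is a unicode string, so you have to encode it first. This works as long as your json file really is Unicode-encoded.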
import json
import csv
x = open('MyFile.json')
data = json.load(x)
x.close()
f = csv.writer(open('data.csv', 'wb+'))
for item in data:
    f.writerow([item["text"].encode('utf-8')])
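The reason the explicit encode matters is that the ASCII codec cannot represent a character such as U+1F60F, while UTF-8 can. The sketch below shows this with Python 3 syntax (in Python 2 the literal would need a u prefix); the sample text is made up:

```python
text = '\U0001f60f nice'

# ASCII has no representation for the emoji, so encoding fails
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 encodes the emoji as four bytes and decodes back without loss
utf8_bytes = text.encode('utf-8')
round_trip = utf8_bytes.decode('utf-8')
```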
To replace newlines with spaces
f.writerow([item["text"].replace('\n', ' ').encode('utf-8')])
To write newlines as a literal \n
f.writerow([item["text"].replace('\n', '\\n').encode('utf-8')])
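The comparison in the question's edit only matches a tweet whose entire text is a single newline, so it never fires; replace works on newlines anywhere inside the string. For Python 3 the same two variants need no encode call. A sketch that compares the three outputs in memory, where to_csv_text is a hypothetical helper and the tweet is shortened from the sample:

```python
import csv
import io


def to_csv_text(text):
    """Return the CSV line(s) produced by writing text as a one-column row."""
    buf = io.StringIO()
    csv.writer(buf).writerow([text])
    return buf.getvalue()


tweet = "cool jeans\nlittle did he know"

kept = to_csv_text(tweet)                          # newline kept, field gets quoted
spaced = to_csv_text(tweet.replace('\n', ' '))     # newline turned into a space
escaped = to_csv_text(tweet.replace('\n', '\\n'))  # newline turned into backslash + n
```

Note that the untouched version is still valid CSV: the writer quotes the field, so a CSV reader sees one row even though a text editor shows two lines.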