I am trying to convert a JSON file to a CSV file. The JSON file comes from tweepy.
import json
import csv
fo = open('Sclass.json', 'r')
fw = open('Hasil_Tweets.csv', 'a')
for line in fo:
    try:
        tweet = json.loads(line)
        fw.write(tweet['id'],tweet['timestamp_ms'],tweet['user']['name'],tweet['user']['statuses_count'],tweet['user']['friends_count'],tweet['user']['followers_count'],tweet['place']['bounding_box']['coordinates'],tweet['text']+"\n")
    except:
        continue
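But when I print it, it works.
It also works when I only write fw.write(tweet['text']).
Thanks.
Oh, and I am a noob at both Python and tweepy. But my instinct says the problem is with the JSON file itself. Sorry for my bad English. Here is the JSON file itself:
{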
"created_at": "Wed Oct 11 08:36:21 +0000 2017",
"id": 918032510927355904,
"id_str": "918032510927355904",
"text": "@irfanzayo @puisisi @tasyak Lo tuh kebiasaan overthinking \ud83d\ude24",
"display_text_range": [
28,
59
],
"source": "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>",
"truncated": false,
"in_reply_to_status_id": 918032029094047746,
"in_reply_to_status_id_str": "918032029094047746",
"in_reply_to_user_id": 60049976,
"in_reply_to_user_id_str": "60049976",
"in_reply_to_screen_name": "irfanzayo",
"user": {
"id": 59980455,
"id_str": "59980455",
"name": "Mutiara Sisyanni D",
"screen_name": "MutiaraSisyanni",
"location": "Jakarta, Indonesia",
"url": "http://mutiarasyn.wixsite.com/mutiarasisyanni",
"description": null,
"translator_type": "none",
"protected": false,
"verified": false,
"followers_count": 354,
"friends_count": 237,
"listed_count": 1,
"favourites_count": 326,
"statuses_count": 6507,
"created_at": "Sat Jul 25 04:31:47 +0000 2009",
"utc_offset": 25200,
"time_zone": "Jakarta",
"geo_enabled": true,
"lang": "en",
"contributors_enabled": false,
"is_translator": false,
"profile_background_color": "FA8C9E",
"profile_background_image_url": "http://abs.twimg.com/images/themes/theme5/bg.gif",
"profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme5/bg.gif",
"profile_background_tile": false,
"profile_link_color": "FF8A94",
"profile_sidebar_border_color": "FFFFFF",
"profile_sidebar_fill_color": "99CC33",
"profile_text_color": "3E4415",
"profile_use_background_image": false,
"profile_image_url": "http://pbs.twimg.com/profile_images/486497248293826560/FANdzhL9_normal.jpeg",
"profile_image_url_https": "https://pbs.twimg.com/profile_images/486497248293826560/FANdzhL9_normal.jpeg",
"profile_banner_url": "https://pbs.twimg.com/profile_banners/59980455/1404826066",
"default_profile": false,
"default_profile_image": false,
"following": null,
"follow_request_sent": null,
"notifications": null
},
"geo": null,
"coordinates": null,
"place": {
"id": "66555622726ab358",
"url": "https://api.twitter.com/1.1/geo/id/66555622726ab358.json",
"place_type": "city",
"name": "Setia Budi",
"full_name": "Setia Budi, Indonesia",
"country_code": "ID",
"country": "Indonesia",
"bounding_box": {
"type": "Polygon",
"coordinates": [
[
[
106.817351,
-6.24152
],
[
106.817351,
-6.201177
],
[
106.852353,
-6.201177
],
[
106.852353,
-6.24152
]
]
]
},
"attributes": {}
},
"contributors": null,
"is_quote_status": false,
"quote_count": 0,
"reply_count": 0,
"retweet_count": 0,
"favorite_count": 0,
"entities": {
"hashtags": [],
"urls": [],
"user_mentions": [
{
"screen_name": "irfanzayo",
"name": "irfan zayanto",
"id": 60049976,
"id_str": "60049976",
"indices": [
0,
10
]
},
{
"screen_name": "puisisi",
"name": "Puisi Pancara",
"id": 32809069,
"id_str": "32809069",
"indices": [
11,
19
]
},
{
"screen_name": "tasyak",
"name": "Tasya Kurnia",
"id": 41986880,
"id_str": "41986880",
"indices": [
20,
27
]
}
],
"symbols": []
},
"favorited": false,
"retweeted": false,
"filter_level": "low",
"lang": "in",
"timestamp_ms": "1507710981481"
}
Another error:
Traceback (most recent call last):
  File "C:\Users\User\Desktop\fase 1-20170930T062552Z-001\transformCSV.py", line 7, in <module>
    tweet = json.loads(line)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)
Traceback (most recent call last):
  File "C:\Users\Tanabata\Desktop\Putang ina mo\spli.py", line 8, in <module>
    tweet = json.load(fo)
  File "C:\Users\Tanabata\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "C:\Users\Tanabata\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "C:\Users\Tanabata\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 3 column 1 (char 2893)
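The JSON file itself: http://www.mediafire.com/file/l3rzzbe0nbu1nlu/Sclass.json
Answer 0: (score: 0)
You never actually use the csv module. A file's write() method accepts exactly one string, so passing several values raises a TypeError, which your bare except silently swallows. You have to create a writer: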
import json
import csv
# newline='' keeps the csv module from emitting blank rows on Windows;
# encoding='utf-8' lets emoji in the tweet text be written
with open('Sclass.json', 'r') as fo, \
        open('Hasil_Tweets.csv', 'a', newline='', encoding='utf-8') as fw:
    writer = csv.writer(fw)
    for line in fo:
        tweet = json.loads(line)
        writer.writerow([tweet['id'],tweet['timestamp_ms'],tweet['user']['name'],
                         tweet['user']['statuses_count'],tweet['user']['friends_count'],
                         tweet['user']['followers_count'],
                         tweet['place']['bounding_box']['coordinates'],tweet['text']])
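To see why the fw.write(...) call in the question appeared to do nothing, here is a minimal sketch. It uses an in-memory io.StringIO buffer as a stand-in for the output file and a made-up two-field tweet; both are assumptions for illustration only. The multi-argument write raises TypeError, and a bare except like the one in the question would hide it.

```python
import io

buffer = io.StringIO()              # stand-in for the real output file
tweet = {"id": 1, "text": "hello"}  # made-up record for illustration

try:
    # write() accepts exactly one string, so two arguments are rejected
    buffer.write(tweet["id"], tweet["text"])
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)    # the exception the bare except was hiding
print(repr(buffer.getvalue())) # nothing was written
```

Catching a specific exception and printing it, instead of using a bare except with continue, would have surfaced this problem immediately.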
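As for your second problem, it seems you do not have a JSON Lines file but a file containing a single JSON dataset. Reading it line by line is therefore wrong; you should read the file as a whole: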
with open('Sclass.json', 'r') as fo:
    tweet = json.load(fo)
with open('Hasil_Tweets.csv', 'a', newline='', encoding='utf-8') as fw:
    writer = csv.writer(fw)
    writer.writerow([tweet['id'],tweet['timestamp_ms'],tweet['user']['name'],
                     tweet['user']['statuses_count'],tweet['user']['friends_count'],
                     tweet['user']['followers_count'],
                     tweet['place']['bounding_box']['coordinates'],tweet['text']])
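One caveat with indexing like tweet['place']['bounding_box']: the sample tweet already has null for "geo" and "coordinates", and it seems likely that "place" can be null for other tweets too, in which case the chained lookup would fail. The following is only a sketch under that assumption. The tweet_to_row helper and the two sample records are made up for illustration, and the output goes to an in-memory buffer rather than Hasil_Tweets.csv.

```python
import csv
import io


def tweet_to_row(tweet):
    """Build one CSV row, tolerating missing or null nested fields."""
    user = tweet.get("user") or {}
    place = tweet.get("place") or {}
    box = place.get("bounding_box") or {}
    return [
        tweet.get("id"),
        tweet.get("timestamp_ms"),
        user.get("name"),
        user.get("statuses_count"),
        user.get("friends_count"),
        user.get("followers_count"),
        box.get("coordinates"),
        tweet.get("text"),
    ]


# Made-up records: the second one has no place at all.
tweets = [
    {"id": 1, "timestamp_ms": "10", "user": {"name": "A"},
     "place": {"bounding_box": {"coordinates": [[[1.0, 2.0]]]}}, "text": "hi"},
    {"id": 2, "timestamp_ms": "20", "user": {"name": "B"},
     "place": None, "text": "yo"},
]

out = io.StringIO()  # stand-in for the real CSV file
writer = csv.writer(out)
for tweet in tweets:
    writer.writerow(tweet_to_row(tweet))

# Read the buffer back to inspect what was written.
rows = list(csv.reader(io.StringIO(out.getvalue())))
print(rows)
```

csv.writer writes None as an empty field, so missing values simply become empty cells instead of aborting the whole row.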
Answer 1: (score: 0)
As soon as you are working with tables (and CSV is one), think of pandas (my opinion).
In this case we can use pandas json_normalize to interpret your JSON file.
import json
import pandas as pd

with open("Sclass.json") as f:
    # pd.json_normalize is available from pandas 1.0; older versions
    # expose it as `from pandas.io.json import json_normalize`
    df = pd.json_normalize(json.load(f))
cols = ["id","timestamp_ms","user.name",
        "user.statuses_count","user.friends_count","user.followers_count",
        "place.bounding_box.coordinates","text"]
df[cols].to_csv("Hasil_Tweets.csv",sep=",",index=False) # outputs to csv
Pandas has many output options, one of which is an HTML table. I will use that to show the result here:
print(df[cols].to_html(index=False)) # outputs to html to show result
Output:
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th>id</th>
<th>timestamp_ms</th>
<th>user.name</th>
<th>user.statuses_count</th>
<th>user.friends_count</th>
<th>user.followers_count</th>
<th>place.bounding_box.coordinates</th>
<th>text</th>
</tr>
</thead>
<tbody>
<tr>
<td>918032510927355904</td>
<td>1507710981481</td>
<td>Mutiara Sisyanni D</td>
<td>6507</td>
<td>237</td>
<td>354</td>
<td>[[[106.817351, -6.24152], [106.817351, -6.2011...</td>
<td>@irfanzayo @puisisi @tasyak Lo tuh kebiasaan o...</td>
</tr>
</tbody>
</table>
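The dotted column names such as user.name come from flattening the nested dictionaries. A rough pure-Python sketch of that idea follows. The flatten helper is hypothetical and is not how pandas implements it; it only handles nested dicts and leaves lists untouched, and the record is a trimmed-down, hand-written version of the tweet from the question.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into one dict with dot-separated keys."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            # Recurse, extending the prefix with the parent key and a dot
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


record = {
    "id": 918032510927355904,
    "user": {"name": "Mutiara Sisyanni D", "followers_count": 354},
    "place": {"bounding_box": {"type": "Polygon"}},
}

flat = flatten(record)
print(list(flat))
```

Each nested key ends up as one flat column, which is what makes the record fit into a single CSV row.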
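Answer 2: (score: 0)
I am adding this as a separate answer.
The *.json you shared is actually one large file containing multiple JSON strings, but only on every second line. How you ended up with this file in the first place I do not know, but you can read it like this: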
import json
import pandas as pd
with open("Sclass.json") as f:
    data = [json.loads(row.strip()) for row in f.readlines()[0::2]]
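Slicing with [0::2] relies on the blank lines falling exactly on every second line. A slightly more defensive sketch, assuming only that each non-empty line holds one complete JSON object, skips blank lines explicitly. The read_json_lines name is made up for illustration, and the demo parses a small in-memory sample rather than the real Sclass.json.

```python
import io
import json


def read_json_lines(lines):
    """Parse one JSON object per non-empty line, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # a blank line is not valid JSON, so skip it
        records.append(json.loads(line))
    return records


# Made-up sample that mimics the layout: data, blank, data, blank.
sample = io.StringIO('{"id": 1, "text": "a"}\n\n{"id": 2, "text": "b"}\n\n')
data = read_json_lines(sample)
print(len(data))
```

This layout also fits both tracebacks in the question: json.loads on a line containing only a newline fails with "Expecting value: line 2 column 1 (char 1)", and json.load on the whole file reports "Extra data" where the second object begins on line 3.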
However, when you read this structure into a dataframe, you can see that it really has no clear structure:
pd.DataFrame(data)
Conclusion: your problem is something else entirely.