I'm just starting out with Python and I'm stuck. I need to extract only the "text" key from a long list of entries in a .txt file, such as:
{"delete":{"status":{"id":294512601600258048,"id_str":"294512601600258048","user_id":90681582,"user_id_str":"90681582"}, "timestamp_ms":"1410368494083"}}
and
{
"created_at": "Wed Sep 10 17:01:33 +0000 2014",
"id": 509748524897292288,
"id_str": "509748524897292288",
"text": "@Brenamae_ I WHALE SLAP YOUR FIN AND TELL YOU ONE LAST TIME: GO AWHALE",
"source": "\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e",
"truncated": false,
"in_reply_to_status_id": 509748106015948800,
"in_reply_to_status_id_str": "509748106015948800",
"in_reply_to_user_id": 242563886,
"in_reply_to_user_id_str": "242563886",
"in_reply_to_screen_name": "Brenamae_",
"user": {
"id": 175160659,
"id_str": "175160659",
"name": "Butterfly",
"screen_name": "VanessaLilyWan",
"location": "Canada, Montreal",
"url": "http:\/\/instagram.com\/vanessalilywan",
"description": "British youtubers. 'Nuff said.",
"protected": false,
"verified": false,
"followers_count": 118,
"friends_count": 180,
"listed_count": 2,
"favourites_count": 319,
"statuses_count": 10221,
"created_at": "Thu Aug 05 20:03:16 +0000 2010",
"utc_offset": -36000,
"time_zone": "Hawaii",
"geo_enabled": false,
"lang": "en",
"contributors_enabled": false,
"is_translator": false,
"profile_background_color": "B2DFDA",
"profile_background_image_url": "http:\/\/abs.twimg.com\/images\/themes\/theme13\/bg.gif",
"profile_background_image_url_https": "https:\/\/abs.twimg.com\/images\/themes\/theme13\/bg.gif",
"profile_background_tile": false,
"profile_link_color": "93A644",
"profile_sidebar_border_color": "EEEEEE",
"profile_sidebar_fill_color": "FFFFFF",
"profile_text_color": "333333",
"profile_use_background_image": true,
"profile_image_url": "http:\/\/pbs.twimg.com\/profile_images\/470701406245376000\/2aXDrauR_normal.jpeg",
"profile_image_url_https": "https:\/\/pbs.twimg.com\/profile_images\/470701406245376000\/2aXDrauR_normal.jpeg",
"profile_banner_url": "https:\/\/pbs.twimg.com\/profile_banners\/175160659\/1404361640",
"default_profile": false,
"default_profile_image": false,
"following": null,
"follow_request_sent": null,
"notifications": null
}, "geo": null, "coordinates": null, "place": null, "contributors": null, "retweet_count": 0, "favorite_count": 0, "entities": {
"hashtags": [],
"trends": [],
"urls": [],
"user_mentions": [{
"screen_name": "Brenamae_",
"name": "I-G-G-Bye",
"id": 242563886,
"id_str": "242563886",
"indices": [0, 10]
}],
"symbols": []
}, "favorited": false, "retweeted": false, "possibly_sensitive": false, "filter_level": "medium", "lang": "en", "timestamp_ms": "1410368493668"
}
So I have these two kinds of entries, and this is as far as I've got:
import json

with open('salida_tweets.txt') as f:
    for line in f:
        texto = json.loads(line)
        objetos = texto.get('text')
        print(objetos)
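This prints:
None
@Brenamae_ I WHALE SLAP YOUR FIN AND TELL YOU ONE LAST TIME: GO AWHALE
But the first entry still shows up as None in the printed output, and I need clean text so that I can combine it with another file.
Can anyone help me?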
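Edit: Sorry, I wasn't clear. I need to isolate the "text" entry contained in the second object, and then combine it with a file containing a few words and numbers. For example:
For this
"text": "@Brenamae_ I WHALE SLAP YOUR FIN AND TELL YOU ONE LAST TIME: GO AWHALE"
I have to combine it with
SLAP -3 LAST -1
To get: 1. Tweet -4
That way I can get a score for each "text".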
Answer 0 (score: 1)
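When you look up a key that does not exist in a dictionary, the .get method returns None, so you can filter the None values out of objetos, for example by checking the result before printing it.
That way, when texto.get('text') finds no 'text' key, your code prints nothing for that line.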
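A minimal sketch of that check, assuming the file holds one JSON object per line as in the question. The helper name extraer_textos is made up for illustration, and skipping blank lines is an extra precaution that the question does not mention.

```python
import json


def extraer_textos(lines):
    """Return the 'text' value of every JSON line that has one."""
    textos = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, json.loads would fail on them
        texto = json.loads(line)
        objetos = texto.get('text')
        if objetos is not None:  # delete notices have no 'text' key
            textos.append(objetos)
    return textos


# Two sample lines standing in for salida_tweets.txt (assumed data)
sample = [
    '{"delete": {"status": {"id": 1}, "timestamp_ms": "1410368494083"}}',
    '{"id": 2, "text": "GO AWHALE"}',
]
for t in extraer_textos(sample):
    print(t)
```

With the real file, the open file object can be passed directly, since iterating over it yields one line at a time: open 'salida_tweets.txt' in a with block and call extraer_textos(f).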
Answer 1 (score: 0)
To remove a key from a dictionary entirely, use the pop method.
dictionary.pop(key[, default])
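If you don't care about the value of the key you are removing, and you don't want to test whether it actually exists before removing it, you can do the following: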
text = dictionary.pop('text', None)
This sets the variable text to the value of dictionary['text'], or to None if the key does not exist.
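A short illustration with a made-up dictionary: pop returns the value and removes the key, whereas get would leave the dictionary unchanged.

```python
# Hypothetical tweet-like dictionary, for illustration only
dictionary = {"id": 2, "text": "GO AWHALE"}

# Key exists: its value is returned and the key is removed
text = dictionary.pop('text', None)
print(text)
print('text' in dictionary)

# Key is gone now: the default (None) is returned instead of raising KeyError
missing = dictionary.pop('text', None)
print(missing)
```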
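However, your question seems unclear: it sounds like you don't want to remove a key called "text" from the object, but rather to get the value stored under that key?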
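If the goal is the per-tweet score described in the question's edit, one possible sketch follows. It assumes the word file has already been loaded into a dictionary mapping upper-case words to integer scores. The format of that file is not shown in the question, so puntuaciones and puntuar are hypothetical names, and matching whole words case-insensitively is a design choice rather than something the question specifies.

```python
import re


def puntuar(text, puntuaciones):
    """Sum the scores of the words in text that appear in puntuaciones."""
    # Upper-case the text and split it into words made of letters/apostrophes
    palabras = re.findall(r"[A-Za-z']+", text.upper())
    # Words without a score contribute 0
    return sum(puntuaciones.get(p, 0) for p in palabras)


# Assumed scores, taken from the example in the question
puntuaciones = {"SLAP": -3, "LAST": -1}
tweet = "@Brenamae_ I WHALE SLAP YOUR FIN AND TELL YOU ONE LAST TIME: GO AWHALE"
print(puntuar(tweet, puntuaciones))
```

For the tweet in the question this adds -3 for SLAP and -1 for LAST, which gives the -4 from the example.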