Question

I'm new to Python and I'm stuck. From a long list of records in a .txt file, I need to get only the "text" key, for example:

{"delete":{"status":{"id":294512601600258048,"id_str":"294512601600258048","user_id":90681582,"user_id_str":"90681582"}, "timestamp_ms":"1410368494083"}}

and

{
    "created_at": "Wed Sep 10 17:01:33 +0000 2014",
    "id": 509748524897292288,
    "id_str": "509748524897292288",
    "text": "@Brenamae_ I WHALE SLAP YOUR FIN AND TELL YOU ONE LAST TIME: GO AWHALE",
    "source": "\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e",
    "truncated": false,
    "in_reply_to_status_id": 509748106015948800,
    "in_reply_to_status_id_str": "509748106015948800",
    "in_reply_to_user_id": 242563886,
    "in_reply_to_user_id_str": "242563886",
    "in_reply_to_screen_name": "Brenamae_",
    "user": {"id": 175160659,
    "id_str": "175160659",
    "name": "Butterfly",
    "screen_name": "VanessaLilyWan",
    "location": "Canada, Montreal",
    "url": "http:\/\/instagram.com\/vanessalilywan",
    "description": "British youtubers. 'Nuff said.",
    "protected": false,
    "verified": false,
    "followers_count": 118,
    "friends_count": 180,
    "listed_count": 2,
    "favourites_count": 319,
    "statuses_count": 10221,
    "created_at": "Thu Aug 05 20:03:16 +0000 2010",
    "utc_offset": -36000,
    "time_zone": "Hawaii",
    "geo_enabled": false,
    "lang": "en",
    "contributors_enabled": false,
    "is_translator": false,
    "profile_background_color": "B2DFDA",
    "profile_background_image_url": "http:\/\/abs.twimg.com\/images\/themes\/theme13\/bg.gif",
    "profile_background_image_url_https": "https:\/\/abs.twimg.com\/images\/themes\/theme13\/bg.gif",
    "profile_background_tile": false,
    "profile_link_color": "93A644",
    "profile_sidebar_border_color": "EEEEEE",
    "profile_sidebar_fill_color": "FFFFFF",
    "profile_text_color": "333333",
    "profile_use_background_image": true,
    "profile_image_url": "http:\/\/pbs.twimg.com\/profile_images\/470701406245376000\/2aXDrauR_normal.jpeg",
    "profile_image_url_https": "https:\/\/pbs.twimg.com\/profile_images\/470701406245376000\/2aXDrauR_normal.jpeg",
    "profile_banner_url": "https:\/\/pbs.twimg.com\/profile_banners\/175160659\/1404361640",
    "default_profile": false,
    "default_profile_image": false,
    "following": null,
    "follow_request_sent": null,
    "notifications": null
}, "geo": null, "coordinates": null, "place": null, "contributors": null, "retweet_count": 0, "favorite_count": 0, "entities": {
    "hashtags": [],
    "trends": [],
    "urls": [],
    "user_mentions": [{
        "screen_name": "Brenamae_",
        "name": "I-G-G-Bye",
        "id": 242563886,
        "id_str": "242563886",
        "indices": [0, 10]
    }],
    "symbols": []
}, "favorited": false, "retweeted": false, "possibly_sensitive": false, "filter_level": "medium", "lang": "en", "timestamp_ms": "1410368493668"
}

So I have these two kinds of records. What I have so far is:

    import json

    with open('salida_tweets.txt') as f:
        for line in f:
            texto = json.loads(line)       # one JSON object per line
            objetos = texto.get('text')    # None if the record has no 'text' key
            print(objetos)

Output:

    None
    @Brenamae_ I WHALE SLAP YOUR FIN AND TELL YOU ONE LAST TIME: GO AWHALE

But the first record still prints as None, and I need clean text so that I can combine it with another file.

Can someone help me?

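EDIT: Sorry, I wasn't clear. I need to extract the "text" line contained in the second record and combine it with a file containing several words and numbers. For example:

For this:

"text": "@Brenamae_ I WHALE SLAP YOUR FIN AND TELL YOU ONE LAST TIME: GO AWHALE"

I have to combine it with:

SLAP -3
LAST -1

and get: Tweet 1: -4

so that I can get a score for each "text".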
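A minimal sketch of the scoring step, assuming the word file has one word and an integer score per line separated by whitespace (as in the SLAP -3 / LAST -1 example), that matching is case-insensitive, and that a tweet's score is the sum of the scores of the words it contains. The function names load_scores and score_text are hypothetical.

```python
import re


def load_scores(lines):
    """Build a word -> score dict from lines like 'SLAP -3'.

    Assumes each useful line is exactly '<word> <integer>';
    blank or malformed lines are skipped.
    """
    scores = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, value = parts
        scores[word.lower()] = int(value)
    return scores


def score_text(text, scores):
    """Sum the scores of every word in text that appears in scores."""
    # Lowercase, then keep runs of letters, digits, and apostrophes as words
    words = re.findall(r"[a-z0-9']+", text.lower())
    return sum(scores.get(word, 0) for word in words)


scores = load_scores(["SLAP -3", "LAST -1"])
tweet = "@Brenamae_ I WHALE SLAP YOUR FIN AND TELL YOU ONE LAST TIME: GO AWHALE"
print(score_text(tweet, scores))  # -4
```

With real files, load_scores can be passed an open file object, since iterating over a file yields its lines.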

Answer 1

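The .get method returns None when the key you look up does not exist in the dictionary, so you can filter those None values out before printing objetos.

For example:

    if objetos is not None:
        print(objetos)

This way, your code prints nothing for records that have no 'text' key.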
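Building on this answer, here is a sketch that reads the records line by line and yields only the tweet texts. It assumes one JSON object per line (as the question's loop already does), skips blank lines and lines that are not valid JSON, and treats any record without a 'text' key, such as the "delete" notices, as something to skip. The function name iter_texts is hypothetical.

```python
import io
import json


def iter_texts(lines):
    """Yield the 'text' value of each JSON record that has one.

    Assumes one JSON object per line. Blank lines, invalid JSON,
    non-object records, and records without a 'text' key
    (e.g. {"delete": ...} notices) are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupted lines
        if not isinstance(record, dict):
            continue
        text = record.get("text")  # None when the key is missing
        if text is not None:
            yield text


# Two shortened sample records standing in for the file contents
sample = io.StringIO(
    '{"delete":{"status":{"id":294512601600258048},"timestamp_ms":"1410368494083"}}\n'
    '{"id":509748524897292288,"text":"@Brenamae_ I WHALE SLAP YOUR FIN AND TELL YOU ONE LAST TIME: GO AWHALE"}\n'
)
for text in iter_texts(sample):
    print(text)
```

With the real file, something like `with open('salida_tweets.txt', encoding='utf-8') as f: texts = list(iter_texts(f))` would collect every text into a list.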

Answer 2

To remove a key from a dictionary entirely, use the pop method:

    dictionary.pop(key[, default])

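If you want to remove the key without first testing whether it exists, you can do:

    text = dictionary.pop('text', None)

This removes the key and sets the variable text to the value of dictionary['text']; if the key does not exist, text is set to None.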
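A short sketch of how pop behaves, using a small made-up dictionary: it removes the key and returns its value, returns the default when the key is missing, and raises KeyError when the key is missing and no default is given. For contrast, .get only reads the value and leaves the dictionary unchanged.

```python
tweet = {"text": "GO AWHALE", "lang": "en"}

peek = tweet.get("text")            # reads the value; tweet is unchanged
print(peek)                         # GO AWHALE

text = tweet.pop("text", None)      # returns the value and removes the key
print(text)                         # GO AWHALE
print("text" in tweet)              # False

missing = tweet.pop("text", None)   # key already gone: returns the default
print(missing)                      # None

try:
    tweet.pop("text")               # no default and no key: raises KeyError
except KeyError:
    print("KeyError")
```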

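However, your question is somewhat unclear. It sounds like you don't want to remove the key named "text" from the object, but rather to get the value stored under the "text" key?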
