How to parse certain data points from a .txt file with Python 3.0

Time: 2016-12-29 23:08:32

Tags: python parsing

I have a file named CrimeReport.txt that contains information in this format.

{"lang": "en", "favorited": false, "truncated": false, "text": "Active crime scene on I-59/20 near Jeff/Tusc Co line. One dead, one injured; shooting involved. Police search in the area; traffic stopped", "created_at": "Fri Jan 31 05:51:59 +0000 2014", "retweeted": false, "source": "<a href=\"http://tapbots.com/software/tweetbot/mac\" rel=\"nofollow\">Tweetbot for Mac</a>", "place": {"country_code": "US", "url": "https://api.twitter.com/1.1/geo/id/cf44347a08102884.json", "country": "United States", "place_type": "city", "bounding_box": {"type": "Polygon", "coordinates": [[[-86.926154, 33.267324], [-86.598948, 33.267324], [-86.598948, 33.471006], [-86.926154, 33.471006]]]}, "contained_within": [], "full_name": "Hoover, AL", "attributes": {}, "id": "cf44347a08102884", "name": "Hoover"}, "user": {"id": 15220806, "profile_sidebar_fill_color": "DDEEF6", "profile_text_color": "333333", "followers_count": 118021, "location": "Alabama", "profile_background_color": "C0DEED", "listed_count": 1705, "utc_offset": -21600, "statuses_count": 76381, "description": "Media meteorologist. WeatherBrains host. Weather geek.", "friends_count": 52014, "profile_link_color": "0084B4", "profile_image_url": "https://pbs.twimg.com/profile_images/1890149584/spannwantsyou_normal.jpg", "geo_enabled": true, "profile_banner_url": "https://pbs.twimg.com/profile_banners/15220806/1381811159", "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png", "screen_name": "spann", "lang": "en", "profile_background_tile": false, "favourites_count": 27, "name": "James Spann", "url": "", "created_at": "Tue Jun 24 16:02:10 +0000 2008", "time_zone": "Central Time (US & Canada)", "protected": false}, "retweet_count": 66, "id": 429129916446031872, "favorite_count": 4}

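This is just one line of CrimeReport. Every other line has the same format as this one. My question is how to use Python 3.0 to iterate over each line and parse the data in "text".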

2 answers:

Answer 0 (score: 2)

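This looks like JSON data, one record per line, so just go through the file line by line. This is similar to Joran's answer, except that I keep a loop so that the "text" of each record can be processed independently.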

import json

with open("CrimeReport.txt", encoding="utf-8") as f:
    for line in f:
        text = json.loads(line)["text"]  # each line is one JSON object
        print(text)  # replace this with your own processing
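If the file might contain blank or malformed lines, the loop above stops at the first one with an exception. Here is a minimal sketch of a more tolerant version. The helper name iter_texts and the sample records are made up for illustration, and skipping bad lines (rather than aborting) is an assumption about what you want.

```python
import json


def iter_texts(lines):
    """Yield the "text" field of each JSON record in an iterable of lines."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except ValueError:
            # json.JSONDecodeError is a subclass of ValueError.
            # Assumption: a malformed line is reported and skipped.
            print("skipping malformed line {}".format(line_number))
            continue
        # Only yield when the record is an object that has a "text" key.
        if isinstance(record, dict) and "text" in record:
            yield record["text"]


# Made-up sample lines standing in for the contents of CrimeReport.txt.
sample = [
    '{"lang": "en", "text": "first report"}\n',
    "\n",
    "not json\n",
    '{"lang": "en", "text": "second report"}\n',
]
texts = list(iter_texts(sample))
print(texts)
```

Since an open file is itself an iterable of lines, the same helper should work if you pass it the file object from the with block instead of the sample list.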

Answer 1 (score: 1)

Here is one way:

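import functools, json, operator
fname = "CrimeReport.txt"  # the file from the question
with open(fname, encoding="utf-8") as f:  # concatenates every "text" value, with no separator
    the_text = functools.reduce(operator.add, map(operator.itemgetter("text"), map(json.loads, f)))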
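Note that the reduce call above joins all the texts into one string with nothing between them, and it raises a TypeError if the file is empty. If you would rather keep the texts separate, or join them with a newline, a list comprehension is probably easier to read. This is just a sketch: the io.StringIO object stands in for the opened file, and its two records are made up for illustration.

```python
import io
import json

# Stand-in for open("CrimeReport.txt"); the records below are invented.
fake_file = io.StringIO(
    '{"text": "Active crime scene", "lang": "en"}\n'
    '{"text": "Traffic stopped", "lang": "en"}\n'
)

# One "text" value per line of the file, kept as separate list items.
texts = [json.loads(line)["text"] for line in fake_file]

# Join with a newline so the individual texts stay distinguishable.
the_text = "\n".join(texts)
print(the_text)
```

An empty file gives an empty list and an empty string here, rather than an exception.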