Python: Parsing Twitter timestamps from a CSV file

Time: 2016-11-29 11:09:40

Tags: python csv parsing datetime

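I'm trying to list my tweets from my own Twitter archive using Python. The only problem I have is converting the timestamp from a string into a datetime object. Here is an excerpt from my CSV: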

"tweet_id","in_reply_to_status_id","in_reply_to_user_id","timestamp","source","text","retweeted_status_id","retweeted_status_user_id","retweeted_status_timestamp","expanded_urls"
"x","y","z","2016-11-27 22:14:47 +0000","<a href=""https://about.twitter.com/products/tweetdeck"" rel=""nofollow"">TweetDeck</a>","@a @b Also feel free to do so [2/2]","","","",""

Here is my code:

#!/usr/bin/env python
# encoding: utf-8


import csv
from datetime import datetime

with open('tweets.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:

        # This works like a charm
        date_str = "2016-11-28 07:12:01 +0000"
        dt_obj = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S +0000")


        # This doesn't
        #date = datetime.strptime(row[3], "%Y-%m-%d %H:%M:%S +0000")

        # Get message
        msg = row[5]


        print("Datestring from CSV: " + row[3])
        print("Datestring from static variable: " + datetime.strftime(dt_obj, "%d.%m.%Y %H:%M:%S"))
        print(msg)

When I run this, I get the following output:

Datestring from CSV: 2016-11-27 22:14:47 +0000

Datestring from static variable: 28.11.2016 07:12:01

@a @b Also feel free to do so [2/2]

But when I uncomment the part that doesn't work, I get this error:

ValueError: time data 'timestamp' does not match format '%Y-%m-%d %H:%M:%S +0000'
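Note that the error message quotes the string it tried to parse, and that string is 'timestamp', not a date. A minimal reproduction (the helper try_parse is a hypothetical name used only for illustration) suggests that the value arriving in row[3] on the failing iteration is the header cell of that column:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S +0000"  # the format string from the question


def try_parse(value):
    """Return a datetime, or the ValueError message if value doesn't match FMT."""
    try:
        return datetime.strptime(value, FMT)
    except ValueError as exc:
        return str(exc)


# A real timestamp from the CSV parses fine...
print(try_parse("2016-11-27 22:14:47 +0000"))
# ...while the header cell of the same column produces the same error message.
print(try_parse("timestamp"))
```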

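Why is this happening? I can't figure it out. The date format seems to be correct, there is no stray lowercase "l" in the CSV timestamp, and its type is a string, so it should work. I don't see what I'm missing here.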

Thanks!

Update

@ArunDhaJ pointed out in this answer that dateutil.parser.parse() is a better choice. If I call it from the interpreter, it works fine:

Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> from dateutil.parser import *
>>> parse("2012-06-22 08:12:30 +0000")
datetime.datetime(2012, 6, 22, 8, 12, 30, tzinfo=tzutc())

Running the same call from a script raises a ValueError. Is this an encoding problem?

Traceback (most recent call last):
  File "./test.py", line 11, in <module>
    date = parse(row[3])
  File "/usr/lib/python2.7/dist-packages/dateutil/parser.py", line 1008, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/lib/python2.7/dist-packages/dateutil/parser.py", line 395, in parse
    raise ValueError("Unknown string format")
ValueError: Unknown string format
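Whichever parser is used, a quick way to see which value is failing is to catch the ValueError and print repr() of the offending cell together with the reader's line_num. A sketch, assuming Python 3: SAMPLE is the question's excerpt held in memory (with the "source" column shortened to TweetDeck), and find_bad_timestamps is a hypothetical helper, not part of any library.

```python
import csv
import io
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S +0000"

# Header plus one tweet from the question's tweets.csv, kept in memory
# so the sketch runs without the real archive.
SAMPLE = (
    '"tweet_id","in_reply_to_status_id","in_reply_to_user_id","timestamp","source","text",'
    '"retweeted_status_id","retweeted_status_user_id","retweeted_status_timestamp","expanded_urls"\n'
    '"x","y","z","2016-11-27 22:14:47 +0000","TweetDeck",'
    '"@a @b Also feel free to do so [2/2]","","","",""\n'
)


def find_bad_timestamps(f, column=3):
    """Yield (line_number, repr_of_value) for every row whose timestamp won't parse."""
    reader = csv.reader(f)
    for row in reader:
        try:
            datetime.strptime(row[column], FMT)
        except ValueError:
            # repr() makes a header cell, stray whitespace or quotes obvious.
            yield reader.line_num, repr(row[column])


for line_num, value in find_bad_timestamps(io.StringIO(SAMPLE)):
    print("line %d: %s" % (line_num, value))
```

Run against the sample, only line 1 (the header) is reported, which points at the header row rather than at an encoding problem.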

2 Answers

Answer 0 (score: 0)

Use the following code to parse it:

from dateutil.parser import parse
d = parse('2016-11-27 22:14:47 +0000')
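If you would rather avoid the extra dependency, Python 3's datetime.strptime supports the %z directive, which parses a +HHMM offset such as +0000 into an aware datetime (Python 2's strptime does not support %z). A minimal sketch:

```python
from datetime import datetime, timedelta

# Python 3 only: %z parses a "+HHMM" offset such as "+0000" into a tzinfo,
# so the offset no longer has to be hard-coded into the format string.
dt = datetime.strptime("2016-11-27 22:14:47 +0000", "%Y-%m-%d %H:%M:%S %z")

print(dt.isoformat())                    # aware datetime carrying the offset
print(dt.utcoffset() == timedelta(0))    # the offset was read from the string
print(dt.strftime("%d.%m.%Y %H:%M:%S"))  # same output format as the question
```

Unlike the literal " +0000" in the question's format, this also accepts tweets whose offset is not zero.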

Answer 1 (score: 0)

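Try skipping the CSV header row. The error message shows that the string being parsed is 'timestamp', which is the header cell of column 3, not a date. After readCSV = csv.reader(csvfile, delimiter=','), add:

readCSV.next()

(Python 2 only) or, in Python 2.6+ and Python 3:

next(readCSV)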
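Combining the header skip with the question's own format string gives something like the sketch below. read_tweets and read_tweets_dict are hypothetical names, the in-memory SAMPLE stands in for tweets.csv, and it assumes Python 3 (io.StringIO takes str). csv.DictReader is shown as an alternative: it consumes the header itself and lets you look up columns by name instead of by index.

```python
import csv
import io
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S +0000"  # format string from the question

SAMPLE = (
    '"tweet_id","in_reply_to_status_id","in_reply_to_user_id","timestamp","source","text",'
    '"retweeted_status_id","retweeted_status_user_id","retweeted_status_timestamp","expanded_urls"\n'
    '"x","y","z","2016-11-27 22:14:47 +0000","TweetDeck",'
    '"@a @b Also feel free to do so [2/2]","","","",""\n'
)


def read_tweets(f):
    """Return (datetime, text) pairs, skipping the header row with next()."""
    reader = csv.reader(f)
    next(reader)  # discard "tweet_id", ..., "timestamp", ...
    return [(datetime.strptime(row[3], FMT), row[5]) for row in reader]


def read_tweets_dict(f):
    """Same result, but DictReader reads the header and keys rows by column name."""
    return [(datetime.strptime(row["timestamp"], FMT), row["text"])
            for row in csv.DictReader(f)]


for when, text in read_tweets(io.StringIO(SAMPLE)):
    print(when.strftime("%d.%m.%Y %H:%M:%S"), text)
```

With the real archive in Python 3, open it with open('tweets.csv', newline='', encoding='utf-8') (assuming the archive is UTF-8) and pass the file object to either function.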