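Using Python to edit timestamps in a list? Convert POSIX to a readable format using a function

Time: 2016-08-20 01:35:46

Tags: python web-scraping timestamp list-comprehension

SECOND EDIT:

Complete snippet for adjusting the time zone and converting the format. For details on this solution, see the correct answer below.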

from datetime import datetime

# Offset in whole hours (negative values subtract), converted to seconds.
tzvar = int(input("Enter the number of hours you'd like to add to the timestamp:"))
tzvarsecs = (tzvar*3600)
print (tzvarsecs)

def timestamp_to_str(timestamp):
    # Format a POSIX timestamp as a readable string in the machine's local time.
    return datetime.fromtimestamp(timestamp).strftime('%H:%M:%S %m/%d/%Y')

# soup is the BeautifulSoup object built in the full code below.
timestamps = soup('span', {'class': '_timestamp js-short-timestamp '})
dtinfo = [timestamp["data-time"] for timestamp in timestamps]
times = map(int, dtinfo)
adjtimes = [x+tzvarsecs for x in times]
adjtimesfloat = [float(i) for i in adjtimes]
dtinfofloat = [float(i) for i in dtinfo]
finishedtimes = [x for x in map(timestamp_to_str, adjtimesfloat)]
originaltimes = [x for x in map(timestamp_to_str, dtinfofloat)]

END SECOND EDIT

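EDIT:

This code lets me scrape the POSIX time from the HTML file and then add a number of hours entered by the user to the original value. Negative numbers work too, subtracting hours. The user will work in whole hours, because the change is meant specifically for adjusting time zones.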

tzvar = int(input("Enter the number of hours you'd like to add to the timestamp:"))
tzvarsecs = (tzvar*3600)
print (tzvarsecs)

timestamps = soup('span', {'class': '_timestamp js-short-timestamp '})
dtinfo = [timestamp["data-time"] for timestamp in timestamps]
times = map(int, dtinfo)
adjtimes = [x+tzvarsecs for x in times]

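All that is left is the reverse of the function suggested below. How can I use a function to convert each POSIX time in the list to a readable format?

END EDIT

The code below creates a csv file containing data scraped from a saved Twitter HTML file.

Twitter converts all timestamps to the user's local time in the browser. I would like an input option that lets the user adjust the timestamps by a certain number of hours, so that the tweet data reflects the tweeter's local time.

I am currently scraping an element called 'title' that is part of each permalink. I could just as easily scrape the POSIX time from each tweet.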

title="2:29 PM - 28 Sep 2015"

VS

data-time="1443475777" data-time-ms="1443475777000"

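How do I edit the following section so that a variable entered by the user is added to each timestamp? I don't need help with requesting the input; I just need to know how to apply it to the list of timestamps once the input has been passed to Python.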

timestamps = soup('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'})
datetime = [timestamp["title"] for timestamp in timestamps]

Other questions related to this code/project.

Fix encoding error with loop in BeautifulSoup4?

Focusing in on specific results while scraping Twitter with Python and Beautiful Soup 4?

Using Python to Scrape Nested Divs and Spans in Twitter?

Full code.

from bs4 import BeautifulSoup
import requests
import sys
import csv
import re
from datetime import datetime
from pytz import timezone

url = input("Enter the name of the file to be scraped:")
with open(url, encoding="utf-8") as infile:
    soup = BeautifulSoup(infile, "html.parser")

#url = 'https://twitter.com/search?q=%23bangkokbombing%20since%3A2015-08-10%20until%3A2015-09-30&src=typd&lang=en'
#headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
#r = requests.get(url, headers=headers)
#data = r.text.encode('utf-8')
#soup = BeautifulSoup(data, "html.parser")

names = soup('strong', {'class': 'fullname js-action-profile-name show-popup-with-id'})
usernames = [name.contents for name in names]

handles = soup('span', {'class': 'username js-action-profile-name'})
userhandles = [handle.contents[1].contents[0] for handle in handles]  
athandles = [('@')+abhandle for abhandle in userhandles]

links = soup('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'})
urls = [link["href"] for link in links]
fullurls = [permalink for permalink in urls]

timestamps = soup('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'})
datetime = [timestamp["title"] for timestamp in timestamps]

messagetexts = soup('p', {'class': 'TweetTextSize  js-tweet-text tweet-text'}) 
messages = [messagetext for messagetext in messagetexts]  

retweets = soup('button', {'class': 'ProfileTweet-actionButtonUndo js-actionButton js-actionRetweet'})
retweetcounts = [retweet.contents[3].contents[1].contents[1].string for retweet in retweets]

favorites = soup('button', {'class': 'ProfileTweet-actionButtonUndo u-linkClean js-actionButton js-actionFavorite'})
favcounts = [favorite.contents[3].contents[1].contents[1].string for favorite in favorites]

images = soup('div', {'class': 'content'})
imagelinks = [src.contents[5].img if len(src.contents) > 5 else "No image" for src in images]

#print (usernames, "\n", "\n", athandles, "\n", "\n", fullurls, "\n", "\n", datetime, "\n", "\n",retweetcounts, "\n", "\n", favcounts, "\n", "\n", messages, "\n", "\n", imagelinks)

rows = zip(usernames,athandles,fullurls,datetime,retweetcounts,favcounts,messages,imagelinks)

rownew = list(rows)

#print (rownew)

newfile = input("Enter a filename for the table:") + ".csv"

with open(newfile, 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f, delimiter=",")
    writer.writerow(['Usernames', 'Handles', 'Urls', 'Timestamp', 'Retweets', 'Favorites', 'Message', 'Image Link'])
    for row in rownew:
        writer.writerow(row)

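3 Answers:

Answer 0 (score: 1)

Using your code as an example, the variable datetime stores a list of date strings. So let's break the process down into three steps, just for the sake of understanding.

Example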

>>> datetime = [timestamp["title"] for timestamp in timestamps]
>>> print(datetime)
['2:13 AM - 29 Sep 2015', '2:29 PM - 28 Sep 2015', '8:04 AM - 28 Sep 2015']

Step 1: Convert it to a Python datetime object (note %I, the 12-hour clock directive; with %H the AM/PM marker parsed by %p is ignored)

>>> datetime_obj = datetime.strptime('2:13 AM - 29 Sep 2015', '%I:%M %p - %d %b %Y')
>>> datetime_obj
datetime.datetime(2015, 9, 29, 2, 13)

Step 2: Convert the datetime object to a Python structured time object

>>> to_time = datetime_obj.timetuple()
>>> print(to_time)
time.struct_time(tm_year=2015, tm_mon=9, tm_mday=29, tm_hour=2, tm_min=13, tm_sec=0, tm_wday=1, tm_yday=272, tm_isdst=-1)

Step 3: Convert the structured time object to a timestamp using time.mktime (the result depends on the machine's local time zone)

>>> timestamp = time.mktime(to_time)
>>> print(timestamp)
1443503580.0

Now all together.

import time
from datetime import datetime

...
def str_to_ts(str_date):
    return time.mktime(datetime.strptime(str_date, '%I:%M %p - %d %b %Y').timetuple())

datetimes = [timestamp["title"] for timestamp in timestamps]
times = [i for i in map(str_to_ts, datetimes)]
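Because time.mktime reads the parsed value as local time, the same string yields different timestamps on machines in different time zones. As a rough sketch of a zone-independent variant, the block below reads each title as UTC using calendar.timegm. The sample titles, the name title_to_utc_ts, and the choice of UTC are assumptions for illustration only; the strings scraped from Twitter are actually in the browser's local time. It also assumes an English locale for %b and %p.

```python
import calendar
from datetime import datetime

# Hypothetical sample of the scraped "title" strings.
titles = ['2:13 AM - 29 Sep 2015', '2:29 PM - 28 Sep 2015']

def title_to_utc_ts(title):
    """Parse a Twitter-style title and treat it as UTC (an assumption)."""
    # %I is the 12-hour clock directive; %p only adjusts the hour when paired with %I.
    parsed = datetime.strptime(title, '%I:%M %p - %d %b %Y')
    # calendar.timegm is the UTC counterpart of time.mktime.
    return calendar.timegm(parsed.timetuple())

utc_timestamps = [title_to_utc_ts(t) for t in titles]
print(utc_timestamps)  # [1443492780, 1443450540]
```

The PM entry comes out as 14:29, which is the case the 12-hour directive is there to handle.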

PS: datetime is a bad choice for a variable name, especially in this context. :-)

Update

Applying a function to each value of the list:

def add_time(timestamp, hours=0, minutes=0, seconds=0):
    return timestamp + seconds + (minutes * 60) + (hours * 60 * 60)

datetimes = [timestamp["title"] for timestamp in timestamps]
# add_time expects a numeric timestamp, so convert each title string first.
times = [add_time(str_to_ts(i), 5, 0, 0) for i in datetimes]
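If the POSIX values are scraped directly from the data-time attribute, no string parsing is needed and the offset can be applied to the integers as they are. A minimal sketch, assuming the values have already been converted with int(); the list posix_times holds made-up sample values and the 7-hour offset is arbitrary:

```python
def add_time(timestamp, hours=0, minutes=0, seconds=0):
    """Shift a POSIX timestamp by the given amount; negative values subtract."""
    return timestamp + seconds + (minutes * 60) + (hours * 60 * 60)

# Hypothetical values, as they would look after int(timestamp["data-time"]).
posix_times = [1443475777, 1443503580]

# Move every timestamp back by seven hours.
shifted = [add_time(ts, hours=-7) for ts in posix_times]
print(shifted)  # [1443450577, 1443478380]
```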

Update 2

Converting a timestamp to a formatted date string:

def timestamp_to_str(timestamp):
    return datetime.fromtimestamp(timestamp).strftime('%H:%M:%S %m/%d/%Y')

Example:

>>> from time import time
>>> from datetime import datetime

>>> timestamp_to_str(time())
'17:01:47 08/29/2016'
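Note that datetime.fromtimestamp without a tz argument formats the value in the local time zone of the machine running the script, so the output above varies from machine to machine. A possible sketch that makes the offset explicit instead, using a fixed-offset timezone object; the function name timestamp_to_str_offset and its offset_hours parameter are hypothetical, not part of the answer above:

```python
from datetime import datetime, timedelta, timezone

def timestamp_to_str_offset(timestamp, offset_hours=0):
    """Format a POSIX timestamp at a fixed UTC offset given in whole hours."""
    tz = timezone(timedelta(hours=offset_hours))
    return datetime.fromtimestamp(timestamp, tz).strftime('%H:%M:%S %m/%d/%Y')

print(timestamp_to_str_offset(1443475777))      # 21:29:37 09/28/2015
print(timestamp_to_str_offset(1443475777, -7))  # 14:29:37 09/28/2015
```

With an offset of -7 hours, the data-time value from the question lines up with its title of 2:29 PM - 28 Sep 2015.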

Answer 1 (score: 0)

This is what I came up with, but I'm not sure whether it is what you're after:

>>> timestamps = ["1:00 PM - 28 Sep 2015", "2:00 PM - 28 Sep 2016", "3:00 PM - 29 Sep 2015"]
>>> datetime = dict(enumerate(timestamps))
>>> datetime
{0: '1:00 PM - 28 Sep 2015',
 1: '2:00 PM - 28 Sep 2016',
 2: '3:00 PM - 29 Sep 2015'}

Answer 2 (score: 0)

You seem to be looking for datetime.timedelta (documentation here). You can convert your inputs to datetime.datetime objects in various ways, e.g.

timestamp = datetime.datetime.fromtimestamp(1443475777)

You can then use timedelta objects to do arithmetic on them. A timedelta simply represents a change in time. You can construct one with the hours argument, like this:

delta = datetime.timedelta(hours=1)

Then timestamp + delta gives you another datetime, one hour in the future. Subtraction works too, as do other arbitrary time intervals.
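Put together, the timedelta approach might look like the sketch below. It passes datetime.timezone.utc to fromtimestamp so the printed values do not depend on the local time zone; that argument and the names later and earlier are additions made for illustration.

```python
import datetime

# Build a timezone-aware datetime from the POSIX value in the question.
timestamp = datetime.datetime.fromtimestamp(1443475777, datetime.timezone.utc)

delta = datetime.timedelta(hours=1)
later = timestamp + delta                            # one hour in the future
earlier = timestamp - datetime.timedelta(hours=7)    # subtraction works too

print(later.isoformat())    # 2015-09-28T22:29:37+00:00
print(earlier.isoformat())  # 2015-09-28T14:29:37+00:00
```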