Link to the input .txt file. The code searches for the lines starting with "From ", splits each such line into words, and adds the 6th subword (i.e. the hours part of hr:min:sec) to a list.
fhand=open("mbox-short.txt")
words=list()
for line in fhand:
    if line.startswith("From "):
        word=line.split()
        words=word.append(word[6])
print(words)
Answer 0 (score: 1)
I think this is what you wanted. You were appending to word, which is created inside the loop and changes on every iteration, and assigning the result of append() (which is always None) back to words.
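fhand=open("/home/user/Downloads/mbox-short.txt")
words=list()
for line in fhand:
    if line.startswith("From "):
        word=line.split()
        # word[6] is the year field, so each row ends with '2008' twice
        word.append(word[6])
        # collect the whole row in the outer list; do not reassign words
        words.append(word)
print(words)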
It prints:
[['From', 'stephen.marquard@uct.ac.za', 'Sat', 'Jan', '5', '09:14:16', '2008', '2008'], ['From', 'louis@media.berkeley.edu', 'Fri', 'Jan', '4', '18:10:48', '2008', '2008'], ['From', 'zqian@umich.edu', 'Fri', 'Jan', '4', '16:10:39', '2008', '2008'], ['From', 'rjlowe@iupui.edu', 'Fri', 'Jan', '4', '15:46:24', '2008', '2008'], ['From', 'zqian@umich.edu', 'Fri', 'Jan', '4', '15:03:18', '2008', '2008'], ['From', 'rjlowe@iupui.edu', 'Fri', 'Jan', '4', '14:50:18', '2008', '2008'], ['From', 'cwen@iupui.edu', 'Fri', 'Jan', '4', '11:37:30', '2008', '2008'], ['From', 'cwen@iupui.edu', 'Fri', 'Jan', '4', '11:35:08', '2008', '2008'], ['From', 'gsilver@umich.edu', 'Fri', 'Jan', '4', '11:12:37', '2008', '2008'], ['From', 'gsilver@umich.edu', 'Fri', 'Jan', '4', '11:11:52', '2008', '2008'], ['From', 'zqian@umich.edu', 'Fri', 'Jan', '4', '11:11:03', '2008', '2008'], ['From', 'gsilver@umich.edu', 'Fri', 'Jan', '4', '11:10:22', '2008', '2008'], ['From', 'wagnermr@iupui.edu', 'Fri', 'Jan', '4', '10:38:42', '2008', '2008'], ['From', 'zqian@umich.edu', 'Fri', 'Jan', '4', '10:17:43', '2008', '2008'], ['From', 'antranig@caret.cam.ac.uk', 'Fri', 'Jan', '4', '10:04:14', '2008', '2008'], ['From', 'gopal.ramasammycook@gmail.com', 'Fri', 'Jan', '4', '09:05:31', '2008', '2008'], ['From', 'david.horwitz@uct.ac.za', 'Fri', 'Jan', '4', '07:02:32', '2008', '2008'], ['From', 'david.horwitz@uct.ac.za', 'Fri', 'Jan', '4', '06:08:27', '2008', '2008'], ['From', 'david.horwitz@uct.ac.za', 'Fri', 'Jan', '4', '04:49:08', '2008', '2008'], ['From', 'david.horwitz@uct.ac.za', 'Fri', 'Jan', '4', '04:33:44', '2008', '2008'], ['From', 'stephen.marquard@uct.ac.za', 'Fri', 'Jan', '4', '04:07:34', '2008', '2008'], ['From', 'louis@media.berkeley.edu', 'Thu', 'Jan', '3', '19:51:21', '2008', '2008'], ['From', 'louis@media.berkeley.edu', 'Thu', 'Jan', '3', '17:18:23', '2008', '2008'], ['From', 'ray@media.berkeley.edu', 'Thu', 'Jan', '3', '17:07:00', '2008', '2008'], ['From', 'cwen@iupui.edu', 'Thu', 'Jan', '3', 
'16:34:40', '2008', '2008'], ['From', 'cwen@iupui.edu', 'Thu', 'Jan', '3', '16:29:07', '2008', '2008'], ['From', 'cwen@iupui.edu', 'Thu', 'Jan', '3', '16:23:48', '2008', '2008']]
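For reference, here is a minimal sketch of why the original line words=word.append(word[6]) could not work: list.append modifies the list in place and returns None, so the assignment overwrites words with None. The sample line is reconstructed from the first row of the output above (an assumption about the exact spacing in the file), and the name result is only illustrative.

```python
# A sample "From " line, reconstructed from the first row of the output above.
line = "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008"

word = line.split()
result = word.append(word[6])  # append mutates word in place and returns None

print(result)  # None
print(word)    # the row now ends with '2008' twice

# Appending to the outer list, without reassigning it, keeps the collected values.
words = []
words.append(word[6])
print(words)   # ['2008']
```

Note that index 6 is the year field; the hr:min:sec field sits at index 5.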
Answer 1 (score: 0)
If you only want the time part, you can try the following code.
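f = open('mbox-short.txt')
words = []
for x in f:
    if x.startswith('From'):
        w = x.split()
        # 'From' also matches short "From:" header lines, so check the length first
        if len(w) > 5:
            # index 5 is the hr:min:sec field
            words.append(w[5])
print(words)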
It returns data like the following:
['09:14:16', '18:10:48', '16:10:39', '15:46:24', '15:03:18', '14:50:18', '11:37:30', '11:35:08', '11:12:37', '11:11:52', '11:11:03', '11:10:22', '10:38:42', '10:17:43', '10:04:14', '09:05:31', '07:02:32', '06:08:27', '04:49:08', '04:33:44', '04:07:34', '19:51:21', '17:18:23', '17:07:00', '16:34:40', '16:29:07', '16:23:48']
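The question mentions wanting only the hours part of hr:min:sec. One possible sketch, assuming the time is always the sixth word of a "From " line as in the output above: split that field on ":" and keep the first piece. The helper name hours_from_lines and the sample lines are hypothetical and only for illustration.

```python
def hours_from_lines(lines):
    """Collect the hour field from each 'From ' line (hypothetical helper)."""
    hours = []
    for line in lines:
        # The trailing space skips "From:" header lines.
        if not line.startswith("From "):
            continue
        parts = line.split()
        if len(parts) > 5:
            # parts[5] looks like '09:14:16'; the hour is the text before the first colon
            hours.append(parts[5].split(":")[0])
    return hours


# Sample lines in the same shape as the data shown above.
sample = [
    "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n",
    "From: stephen.marquard@uct.ac.za\n",
    "From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008\n",
]
print(hours_from_lines(sample))  # ['09', '18']
```

Because an open file is iterable line by line, the same function should also accept the file handle directly, for example hours_from_lines(open('mbox-short.txt')).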
Hope this helps.