Question

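I have a batch of raw text files. Each file begins with Date>>month.day year News garbage.

garbage is a large amount of text I don't need, and its length varies. The words Date>> and News always appear in the same place and never change.

I want to copy the month, day, and year and insert this data into a CSV file, with one new line per file in the format day month year.

How do I copy the month, day, and year into separate variables?

I tried splitting the string after a known word and before a known word. I'm familiar with string[x:y], but I basically want to change x and y from numbers to actual words (i.e. string[Date>>:News]).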

import re, os, sys, fnmatch, csv
folder = raw_input('Drag and drop the folder > ')
for filename in os.listdir(folder):
    # First, avoid system files
    if filename.startswith("."):
        pass
    else:
        # Tell the script the file is in this directory and can be written
        file = open(folder+'/'+filename, "r+")
        filecontents = file.read()
        thestring = str(filecontents)
        print thestring[9:20]

Example text file:

Date>>January 2. 2012 News 122

5 different news agencies have reported the story of a man washing his dog.

Answer 1

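You can use the string method .split(" ") to break the text into a list of pieces, split at each space character. Because year and month.day are always in the same position, you can access them by their index in the resulting list. To separate the month from the day, call .split again, this time on the "." character.

Example: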

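# Assumes the text looks like "Date>>month.day year News ..."
parts = theString.split(" ")
year = parts[1]
# parts[0] is "Date>>month.day": drop the "Date>>" prefix, then split on "."
month, day = parts[0][len("Date>>"):].split(".")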
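
The sample file in the question actually reads "January 2. 2012" rather than "month.day year", so the indexes need adjusting for it. Here is a rough sketch of the "slice between two known words" idea the question asks about, assuming the text between Date>> and News always looks like "Month day. year". The helper name between_markers is made up for illustration.

```python
def between_markers(text, start="Date>>", end="News"):
    """Return the text found between the two marker words, stripped."""
    begin = text.index(start) + len(start)   # position just after "Date>>"
    stop = text.index(end, begin)            # first "News" after that point
    return text[begin:stop].strip()


line = "Date>>January 2. 2012 News 122"
date_text = between_markers(line)            # the part between the markers
month, day, year = date_text.split(" ")
day = day.rstrip(".")                        # drop the trailing "." from the day
print(day, month, year)
```

str.index raises ValueError if a marker is missing, so a file that does not follow the format fails loudly instead of producing a wrong date.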

Answer 2

Here is a solution using the re module:

import re

s = "Date>>January 2. 2012 News 122"
# Capture the month name, the day (followed by "."), and the year
m = re.match(r"^Date>>(\S+)\s+(\d+)\.\s+(\d+)", s)
if m:
    month, day, year = m.groups()
    print("{} {} {}".format(month, day, year))

Output:

January 2 2012

Edit

Actually, there is another, better (imo) solution using re.split, as described in the link Robin posted. With that approach you can do:

month, day, year = re.split(r">>| |\. ", s)[1:4]
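
To get from the regex to the CSV file the question asks for, something like the following could work. It is only a sketch: the function names extract_date and write_dates are placeholders, it assumes Python 3, and it assumes the date sits on the first line of every file as in the sample.

```python
import csv
import os
import re

# Same pattern as in the re.match example: month name, day followed by ".", year.
DATE_RE = re.compile(r"^Date>>(\S+)\s+(\d+)\.\s+(\d+)")


def extract_date(text):
    """Return (day, month, year) from the start of text, or None if no match."""
    m = DATE_RE.match(text)
    if m is None:
        return None
    month, day, year = m.groups()
    return day, month, year


def write_dates(folder, csv_path):
    """Write one "day,month,year" row per matching file in folder."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        for filename in sorted(os.listdir(folder)):
            path = os.path.join(folder, filename)
            # Skip hidden system files and subdirectories
            if filename.startswith(".") or not os.path.isfile(path):
                continue
            with open(path) as f:
                row = extract_date(f.readline())
            if row is not None:
                writer.writerow(row)
```

Calling write_dates(folder, "dates.csv") would then produce one row per file. Files whose first line does not match the pattern are skipped silently, which may or may not be what you want.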

Answer 3

You can use string.split:

x = "A b c"
x.split(" ")

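Or you can use a regular expression (which I see you import but don't use) with groups. I don't remember the exact syntax offhand, but the regex would be something like r'(.*)(Date>>)(.*)'. This searches for the string "Date>>" between two strings of any other kind. The parentheses capture each part into a numbered group.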
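
A working version of that idea might look like the sketch below. The pattern is one guess at what was intended, not the exact regex described above: it uses a single group to capture whatever sits between Date>> and News.

```python
import re

s = "Date>>January 2. 2012 News 122"

# Non-greedy (.*?) stops at the first " News" instead of the last one.
m = re.search(r"Date>>(.*?)\s+News", s)
if m:
    date_text = m.group(1)   # text between the two markers
    print(date_text)
```

From there, date_text can be split on spaces as in the other answers.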

Get the date from a string by splitting

3 answers: