Slicing a file in Python with conditions

Time: 2017-09-05 22:00:37

Tags: python-3.x file slice

Suppose I have a .txt file that looks like this:

    0 day0 event_data0
    1 day1 event_data1
    2 day2 event_data2
    3 day3 event_data3
    4 day4 event_data4
    ........
    n dayn event_datan

    #where: 
    #n is the event index
    #dayn is the day when the event happened. year-month-day format
    #event_datan is what happened at the event.

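From this file, I need to create a new file containing all the events that happened between two specific dates, for example after September 2003 and before Christmas 2006. Can someone help me with this? Thanks a lot!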

4 answers:

Answer 0: (score: 0)

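It looks like the datetime module is what you want. Iterate over the file line by line until the time difference between the current line's date and the start threshold date (September 7, 2003 in your example) becomes positive, and stop iterating once you pass Christmas 2006 (this relies on the file being sorted by date). Load the lines in between into a pandas DataFrame or a numpy array.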
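A minimal sketch of that idea, assuming the lines are in chronological order and the date is the second whitespace-separated field, as in the question's layout. The function name events_between and the sample lines are made up for illustration.

```python
from datetime import date, datetime


def events_between(lines, start, end):
    """Yield lines whose date lies strictly between start and end.

    Assumes chronological order, so iteration can stop at the first
    line dated on or after `end`.
    """
    for line in lines:
        fields = line.split(maxsplit=2)
        if len(fields) < 2:
            continue  # blank or malformed line, skip it
        day = datetime.strptime(fields[1], "%Y-%m-%d").date()
        if day >= end:
            break  # past the end threshold, nothing later can match
        if day > start:
            yield line


# Hypothetical sample lines in the "index date event_data" layout.
sample = [
    "0 2003-09-01 event_data0\n",
    "1 2004-05-17 event_data1\n",
    "2 2006-12-24 event_data2\n",
    "3 2007-01-02 event_data3\n",
]

selected = list(events_between(sample, date(2003, 9, 7), date(2006, 12, 25)))
```

Because the function accepts any iterable of lines, an open file object can be passed in place of the sample list, and the selected lines can be written out with writelines().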

Answer 1: (score: 0)

Lucas, you can try this:

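    import re
    from datetime import datetime as dt

    # Interval bounds (both exclusive); change these to the range you want.
    __date_start__ = dt.strptime('2003-09-07', '%Y-%m-%d').date()
    __date_end__ = dt.strptime('2006-12-25', '%Y-%m-%d').date()

    # Matches a date in yyyy-mm-dd format anywhere in a line.
    date_pattern = re.compile(r'\d{4}-\d{2}-\d{2}')

    # Opening events.txt in 'w' mode replaces any previous output,
    # so there is no need to delete the file first.
    with open('file.txt', 'r') as src, open('events.txt', 'w') as dst:
        for line in src:  # iterate line by line, not character by character
            match = date_pattern.search(line)
            if match is None:
                continue  # no date on this line, skip it
            date_converted = dt.strptime(match.group(0), '%Y-%m-%d').date()
            if __date_start__ < date_converted < __date_end__:
                dst.write(line)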

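Change the __date_start__ and __date_end__ values to the interval you want. The code then searches each line for a regular expression matching the yyyy-mm-dd date format, compares the date it finds against the range (start and end dates), and, if the date falls inside, writes the line to the events.txt file.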

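Answer 2: (score: 0)

I assume your file is tab-delimited, so you can use the pandas package to read it. Just add a first line to the .txt file with the column names (index, date, event) separated by tabs, then read in the data.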

    import pandas

    df = pandas.read_csv('txt_file.txt', sep='\t', index_col=0)
    # index_col=0 just sets your first column as the index

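Once that's done, follow the steps in this link. It basically answers your question of how to select events between two dates using just this package. That way, you get back a new DataFrame containing only the events you want.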
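Since the linked steps are not reproduced here, the following is only a sketch of one common way to do the selection with a boolean mask. It assumes the header line described above (index, date, event) and reads hypothetical sample data from a string instead of txt_file.txt so that it runs on its own.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample, including the added header line.
sample = (
    "index\tdate\tevent\n"
    "0\t2003-09-01\tevent_data0\n"
    "1\t2004-05-17\tevent_data1\n"
    "2\t2006-12-24\tevent_data2\n"
    "3\t2007-01-02\tevent_data3\n"
)

# parse_dates converts the "date" column to datetime64 values.
df = pd.read_csv(io.StringIO(sample), sep="\t", index_col=0, parse_dates=["date"])

start = pd.Timestamp("2003-09-07")
end = pd.Timestamp("2006-12-25")

# Boolean mask: True for rows strictly between the two dates.
mask = (df["date"] > start) & (df["date"] < end)
selected = df.loc[mask]

# With no path argument, to_csv returns the tab-separated text as a string.
output = selected.to_csv(sep="\t")
```

Passing a file name such as 'events.txt' as the first argument of to_csv would write the selected rows to disk instead of returning them as a string.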

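Answer 3: (score: 0)

Do you specifically want to do this for "after September 7, 2003 and before Christmas 2006," or might you have other choices for these two dates?

If it is specifically for "after September 2003 and before Christmas 2006," then in my opinion you can get the result with the re (regex) module.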

You can also use a condition together with group(), or you can use the findall() method.
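A rough sketch of what the regex approach could look like, using both findall() and group(). The pattern, the variable names, and the sample text are assumptions made for illustration and are not taken from the original answer.

```python
import re

# Hypothetical sample text in the "index date event_data" layout.
text = """0 2003-09-01 event_data0
1 2004-05-17 event_data1
2 2006-12-24 event_data2
3 2007-01-02 event_data3
"""

# One capture group per field; re.MULTILINE lets ^ and $ match at every line.
line_pattern = re.compile(
    r"^(\d+)[ \t]+(\d{4}-\d{2}-\d{2})[ \t]+(.*)$", re.MULTILINE
)

start, end = "2003-09-07", "2006-12-25"

# findall() returns one (index, date, event) tuple per matching line.
# Zero-padded yyyy-mm-dd strings sort chronologically, so comparing
# them as plain strings is enough here.
events = [row for row in line_pattern.findall(text) if start < row[1] < end]

# The same condition applied to a single line with search() and group().
m = line_pattern.search("1 2004-05-17 event_data1")
in_range = m is not None and start < m.group(2) < end
```

The string comparison only holds for zero-padded year-month-day dates. For any other format, converting with datetime.strptime before comparing is the safer choice.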