Extracting text from multiple Google Docs into a Google Spreadsheet

Date: 2013-09-21 15:03:58

Tags: google-apps-script

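I'm trying to extract the text from each Google Doc in a folder on Drive and paste it into the first column of a Google Spreadsheet, so that the contents of file 1 are in A1, the contents of file 2 are in A2, and so on. Ultimately I'm trying to recreate a database of the information stored in all these files, so it would be great if the text could also be split up by field, but I think that should be trivial in Excel using Text to Columns.

I've had a stab at it using a few snippets found online, but I'm now stumped.

Here's my script: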

//Function to extract the body from each document in a folder and copy it to a spreadsheet
function extract() {

  //Define the folder we're working with ("Communication Passports") and get the file list
  var folder = DocsList.getFolder("Communication Passports");
  var contents = folder.getFiles();

  //Define the destination spreadsheet file (CP) and set up the sheet to receive the data
  var ss = SpreadsheetApp.openById("0AicdFGdf-Cx5dHFTX1R3Wm1RTEFTZ2d5ZmxuSjJSOHc");
  SpreadsheetApp.setActiveSpreadsheet(ss);
  Logger.log('File name: ' + ss.getName());
  var sheet = SpreadsheetApp.getActiveSheet();
  sheet.clear();
  sheet.appendRow(["Name", "Date", "Contents", "URL", "Download", "Description"]);

  //Set up other variables
  var file;
  var data;

  //Loop through and collect the data (I don't actually need this - just borrowed the code from a snippet online - but it is SO CLOSE!)
  //Sadly, getBody doesn't work on files, only on documents
  for (var i = 0; i < contents.length; i++) {
    file = contents[i];

    data = [
      file.getName(),
      file.getDateCreated(),
      file.getViewers(),
      file.getUrl(),
      "https://docs.google.com/uc?export=download&confirm=no_antivirus&id=" + file.getId(),
      file.getDescription()
    ];

    sheet.appendRow(data);

    //Extract the text from the file (this doesn't work at present, but is what I actually need)
    var doc = DocumentApp.openById(file.getId());
    var body = doc.getBody();

    //Find a way to paste the extracted body text to the spreadsheet
  }
};

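Any help would be much appreciated. I'm not a programmer, I'm a teacher, and the information is about the learning needs of the children at our school (someone deleted the database over the summer and our backups only go back a month!).

Thanks,

Simon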

1 answer:

Answer 0: (score: 1)

Try adding:

var doc = DocumentApp.openById(file.getId());
body = doc.getBody().getText();

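Calling getText() on the body returns the actual text content of the document, rather than the Body object itself.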
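For the original goal (file 1 in A1, file 2 in A2, and so on), a minimal sketch of the whole loop might look like the following. It is not the code from the question or the answer: the function names `collectBodyRows` and `extractToColumnA` and the parameters `folderName` and `spreadsheetId` are made up for illustration, and it uses `DriveApp` instead of the `DocsList` service from the question, since Google has since retired `DocsList`.

```javascript
// Hypothetical helper: turns a list of file-like objects into one-column rows
// of body text. `openById` is passed in so the helper itself does not depend
// on Apps Script globals.
function collectBodyRows(files, openById) {
  var rows = [];
  for (var i = 0; i < files.length; i++) {
    var doc = openById(files[i].getId());
    // getBody().getText() returns the document body as a plain string
    rows.push([doc.getBody().getText()]);
  }
  return rows;
}

// Hypothetical Apps Script entry point. `folderName` and `spreadsheetId`
// are placeholders supplied by the caller.
function extractToColumnA(folderName, spreadsheetId) {
  var folders = DriveApp.getFoldersByName(folderName);
  if (!folders.hasNext()) {
    throw new Error('Folder not found: ' + folderName);
  }

  // Only Google Docs can be opened with DocumentApp, so filter by MIME type
  var iterator = folders.next().getFilesByType(MimeType.GOOGLE_DOCS);
  var files = [];
  while (iterator.hasNext()) {
    files.push(iterator.next());
  }

  var rows = collectBodyRows(files, function (id) {
    return DocumentApp.openById(id);
  });
  if (rows.length === 0) {
    return 0;
  }

  // Write all rows in one call, starting at A1 of the first sheet
  var sheet = SpreadsheetApp.openById(spreadsheetId).getSheets()[0];
  sheet.getRange(1, 1, rows.length, 1).setValues(rows);
  return rows.length;
}
```

Writing every row with a single `setValues` call is usually faster than calling `appendRow` once per file, although either approach should work for a modest number of documents.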

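I wrote another function to parse the content into more usable chunks and then pass each one back as an entry in the data sheet, and it works fine.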
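The parsing function itself is not shown in the answer, and the layout of the documents is not described, so the following is only a sketch under a loud assumption: that each field sits on its own line in the form `Label: value`. The names `parseFields` and `fieldsToRow` are hypothetical.

```javascript
// Hypothetical parser: assumes each field is on its own line as "Label: value".
// Lines without a colon are skipped.
function parseFields(text) {
  var fields = {};
  var lines = text.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var idx = lines[i].indexOf(':');
    if (idx === -1) {
      continue;
    }
    var label = lines[i].slice(0, idx).trim();
    var value = lines[i].slice(idx + 1).trim();
    if (label) {
      fields[label] = value;
    }
  }
  return fields;
}

// Orders the parsed fields to match a header row; missing fields become ''.
function fieldsToRow(fields, headers) {
  return headers.map(function (header) {
    return Object.prototype.hasOwnProperty.call(fields, header) ? fields[header] : '';
  });
}
```

With a body string in hand, a row could then be added with something like `sheet.appendRow(fieldsToRow(parseFields(body), headers))`, where `headers` is whatever list of field labels the documents actually use.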