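Mapping data to an Excel file

Time: 2012-03-23 03:53:31

Tags: excel file matlab unix

I have 635 text files in a folder, and I want to read their data into Excel. This is not as simple as it may first sound. I have been working on exporting the files to Excel (with help from Stack Overflow :)), and now that this part works, I would like to finish my application. Here are the parameters: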

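  1. My output Excel spreadsheet will have 637 columns, and the data will be entered starting from the third column. As you can guess, each column (3-637) will represent one of the 635 files.

  2. The spreadsheet has 73902 rows, and the data will be written starting from row 3.

  3. The data to be written comes from the 635 files. Each file has 2 columns. For example, to fill in column 5 of the Excel sheet (which corresponds to file 5 - 2 = 3 of the 635), we go to the 3rd file and take the values from there. The first column of that file determines which cell of the Excel sheet to fill in. The value to fill in is taken from the second column of the file (sorry if the word "column" is getting confusing). We need to fill in 73900 rows for each column of the sheet (one file per column), and then repeat this for all 635 columns.

  4. If the 3rd file looks like this: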

    5849 66883
    395 4492863
    681 1835871
    817 4039961
    835 3246671
    868 4041156
    889 1891481
    1305 4467688
    1317 175306
    1361 3252611
    2101 174589
    4364 4053046
    4897 4466547
    4991 3879532
    5327 3992891
    5397 175328
    6067 3881675
    6075 176782
    6906 2358727
    7497 1838021
    

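    Then we would fill in rows 5849, 395, 681, 817 and so on, using the corresponding values from the second column of the file. So column 5 of the Excel sheet would have cells 5849, 395, 681, 817... filled with the values 66883, 4492863, 1835871, 4039961...

    I have attached an image of the Excel sheet to make the situation clearer.

    So far I have Visual Basic code that imports the text files into Excel, but it does not really handle any of what is discussed above. I also have a small MATLAB script that does the same thing (it is not fully functional) to write the data to Excel. Both are pasted below.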

    ' Requires a reference to "Microsoft Scripting Runtime" (Tools > References)
    Sub ReadFilesIntoActiveSheet()
    Dim fso As FileSystemObject
    Dim folder As folder
    Dim file As file
    Dim FileText As TextStream
    Dim TextLine As String
    Dim Items() As String
    Dim i As Long
    Dim cl As Range
    
    ' Get a FileSystem object
    Set fso = New FileSystemObject
    
     ' get the directory you want
      Set folder = fso.GetFolder("D:\275_25bp")
    
    ' set the starting point to write the data to
     Set cl = ActiveSheet.Cells(1, 1)
    
    ' Loop thru all files in the folder
    For Each file In folder.Files
    ' Open the file
    Set FileText = file.OpenAsTextStream(ForReading)
    
    ' Read the file one line at a time
    Do While Not FileText.AtEndOfStream
        TextLine = FileText.ReadLine
    
        ' Parse the line into space-delimited pieces
        Items = Split(TextLine, " ")
    
        ' Put data on one row in active sheet
        For i = 0 To UBound(Items)
            cl.Offset(0, i).Value = Items(i)
        Next
    
        ' Move to next row
        Set cl = cl.Offset(1, 0)
    Loop
    
    ' Clean up
    FileText.Close
    Next file
    
    Set FileText = Nothing
    Set file = Nothing
    Set folder = Nothing
    Set fso = Nothing
    
    End Sub
    

    ####################### MATLAB SCRIPT #######################

    dirname = uigetdir;
    Files = dir(fullfile(dirname,'*.txt'))
    for i=1:numel(Files)
    filename = fullfile(dirname,Files(i).name);
    [col1,col2] = textread( filename, '%d%d' )
    %pos1 = strcat('A',num2str(i));
    %pos2 = strcat('B',num2str(i));
    xlswrite('sample_output',col1,'Sheet1','A1:CI1')
    xlswrite('sample_output',col2,'Sheet1','A2:CI2')
    end
    

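    There is no common naming pattern for the file names, except that each name is unique. The folder holds them in alphabetical order, so file 1 starts with A and file 635 starts with Z. Example file names: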

    Acidothermus_cellulolyticus_11B-list.txt

    Frankia_alni_ACN14a-list.txt ... Zymomonas_mobilis_ZM4-list.txt

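    It does not matter which language is used, but preferably something in UNIX (I know it is not a language :P) or MATLAB, since those are the two I have been using for this project.

    I would really appreciate help with this. Let me know if you need any clarification on what needs to be done. Thanks!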
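The mapping rule described in the question can be illustrated with a short Python sketch. The function name `target_cell` and the 1-based `file_index` argument are assumptions made for illustration; the column offset of 2 and the use of the first number as the row come from the description above.

```python
def target_cell(file_index, line):
    """Return (row, column, value) for one line of the file at 1-based file_index.

    Assumes the layout described in the question: file N fills spreadsheet
    column N + 2, the first number on the line is the row to fill, and the
    second number is the value to write there.
    """
    row_text, value_text = line.split()
    return int(row_text), file_index + 2, int(value_text)


# First two lines of the sample third file from the question
for sample in ["5849 66883", "395 4492863"]:
    print(target_cell(3, sample))
```

Both sample lines land in spreadsheet column 5, in rows 5849 and 395.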

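2 answers:

Answer 0: (score: 1)

Since you are able to use Perl, I wrote a Perl script, based on the information above, that creates a CSV file which Excel can open. All of the files must be in the same directory. I have syntax-checked it. You may need to edit the directory path and file name pattern (for example, a name with a * wildcard) in the glob expression. As written, the glob finds all files with the .txt extension in the local directory. If you put in a path, be sure to use forward slashes, because glob treats a backslash as an escape character. The files are processed in alphabetical order, because the sort function sorts them alphabetically.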
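As a rough illustration of how the sorted glob decides which file goes into which output column, here is a minimal Python sketch. The function name `column_for_each_file` is hypothetical, and the 0-based column numbering is an assumption for the example.

```python
import glob
import os


def column_for_each_file(directory, pattern="*.txt"):
    """Map each matching file name to a 0-based output column, in sorted order."""
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    return {os.path.basename(path): column for column, path in enumerate(paths)}
```

Because the column is just the position in the sorted list, adding or removing a file shifts the columns of every file that sorts after it.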

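If you are on Windows 7, you may want to look at ActiveState Perl. That way you do not have to download and install Cygwin, and you can run the script from a command window. That is what I used to syntax-check the Perl script; I have Windows 7, 64-bit.

*NOTE: The code below has been updated to a version that works with Cygwin Perl. Some debugging statements are still left in it.*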

    use strict;

    # Array of arrays to save the data: $savedData[row][column]
    my @savedData = ();

    # Get the files to process and sort them
    # NOTE: Edit for where the files exist if not
    # local directory from where script is run
    my @files = sort <*.txt>;
    print "number of files " . scalar(@files) . "\n";
    # Output column for the current file
    my $column = 0;
    # Save the numbers from the line
    my @numbers = ();

    # Go through the files
    foreach my $f ( @files )
    {
      # Open the file; line numbers restart for each file
      #print "Processing file: $f\n";
      my $lineNumber = 0;
      open(my $in, '<', $f) or die "Unable to open file: $f";

      # Read a line from the file
      while ( <$in> )
      {
        # Increment the line number, remove the line ending
        $lineNumber++;
        chomp;
        # Get the numbers from the line (split on any whitespace)
        @numbers = split(' ');
        # Check for error in amount of items
        if ( 2 != scalar(@numbers) )
        {
          die "ERROR: Line not well formed in file: $f Line: $lineNumber\n";
        }
        # Save the information using the first number as the row
        $savedData[$numbers[0]][$column] = $numbers[1];
        #print "@numbers\n";
      }

      # Close the file and move on to the next column
      close($in);
      $column++;
    }

    # Variable to save output
    my $output = "";
    # Open output file
    # NOTE: File will be opened in current directory
    open(my $out, '>', 'output.csv')
      or die "Unable to open output file: output.csv";

    # For each row in the matrix
    for ( my $lcv = 1; $lcv < scalar(@savedData); $lcv++ )
    {
      # Construct the output for all of the columns.
      # The leading comma, plus the comma before the first value,
      # shifts the output over by 2 columns.
      $output = ",";
      for ( my $lcv2 = 0; $lcv2 < scalar(@files); $lcv2++ )
      {
        # Cells that no file filled in are written as 0
        my $value = int( $savedData[$lcv][$lcv2] || 0 );
        $output .= ",$value";
      }
      # Write it to file
      print $out "$output\n";
    }

    close($out);

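I changed the implementation to use a map (hash) instead of an array, because the program appeared to be running out of memory.

Here is the changed code.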

    use strict;

    # Map (hash of hashes) to save the data: $savedData{row}{column}
    my %savedData = ();

    # Get the files to process and sort them
    # NOTE: Edit for where the files exist if not
    # local directory from where script is run
    my @files = sort <*.txt>;
    print "number of files " . scalar(@files) . "\n";
    # Output column for the current file
    my $column = 0;
    # Save the numbers from the line
    my @numbers = ();
    # Largest row number seen in any file
    my $lastRow = 0;
    my $fileCount = 0;

    # Go through the files
    foreach my $f ( @files )
    {
      # Open the file; line numbers restart for each file
      $fileCount++;
      print "$fileCount: Processing file: $f\n";
      my $lineNumber = 0;
      open(my $in, '<', $f) or die "Unable to open file: $f";

      # Read a line from the file
      while ( <$in> )
      {
        # Increment the line number, remove the line ending
        $lineNumber++;
        chomp;
        # Get the numbers from the line (split on any whitespace)
        @numbers = split(' ');
        # Check for error in amount of items
        if ( 2 != scalar(@numbers) )
        {
          die "ERROR: Line not well formed in file: $f Line: $lineNumber\n";
        }
        # Save the information using the first number as the row
        $savedData{$numbers[0]}{$column} = $numbers[1];
        # Track the last row for use when writing the output
        if ( $lastRow < $numbers[0] )
        {
          $lastRow = $numbers[0];
        }
        #print "@numbers\n";
      }

      # Close the file and move on to the next column
      close($in);
      $column++;
    }

    # Variable to save output
    my $output = "";
    # Open output file
    # NOTE: File will be opened in current directory
    open(my $out, '>', 'output_map.csv')
      or die "Unable to open output file: output_map.csv";

    # For each row, up to and including the last row seen
    for ( my $lcv = 1; $lcv <= $lastRow; $lcv++ )
    {
      # Construct the output for all of the columns.
      # The leading comma, plus the comma before the first value,
      # shifts the output over by 2 columns.
      $output = ",";
      for ( my $lcv2 = 0; $lcv2 < scalar(@files); $lcv2++ )
      {
        # Check the row first so that empty rows are not created in the map
        if ( exists $savedData{$lcv} && exists $savedData{$lcv}{$lcv2} )
        {
          $output .= "," . int($savedData{$lcv}{$lcv2});
        }
        else
        {
          $output .= ",0";
        }
      }
      # Write it to file
      print $out "$output\n";
    }

    close($out);
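For comparison, the map-based idea can be sketched in Python with a plain dictionary. This is only an illustration under assumptions, not a port of the script above: the names `read_sparse` and `cells` are hypothetical, and the key is a `(row, column)` tuple rather than a nested hash.

```python
def read_sparse(paths):
    """Read two-column files into a dict keyed by (row, column).

    Only cells that appear in a file are stored, so memory use grows with
    the number of data lines rather than with rows times columns.
    """
    cells = {}
    last_row = 0
    for column, path in enumerate(paths):
        with open(path) as handle:
            for line_number, line in enumerate(handle, start=1):
                fields = line.split()
                if len(fields) != 2:
                    raise ValueError(
                        f"Line not well formed in {path}, line {line_number}"
                    )
                row, value = int(fields[0]), int(fields[1])
                cells[(row, column)] = value
                last_row = max(last_row, row)
    return cells, last_row
```

With 635 files and about 73900 rows, a dense table has roughly 47 million cells, while the dictionary holds one entry per input line.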

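Answer 1: (score: 1)

I would split the problem into "getting the data" and "putting the data into Excel". You seem to know how to put data into Excel, which is great, so I will focus on the first part.

The good news is that what you have is basically one big table, or matrix, and MATLAB loves matrices. The only possible problem is that your matrix is large and probably mostly zeros. That is fine; we can use a sparse matrix in MATLAB.

    data = sparse(nRowMax, nFiles);

The algorithm is then simple:

  1. For each file...
    1. Determine the column number col
    2. For each line...
      1. Read the line as row, value
      2. Insert value into data(row, col)
    3. Repeat step 2 until all lines have been read
  2. Repeat step 1 until all files have been read.
  3. data now holds all the data in the layout needed for the Excel spreadsheet. Export it.

MATLAB code: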

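    for col = 1:nFiles
        % files is assumed to be the struct array returned by dir
        filename = files(col).name;
        fileID = fopen(filename);
        % Read both columns as doubles, since sparse matrices store doubles
        filedata = textscan(fileID, '%f %f');
        fclose(fileID);

        rownumbers = filedata{1};
        values = filedata{2};

        for i = 1:length(rownumbers)
            row = rownumbers(i);
            value = values(i);
            data(row, col) = value;
        end
    end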
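The answer leaves the export step open. One possible way to handle it, sketched here in Python rather than MATLAB, is to write the sparse data out as a CSV file that Excel can open. The function name `write_csv` and its parameters are hypothetical. The default `col_offset=2` mirrors the question's requirement that data start in the third column; `row_offset` is an assumed knob for the case where the data should also start in the third row.

```python
import csv


def write_csv(cells, last_row, n_columns, out_path, row_offset=0, col_offset=2):
    """Write a sparse {(row, column): value} mapping as a dense CSV file.

    Rows are numbered from 1 and columns from 0. Missing cells are written
    as 0, and the offsets add blank leading rows and columns.
    """
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Blank rows before the data, if requested
        for _ in range(row_offset):
            writer.writerow([])
        for row in range(1, last_row + 1):
            data = [cells.get((row, column), 0) for column in range(n_columns)]
            writer.writerow([""] * col_offset + data)
```

Writing one row at a time keeps only a single dense row in memory, so the full table is never built.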