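I have 635 text files in a folder, and I want to read the data into Excel. This is not as simple as it first sounds. I have been working on exporting the files to Excel (with help from Stack Overflow :) ), and now that that part works, I want to finish my application. Here are the parameters: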
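My output Excel spreadsheet will have 637 columns, with data entered starting from the third column. As you might guess, each column (3 to 637) represents one of the 635 files.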
The spreadsheet has 73902 rows, and data is written starting from row 3.
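The data to be written comes from the 635 files, each of which has 2 columns. For example, to fill in column 5 of the Excel sheet (which corresponds to file 5 - 2 = 3, the third of the 635 files), we go to the third file and take the values from it. The first column of that file determines which cells in the Excel sheet to fill, and the values to fill in come from the second column of the file (sorry if the word "column" gets confusing). We need to fill 73900 rows for each column of the sheet (each column is one file), and then repeat this for all 635 columns.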
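The file-to-column rule above is a fixed offset. A minimal sketch in Python, assuming 1-based file numbers in alphabetical order and data starting in column 3; the helper names are hypothetical. The letter conversion is only there because range strings such as 'A1:CI1' (used with xlswrite) need Excel column letters rather than numbers.

```python
def spreadsheet_column(file_number: int, first_data_column: int = 3) -> int:
    """Return the 1-based spreadsheet column for a 1-based file number."""
    # File 1 -> column 3, file 3 -> column 5, ..., file 635 -> column 637.
    return file_number + first_data_column - 1


def column_letters(column: int) -> str:
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while column > 0:
        column, remainder = divmod(column - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


for n in (1, 3, 635):
    col = spreadsheet_column(n)
    print(n, col, column_letters(col))
```

This prints `1 3 C`, `3 5 E` and `635 637 XM`, so the last file lands in column XM.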
If the third file looks like this:
5849 66883
395 4492863
681 1835871
817 4039961
835 3246671
868 4041156
889 1891481
1305 4467688
1317 175306
1361 3252611
2101 174589
4364 4053046
4897 4466547
4991 3879532
5327 3992891
5397 175328
6067 3881675
6075 176782
6906 2358727
7497 1838021
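then we fill rows 5849, 395, 681, 817, and so on, with the corresponding values from the second column of the file. So column 5 of the Excel sheet will have cells 5849, 395, 681, 817, ... filled with the values 66883, 4492863, 1835871, 4039961, ...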
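For a single file, the lookup amounts to turning each line into a (row, value) pair. A minimal Python sketch using the first four lines of the sample file; `parse_pairs` is a hypothetical helper. It rejects any line that does not have exactly two fields.

```python
def parse_pairs(lines, source="<input>"):
    """Parse 'row value' lines into (row, value) integer pairs.

    Blank lines are skipped; every other line must hold exactly two
    whitespace-separated integers.
    """
    pairs = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()  # any run of spaces/tabs; also drops '\r' and '\n'
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"{source}: line {line_number} is not 'row value'")
        pairs.append((int(fields[0]), int(fields[1])))
    return pairs


sample = """5849 66883
395 4492863
681 1835871
817 4039961
""".splitlines()

pairs = parse_pairs(sample, "third file")
column_5 = dict(pairs)  # spreadsheet row -> value for column 5
print(column_5[395])  # 4492863
```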
I've attached an image of the Excel sheet to make this clearer.
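So far I have Visual Basic code that imports the text files into Excel, but it doesn't really handle any of the logic described above. I also have a small MATLAB script (not fully working) that tries to write the data to Excel. Both are pasted below.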
Sub ReadFilesIntoActiveSheet()
' Requires a reference to "Microsoft Scripting Runtime" (Tools > References)
Dim fso As FileSystemObject
Dim folder As folder
Dim file As file
Dim FileText As TextStream
Dim TextLine As String
Dim Items() As String
Dim i As Long
Dim cl As Range
' Get a FileSystem object
Set fso = New FileSystemObject
' get the directory you want
Set folder = fso.GetFolder("D:\275_25bp")
' set the starting point to write the data to
Set cl = ActiveSheet.Cells(1, 1)
' Loop thru all files in the folder
For Each file In folder.Files
' Open the file
Set FileText = file.OpenAsTextStream(ForReading)
' Read the file one line at a time
Do While Not FileText.AtEndOfStream
TextLine = FileText.ReadLine
' Parse the line into space-delimited pieces
Items = Split(TextLine, " ")
' Put data on one row in active sheet
For i = 0 To UBound(Items)
cl.Offset(0, i).Value = Items(i)
Next
' Move to next row
Set cl = cl.Offset(1, 0)
Loop
' Clean up
FileText.Close
Next file
Set FileText = Nothing
Set file = Nothing
Set folder = Nothing
Set fso = Nothing
End Sub
####################### MATLAB SCRIPT #######################
dirname = uigetdir;
Files = dir(fullfile(dirname,'*.txt'));
for k=1:numel(Files)
filename = fullfile(dirname,Files(k).name);
[col1,col2] = textread( filename, '%d%d' );
%pos1 = strcat('A',num2str(k));
%pos2 = strcat('B',num2str(k));
xlswrite('sample_output',col1,'Sheet1','A1:CI1')
xlswrite('sample_output',col2,'Sheet1','A2:CI2')
end
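The file names follow no common pattern other than being unique. The folder lists them alphabetically, so file 1 starts with A and file 635 starts with Z. Example file names: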
Acidothermus_cellulolyticus_11B-list.txt
Frankia_alni_ACN14a-list.txt
...
Zymomonas_mobilis_ZM4-list.txt
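Because the names share no pattern, sorting them is the simplest way to pin down which file goes in which column. A minimal Python sketch, assuming every input file ends in .txt and sits in one folder; `assign_columns` and `list_input_files` are hypothetical helpers. Note that `sorted()` compares code points, so an uppercase initial sorts before any lowercase one; all the example names start with an uppercase letter, so this does not matter here.

```python
from pathlib import Path


def assign_columns(names, first_data_column=3):
    """Sort file names and pair each one with its spreadsheet column number."""
    # Plain code-point order: 'Z...' sorts before 'a...'.
    return list(enumerate(sorted(names), start=first_data_column))


def list_input_files(folder):
    """Return (column, file name) pairs for every *.txt file in folder."""
    names = [p.name for p in Path(folder).glob("*.txt")]
    return assign_columns(names)
```

For example, `list_input_files("D:/275_25bp")` would map the alphabetically first file to column 3.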
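It doesn't really matter which language is used, but something on UNIX (I know that's not a language :P) or MATLAB would be best, since I've been using both for this project.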
I'd really appreciate some help with this. Let me know if you need clarification on what needs to be done. Thanks!
Answer 0 (score: 1)
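Since you can use Perl, I wrote a Perl script, based on the information above, that creates a CSV file Excel can open. All the files must be in the same directory. I have syntax-checked it. You may need to edit the directory path and the file name pattern in the glob (for example, a wildcard name with *). As written, the glob finds all files with the .txt extension in the current directory. If you add a path, forward slashes are the safest choice, since glob can treat a backslash as an escape character. The files are processed in alphabetical order because they are passed through sort, which sorts the names alphabetically.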
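If you are on Windows 7, you may want to look at ActiveState Perl; that way you don't have to download and install Cygwin, and you can run the script from a command window. That is what I used to syntax-check the script (I'm on Windows 7, 64-bit).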
* Note: the code below has been updated to work with Cygwin Perl. Some debugging code is still left in. *
use strict;
#use File::Glob ':glob';
# Array to save the data
my @savedData = ();
# Get the files to process and sort them
# NOTE: Edit for where the files exist if not
# local directory from where script is run
my @files = sort <*.txt>;
print "number of files " . scalar(@files) . "\n"; # should be <*.txt>
# Will shift columns on the output
my $column = 0;
# Save the numbers from the line
my @numbers = ();
my $lineNumber = 0;
# Go through the files
foreach my $f ( @files )
{
# Open the file
#print "Processing file: $f\n";
my %temp = ();
# Restart the line count so error messages refer to this file
$lineNumber = 0;
open INFILE,$f or die "Unable to open file: $f";
# Read a line from the file
while ( <INFILE> )
{
# Increment the line number, remove the carriage return
$lineNumber++;
chomp;
# Get the numbers from the line
@numbers = split("\\s+");
# Check for error in amount of items
if ( 2 != scalar(@numbers))
{
die "ERROR: Line not well formed in file: $f Line: $lineNumber\n";
}
# Save the information using the first number as the row
$savedData[$numbers[0]][$column] = $numbers[1];
$temp{$numbers[0]} = 1;
#print "$column $savedData[$numbers[0]][$column] ";
#print "@numbers\n";
}
# Close the file and move on to the next column
close(INFILE);
$column++;
my @keys = keys %temp;
@keys = sort { $a <=> $b} @keys;
#print "Range of row indexes is: $keys[0] $keys[$#keys]\n "; # gives the range of rows
}
# Loop Control Variable
my $lcv = 0;
# Variable to save output
my $output = "";
# Open output file
# NOTE: File will be opened in current directory
open OUTFILE,">output.csv" or die "Unable to open output file: output.csv";
# TO PRINT ROWS FROM 3RD POSITION
#print OUTFILE ",,,\n,,,\n"; # can remove this
#print "Scalar is " . scalar(@savedData) . "\n";
# For each row in the matrix
for( $lcv = 1; $lcv < scalar(@savedData) ; $lcv++ )
{
# construct the output for all of the columns
# Start with "," so, with the "," before each value, every row begins with two empty columns
my $lcv2;
$output = ",";
for($lcv2=0;$lcv2 < scalar(@files); $lcv2++)
{
$savedData[$lcv][$lcv2] = int($savedData[$lcv][$lcv2] + 0);
$output .= ",$savedData[$lcv][$lcv2]";
}
# write it to file
print OUTFILE "$output\n";
}
close(OUTFILE);
I changed the script to use a hash (map) instead of an array, because the program seemed to be running out of memory.
Here is the changed code:
use strict;
# Map to save the data
my %savedData = ();
# Get the files to process and sort them
# NOTE: Edit for where the files exist if not
# local directory from where script is run
my @files = sort <*.txt>;
print "number of files " . scalar(@files) . "\n"; # should be <*.txt>
# Will shift columns on the output
my $column = 0;
# Save the numbers from the line
my @numbers = ();
my $lineNumber = 0;
my $lastRow = 0;
my $fileCount = 0;
# Go through the files
foreach my $f ( @files )
{
# Open the file
$fileCount++;
print "$fileCount: Processing file: $f\n";
my %temp = ();
# Restart the line count so error messages refer to this file
$lineNumber = 0;
open INFILE,$f or die "Unable to open file: $f";
# Read a line from the file
while ( <INFILE> )
{
# Increment the line number, remove the carriage return
$lineNumber++;
chomp;
# Get the numbers from the line
@numbers = split("\\s+");
# Check for error in amount of items
if ( 2 != scalar(@numbers))
{
die "ERROR: Line not well formed in file: $f Line: $lineNumber\n";
}
# Save the information using the first number as the row
$savedData{$numbers[0]}{$column} = $numbers[1];
$temp{$numbers[0]} = 1;
# Determine the last item in rows. Save it for
# future use
if ( $lastRow < $numbers[0] )
{
$lastRow = $numbers[0];
}
#print "$column $savedData{$numbers[0]}{$column} ";
#print "@numbers\n";
}
# Close the file and move on to the next column
close(INFILE);
$column++;
my @keys = keys %temp;
@keys = sort { $a <=> $b} @keys;
#print "Range of row indexes is: $keys[0] $keys[$#keys]\n "; # gives the range of rows
}
# Loop Control Variable
my $lcv = 0;
# Variable to save output
my $output = "";
# Open output file
# NOTE: File will be opened in current directory
open OUTFILE,">output_map.csv" or
die "Unable to open output file: output_map.csv";
# For each row in the matrix
for( $lcv = 1; $lcv <= $lastRow ; $lcv++ )
{
# construct the output for all of the columns
# Start with "," so, with the "," before each value, every row begins with two empty columns
my $lcv2;
my $data = "";
$output = ",";
for($lcv2=0;$lcv2 < scalar(@files); $lcv2++)
{
if ( exists $savedData{$lcv}{$lcv2} )
{
$data = int($savedData{$lcv}{$lcv2} + 0);
$output .= ",$data";
}
else
{
$output .= ",0";
}
}
# write it to file
print OUTFILE "$output\n";
}
close(OUTFILE);
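The hash-of-hashes idea carries over directly to Python dictionaries. A minimal sketch, assuming the files have already been parsed into lists of (row, value) pairs in column order; `build_table` and `write_table` are hypothetical names. As in the Perl script, each CSV row starts with two empty fields so the first file lands in column C, and missing cells are written as 0; rows 1 through the largest row number are all written.

```python
import csv
import io


def build_table(columns):
    """columns[i] is the list of (row, value) pairs for file i (0-based).

    Returns (table, last_row): table maps row -> {column index: value} and
    stores only cells that occur in some file.
    """
    table = {}
    last_row = 0
    for col, pairs in enumerate(columns):
        for row, value in pairs:
            table.setdefault(row, {})[col] = value
            last_row = max(last_row, row)
    return table, last_row


def write_table(table, n_files, last_row, out):
    """Write rows 1..last_row as CSV: two empty fields, then one field per file."""
    writer = csv.writer(out, lineterminator="\n")
    for row in range(1, last_row + 1):
        cells = table.get(row, {})
        writer.writerow(["", ""] + [cells.get(col, 0) for col in range(n_files)])


table, last_row = build_table([[(2, 10), (3, 30)], [(3, 99)]])
buffer = io.StringIO()
write_table(table, 2, last_row, buffer)
print(buffer.getvalue())
```

Passing an open file instead of `io.StringIO()` (e.g. `open("output_map.csv", "w", newline="")`) writes the CSV to disk.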
Answer 1 (score: 1)
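I would split the problem into "getting the data" and "putting the data into Excel". You seem to know how to put data into Excel, which is great, so I'll focus on the first part.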
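The nice thing is that you basically have one big table, or matrix, and MATLAB loves matrices. The only possible issue is that your matrix is large and probably mostly zeros. That's fine; MATLAB has sparse matrices for exactly this case: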
data = sparse(nRowMax, nFiles);
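Then the algorithm is simple:
- for each file (its index gives col):
  - read each row and value pair,
  - insert value into data(row, col).
data now holds all the values in the layout you want for the Excel spreadsheet; export it. MATLAB code: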
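% Assumes files = dir(fullfile(dirname,'*.txt')) as in the question,
% nFiles = numel(files), and data = sparse(nRowMax, nFiles) from above
for col = 1:nFiles
filename = fullfile(dirname, files(col).name);
fileID = fopen(filename);
filedata = textscan(fileID, '%d %d');
fclose(fileID);
rownumbers = filedata{1};
values = filedata{2};
for i = 1:length(rownumbers)
row = rownumbers(i);
value = values(i);
% sparse matrices hold doubles, so convert from the int32 read by %d
data(row, col) = double(value);
end
end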
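For a sense of scale, a dense 73902 × 635 table has 46,927,770 cells, while a sparse store keeps only the entries that actually appear in the files. A minimal Python sketch of the same fill loop, using a dictionary keyed by (row, col) as a stand-in for MATLAB's `sparse` (a "dictionary of keys" layout, not MATLAB's internal format); `fill_sparse` and `density` are hypothetical helpers.

```python
def fill_sparse(columns):
    """Dictionary-of-keys sparse matrix: (row, col) -> value, 1-based like MATLAB.

    columns[k] holds the (row, value) pairs read from file k + 1.
    """
    data = {}
    for col, pairs in enumerate(columns, start=1):
        for row, value in pairs:
            data[(row, col)] = value  # plays the role of data(row, col) = value
    return data


def density(data, n_rows, n_cols):
    """Fraction of the n_rows x n_cols cells that are actually stored."""
    return len(data) / (n_rows * n_cols)


N_ROWS, N_FILES = 73902, 635
print(N_ROWS * N_FILES)  # 46927770 cells if stored densely
```

Reading a missing cell with `data.get((row, col), 0)` returns 0, just as an unset element of a sparse matrix reads as zero.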