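I have a very large dataset (20000x97) that I want to split into several subsets. Column 1 should be included in every subset, and each remaining column should go into its own file together with column 1. The output should be tab-delimited. See the example below.
My data (example):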
Seq 124R 239G 361R 267G
TGAGGTAGTAGTTTGTGCTGTTG 27 15 15 52
CACCCGTAGAACCGACCTT 58 32 44 69
TCAAGTAATCCAGGATAGGC 4 4 6 15
TTTGGCAATGGTAGAACTCACACTGGTGAGGT 7 45 0 33
CACCCGTAGAACCGACCTTGC 488 740 834 1784
CTGAGACCTCTGGGTTCTGAGCT 20 11 4 33
CCCATAAAGTAGAAAGCAC 47 53 56 235
TACCCATTGCATATCGGAGTTGT 174 257 206 333
I want to split the file into sub-files like this:
file1:
Seq 124R
TGAGGTAGTAGTTTGTGCTGTTG 27
CACCCGTAGAACCGACCTT 58
TCAAGTAATCCAGGATAGGC 4
TTTGGCAATGGTAGAACTCACACTGGTGAGGT 7
CACCCGTAGAACCGACCTTGC 488
CTGAGACCTCTGGGTTCTGAGCT 20
CCCATAAAGTAGAAAGCAC 47
TACCCATTGCATATCGGAGTTGT 174
file2:
Seq 239G
TGAGGTAGTAGTTTGTGCTGTTG 15
CACCCGTAGAACCGACCTT 32
TCAAGTAATCCAGGATAGGC 4
TTTGGCAATGGTAGAACTCACACTGGTGAGGT 45
CACCCGTAGAACCGACCTTGC 740
CTGAGACCTCTGGGTTCTGAGCT 11
CCCATAAAGTAGAAAGCAC 53
TACCCATTGCATATCGGAGTTGT 257
... file3, and so on.
Answer 0 (score: 1)
If you don't have an answer yet, try the following:
use strict;
use warnings;
# Read the whole file; @main holds column 1, $cols[$k] holds data column $k+2.
open my $in, '<', 'input.txt' or die "Cannot open input.txt: $!";
my (@main, @cols);
while (my $line = <$in>) {
    my ($id, @rest) = split ' ', $line;    # split on whitespace, ignoring leading blanks
    next unless defined $id;               # skip empty lines
    push @main, $id;
    push @{ $cols[$_] }, $rest[$_] for 0 .. $#rest;
}
close $in;
# Write FILE1.txt, FILE2.txt, ...: column 1 plus one data column, tab-separated.
for my $k (0 .. $#cols) {
    my $name = 'FILE' . ($k + 1) . '.txt';
    open my $out, '>', $name or die "Cannot open $name: $!";
    print {$out} $main[$_], "\t", $cols[$k][$_], "\n" for 0 .. $#main;
    close $out;
}
Answer 1 (score: 1)
Perhaps the following will help:
Usage: perl script.pl dataFile
This method reads only one line of the dataset at a time, so handling a "very large dataset" should not be a problem.
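The same one-line-at-a-time idea can be sketched in awk. This is a minimal sketch, not the answerer's Perl script: it assumes whitespace-separated input, and the names sample.txt and col_N.txt are hypothetical. Each input line is read once, and every field from 2 onward is written together with field 1 to its own file.

```shell
# Write the first rows of the sample data (header included) to a test file.
printf '%s\n' \
  'Seq 124R 239G 361R 267G' \
  'TGAGGTAGTAGTTTGTGCTGTTG 27 15 15 52' \
  'CACCCGTAGAACCGACCTT 58 32 44 69' > sample.txt

# Single pass over the input: field i (i >= 2) goes to col_<i-1>.txt with field 1.
# In awk, ">" truncates a file the first time it is opened in a run; later
# prints to the same (still open) file append to it.
awk -v OFS='\t' '{
  for (i = 2; i <= NF; i++)
    print $1, $i > ("col_" (i - 1) ".txt")
}' sample.txt

cat col_1.txt
```

This version keeps every output file open for the whole run, which avoids reopening files but may hit the open-file limit of some awk implementations with 96 output columns; closing each file after writing, as the awk answer below does, avoids that at the cost of speed.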
Answer 2 (score: 1)
You can try this:
A more readable version:
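awk -v OFS='\t' '
{
    for (i = 2; i <= NF; i++) {
        f = sprintf("file_%d.txt", i - 1)   # output file for column i
        if (f in F) {
            print $1, $i >> f               # file already started: append
        } else {
            print $1, $i > f                # first write: create or truncate
            F[f]                            # remember that f has been started
        }
        close(f)                            # keep at most one output file open
    }
}' file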
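If the input is already tab-separated (the sample above may have lost its tabs when it was posted), `cut` can do the same job. This is a swapped-in technique, not part of either answer, and the names sample.tsv and part_N.txt are hypothetical. It rereads the file once per data column (96 passes for 97 columns), trading speed for simplicity; each pass streams the file, so memory use stays small.

```shell
# Sample input: first rows of the question's data, tab-separated.
printf 'Seq\t124R\t239G\t361R\t267G\n' > sample.tsv
printf 'TGAGGTAGTAGTTTGTGCTGTTG\t27\t15\t15\t52\n' >> sample.tsv
printf 'CACCCGTAGAACCGACCTT\t58\t32\t44\t69\n' >> sample.tsv

# Number of columns, taken from the header line.
ncol=$(awk -F '\t' 'NR == 1 { print NF; exit }' sample.tsv)

# For each data column i (2..ncol), keep column 1 and column i.
# cut splits on tabs by default and keeps the tab between the selected fields.
i=2
while [ "$i" -le "$ncol" ]; do
  cut -f "1,$i" sample.tsv > "part_$((i - 1)).txt"
  i=$((i + 1))
done
```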