Removing data based on variance

Date: 2014-04-10 20:32:04

Tags: r heatmap variance

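I have a huge dataset of 20000 × 16 (rows × columns).

I am making a heatmap of this dataset in R, but with this many rows it does not seem feasible. I would like to remove the data points (rows) that show little or essentially no variation, and so reduce the number of rows in the data matrix.

Can someone guide me on how to do this?

A sample of the dataset: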

Gene    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
PQ1    7.3159    9.3802    10.77    8.701    13.6066    8.3253    9.0556    9.8801    9.0776    11.2029    7.61    10.8403    9.2378    12.1697    9.7482
PQ2    7.4715    5.2955    10.2275    6.3606    10.1463    5.9968    6.2673    8.6119    6.153    6.7903    4.0843    13.0875    6.8167    8.3186    6.7643
PQ3    0    0    0    0    0.0026    0    0    0    0    0    0    0    0    0    0.0037
PQ4    1.776    1.125    1.3508    1.2489    2.1252    2.1057    1.0177    1.6063    1.0053    0.9571    1.4972    1.3998    1.0935    2.4737    1.2063
PQ5    0.1024    0.092    0.0473    0.071    0.1227    0.2047    0.2481    0.1089    0.0499    0.1381    0.057    0.0953    0.0433    0.0651    0.0598
PQ6    5.4296    0.1688    2.4767    0.2507    0.5087    4.2835    2.2989    8.6027    3.1126    0.4565    0.167    2.9066    3.195    0.942    5.8904
PQ7    0.2918    11.5673    4.9554    0    1.6693    1.6301    0.4985    2.4444    0.6217    1.4638    3.2648    0.5773    3.1071    7.651    0.4068
PQ8    0    0    0    0    0.0575    0.1018    0    0.0422    0    0    0    0.0257    0.0276    0    0
PQ9    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
PQ10    18.789    24.8681    29.8037    33.3986    37.8269    24.4719    21.1101    26.9985    21.9897    25.3416    26.77    23.1337    20.5193    27.0328    23.9777
PQ11    0    0.004    0    0    0.02    0    0    0.0265    0.0348    0    0.032    0    0.026    0    0.0167
PQ12    2.8442    4.7904    5.8717    3.2287    5.0917    1.5291    4.1187    6.207    3.532    5.4896    5.7066    5.1487    6.4386    11.2159    7.3469
PQ13    0    0.12    0.1776    0    0    0.0366    0.027    0    0    0    0.0462    0    0    0    0
PQ14    0    0    0    0    0    0    0.0136    0    0    0    0    0    0.0083    0    0
PQ15    0    0    0    0    0    0    0    0    0    0.0322    0    0    0    0    0
PQ16    0.0321    0.0469    0    0    0    0    0.0342    0    0    0    0    0    0    0    0
PQ17    0    0    0    0    0.0466    0.0225    0.0619    0.284    0.1252    0.0205    0    0.0371    0.1413    0.018    0.1238
PQ18    9.2029    12.3713    12.0135    7.7052    9.9121    7.3582    9.6782    12.5931    9.8137    12.4413    11.3418    14.4504    7.9965    8.5895    6.1705
PQ19    16.6408    9.1365    13.8613    12.6089    12.2094    16.5078    22.4689    16.4531    16.2172    15.6118    14.8256    18.5057    16.5483    13.5991    15.4934
PQ20    26.6048    24.1932    25.3238    27.9098    29.5022    25.4348    31.1095    30.4802    28.4243    21.2893    18.7577    27.3286    26.2074    30.6207    25.0771
PQ21    1.1114    0.067    0.3146    0.4593    0.3675    2.773    0.8119    0.5015    0.4696    0.5876    0.1406    0.2492    0.8565    0.2326    0.1521
PQ22    7.4962    5.3051    8.9577    6.1617    8.5887    8.2902    7.0983    7.1107    6.0231    6.9078    6.6685    7.3996    7.3611    8.3344    5.5536
PQ23    13.596    7.4782    9.6589    6.3121    10.7004    8.5035    9.769    10.1801    6.7358    5.0971    6.2171    9.713    7.0575    10.0523    7.5863
PQ24    18.564    35.9577    30.4134    27.9277    41.4544    23.1528    15.4656    32.0211    24.979    24.7365    41.5781    28.6164    34.8429    37.6385    27.1767
PQ25    15.5685    17.3154    17.0986    10.2068    13.5607    8.281    8.57    14.104    8.5732    8.3098    15.7368    18.3766    14.6625    14.2864    12.3646
PQ26    3.6639    5.5865    6.4437    2.7832    4.6902    6.4854    3.305    4.8913    3.0334    4.1835    5.9565    5.0441    4.4169    6.005    3.5551
PQ27    0.2116    0.0035    0.1193    0.0462    0.1113    0.3879    0.2976    0.9519    0.3039    0.0613    0.0478    0.5218    0.3197    0.1381    0.2277
PQ28    32.5026    28.1368    28.2335    25.6904    36.3761    26.779    36.4265    30.5154    35.1618    23.8327    27.087    24.7966    29.477    30.0189    26.1931
PQ29    1.8439    1.4574    1.2994    2.4006    0.6938    2.7233    0.6461    0.5976    1.7659    3.4405    1.5791    0.3336    1.8652    1.6685    2.0173
PQ30    1.7028    0.9633    2.0401    1.4563    1.4204    3.7509    1.843    2.071    2.3559    2.3659    1.2402    2.0673    2.2783    2.4221    1.3163
PQ31    0.1401    0.0283    0.3815    0.0434    0.1124    1.0891    0.0681    0.3404    0.2097    0.0552    0.1386    0.1835    0.2828    0.2267    0.2176
PQ32    3.1838    2.1398    4.1528    1.9499    3.0831    3.6193    3.0609    4.4113    2.4607    1.604    3.2404    4.4924    3.0917    4.525    3.0178
PQ33    0.0187    0.042    0.107    0    0.0162    0.0114    0.0366    0.0467    0.0532    0    0    0    0.0703    0.1173    0.0472
PQ34    1.3782    0.1604    0.3452    0.2124    0.0376    0.7386    0.4819    2.5638    0.3134    0.2188    1.6717    1.2121    0.4294    0.2202    0.2482
PQ35    0.0634    0.0294    0.0735    0.005    0.0558    0.1777    0.1734    0.0536    0.0259    0.0459    0.0217    0.0388    0.073    0.0206    0.074
PQ36    7.3565    4.5738    4.9642    1.8203    4.8537    12.1248    12.4298    8.541    11.8094    12.964    7.1189    17.0531    10.7116    6.5249    15.9312
PQ37    19.2056    16.5482    10.3252    26.8747    30.8489    26.5403    27.2519    12.1769    34.8122    26.1242    14.3651    12.6533    43.6538    24.7434    19.5469
PQ38    1.4191    5.2542    2.7084    4.6994    2.6367    3.0067    3.2322    3.8202    3.6902    3.6689    3.5244    1.3118    6.2961    3.4399    4.7755
PQ39    0.0032    0    0.0419    0    0.0001    0.0044    0    0.0259    0.0059    0    0    0    0.0203    0    0.0142
PQ40    5.5934    1.2258    7.0247    3.1928    3.7698    14.0234    2.3485    6.2129    4.1372    2.4574    3.8062    5.01    3.4968    4.6268    2.7763
PQ41    0.0664    0.0029    0.984    0    0.0448    0.3315    0.0709    0.5556    0.066    0.0443    0.1812    0.0918    0.1818    0.0491    0.2315
PQ42    12.4147    11.7431    20.2819    16.29    13.8172    16.5791    5.4218    11.46    15.264    26.2695    21.1681    14.128    16.8515    15.1775    11.4873
PQ43    0.0047    0.001    0.0731    0.0118    0.0169    0.207    0.0649    0.9764    0.0626    0.0002    0.0034    0.0657    0.3199    0.0003    0.2807
PQ44    0.135    0.0166    0.6497    0.0055    0.0229    0.1664    0.1529    0.4149    0.0361    0.0109    0.255    0.1788    0.1709    0.0291    0.3004
PQ45    56.8427    37.853    26.6238    10.5706    33.1238    45.9608    13.0512    17.1816    17.2876    12.7038    48.581    57.7831    20.1544    55.8307    17.7855
PQ46    0    0    0.0638    0    0    0    0    0    0    0    0    0    0    0    0
PQ47    0.3183    0.5558    0.9872    0.7507    0.963    0.9077    0.5323    2.3656    0.9466    0.8255    0.3479    1.184    1.8744    0.6751    0.3804
PQ48    0.0887    0.0237    0.5628    0.0256    0.3346    0.3528    0.1441    0.7293    0.2763    0.1582    0.0346    0.2104    0.3426    0.2687    0.152

The commands I am using:

rpkm <- read.table("heatmap_table.txt", header = TRUE)
row.names(rpkm) <- rpkm$Gene
rpkm <- rpkm[, 2:16]
rpkm_matrix <- data.matrix(rpkm)
### somewhere here I need to put the variance filter.
heatmap(rpkm_matrix)

Thanks

1 answer:

Answer 0 (score: 1)

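As Carl Witthoft pointed out in the comments, what you are trying to do will change any inferences you have drawn, or will draw, from the dataset. The data you remove may be important.

That said, you need to decide what "small variation" actually means, but here is an example. If you want all rows whose variance is greater than, say, 0.0001, you can define that threshold as a parameter, use apply to compute the variance of each row, and keep only the rows whose variance exceeds the parameter. In the first 20 rows of your data, this removes rows 3, 9, 14 and 15 (PQ3, PQ9, PQ14 and PQ15). It also removes rows whose variance is NA (for example, rows that contain missing values).

Assuming your data frame is called dat, with the Gene column first: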

> getVar <- apply(dat[, -1], 1, var)
> param <- 1e-4
> dat[getVar > param & !is.na(getVar), ]
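To check the threshold against the sample data, here is a minimal, self-contained sketch that copies a few rows from the table above into dat and applies the same filter. The row PQX is hypothetical (it is not in the posted data) and contains an NA, only to show that rows with NA variance are dropped too.

```r
# A few rows copied from the sample table (columns A..O).
# PQX is a HYPOTHETICAL row with a missing value, added only for illustration.
dat <- read.table(header = TRUE, text = "
Gene A B C D E F G H I J K L M N O
PQ1 7.3159 9.3802 10.77 8.701 13.6066 8.3253 9.0556 9.8801 9.0776 11.2029 7.61 10.8403 9.2378 12.1697 9.7482
PQ3 0 0 0 0 0.0026 0 0 0 0 0 0 0 0 0 0.0037
PQ9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
PQ15 0 0 0 0 0 0 0 0 0 0.0322 0 0 0 0 0
PQ16 0.0321 0.0469 0 0 0 0 0.0342 0 0 0 0 0 0 0 0
PQX 1 NA 2 3 4 5 6 7 8 9 10 11 12 13 14
")

# Numeric matrix without the Gene column, with gene names as row names
m <- data.matrix(dat[, -1])
rownames(m) <- as.character(dat$Gene)

# Variance of each row; var() returns NA for a row containing NA
row_var <- apply(m, 1, var)

# Keep rows whose variance exceeds the threshold and is not NA
param <- 1e-4
keep <- row_var > param & !is.na(row_var)
filtered <- m[keep, , drop = FALSE]

print(rownames(filtered))  # expected: "PQ1" "PQ16"
```

PQ15 (a single nonzero value of 0.0322) has a variance of about 6.9e-05 and is dropped, while PQ16 (about 2.5e-04) is kept, so the result depends strongly on the chosen param. In the question's script the same filter would go where the ### comment is, applied to rpkm_matrix, which already has the Gene column removed: v <- apply(rpkm_matrix, 1, var); rpkm_matrix <- rpkm_matrix[v > 1e-4 & !is.na(v), ] before calling heatmap(rpkm_matrix).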