Question

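The columns I need all begin with "rsid_set (variable)". I have almost no coding experience, but I have been trying to use R and python. Is there a quick way to get just the columns I want?

Follow-up: is there a way to take the mean of each column and turn it into a normal distribution with 10,000 values?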

Answer 1

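# read in the tab-separated file (base R has no read.tsv; read.delim defaults to sep = "\t", header = TRUE)
df <- read.delim("path/to/your/file")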

# select only the columns whose names begin with rsid_set
# (the index goes after the comma; before the comma it would select rows)
df <- df[, grep("^rsid_set", colnames(df))]
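Since the question mentions python as well, a rough equivalent using only the standard library might look like the sketch below. The helper name `select_columns` and both file paths are made up for illustration. It streams the file row by row, so the full 300,000-column table never has to be held in memory at once.

```python
import csv

def select_columns(in_path, out_path, prefix="rsid_set"):
    """Copy only the columns whose header starts with `prefix` (hypothetical helper)."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        header = next(reader)
        # Indices of the columns to keep, in their original order.
        keep = [i for i, name in enumerate(header) if name.startswith(prefix)]
        writer.writerow([header[i] for i in keep])
        for row in reader:
            writer.writerow([row[i] for i in keep])
    return len(keep)  # number of columns that matched the prefix
```

Calling `select_columns("path/to/your/file", "path/to/output.tsv")` would write the reduced table and return how many columns matched, which is a quick way to check that the expected 10,000 were found.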

I don't understand your follow-up; you'll have to clarify what you want.

# Take the means of each column:
means <- colMeans(df)

# 10,000 draws from a standard normal distribution (mean 0, sd 1)
norms <- rnorm(1e4)
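One possible reading of the follow-up is: for each column, draw 10,000 values from a normal distribution centred on that column's mean. The sketch below shows this for a single column in python. It rests on assumptions the question does not settle: the spread is taken from the column's own sample standard deviation, the helper name `normal_draws` is hypothetical, and the example data are invented.

```python
import random
import statistics

def normal_draws(values, n=10_000, seed=0):
    """Draw n values from a normal distribution fitted to one column (hypothetical helper)."""
    mu = statistics.mean(values)
    # ASSUMPTION: the standard deviation comes from the column itself.
    sigma = statistics.stdev(values)
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return [rng.gauss(mu, sigma) for _ in range(n)]

column = [4.8, 5.1, 5.0, 4.9, 5.2]  # made-up example data for one column
draws = normal_draws(column)
```

In R the same idea would be `rnorm(1e4, mean = m, sd = s)` with `m` and `s` computed per column. If a different spread is wanted, only the `sigma` line needs to change.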

I have 300,000 columns in a tsv file, and I only need 10,000 of them.
