Question

So I have a table with x rows and 20 columns, and for each row I need to find the column that holds the highest value. For example:

The table would look something like this (but larger):
 A       B       C       D       E       F       G
 1       2       3       4       5       6       7
 9       8       7       6       5       4       3
 7       6       5       8       4       3       2
 0.9     0.01    0.02    0.2     0.04    0.3   ...

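I would like it to output: G,A,D,A. I need to put that into another file. It doesn't even have to be letters; I'll do something with it later. I've been trying to figure out the best way to do this, and I've been trying to do it in R. Here is my script so far: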

#!/usr/bin/env Rscript
a=read.table(get(TEST.csv),header=T,sep="",dec=".")
apply(a, 1, which.max)

It refuses to read my test file. For Python, I have the following:

import numpy as np
import csv
a=np.genfromtxt('./TEST.csv',delimiter='\t',skip_header=1)
print(a)
amax=np.amax(a,axis=1)
print(amax)

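This correctly extracts the highest value of each row, but it doesn't give me the column number, which is what I actually want. Any and all suggestions would be greatly appreciated.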

Answer 1

You can try max.col in R:

names(a)[max.col(a, 'first')]
#[1] "G" "A" "D" "A"

Answer 2

You can use pandas.read_csv to read the file into a DataFrame, then use idxmax:

import pandas as pd

df = pd.read_csv("in.csv", delimiter=r"\s+")  # raw string, so \s is not treated as an escape sequence

print(df.idxmax(axis=1))
0    G
1    A
2    D
3    A
dtype: object

Replace the delimiter with whichever one matches your file.
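Since the question also asks for the result to end up in another file, here is a minimal sketch of that step. It assumes whitespace-separated input and uses only the first three rows of the question's table (the fourth row is truncated there); the helper name max_columns and the output file name are placeholders, not anything from the original post.

```python
import io
import os
import tempfile

import pandas as pd

# First three data rows of the question's table, inlined so the sketch runs as-is.
SAMPLE = """A B C D E F G
1 2 3 4 5 6 7
9 8 7 6 5 4 3
7 6 5 8 4 3 2
"""


def max_columns(source, sep=r"\s+"):
    """Return a Series holding, for each row, the label of the column with the maximum."""
    df = pd.read_csv(source, sep=sep)
    return df.idxmax(axis=1)


labels = max_columns(io.StringIO(SAMPLE))
print(",".join(labels))  # G,A,D

# Write one label per line. The file name is hypothetical; a temp directory
# is used here only so the sketch does not clutter the working directory.
out_path = os.path.join(tempfile.gettempdir(), "max_columns.txt")
labels.to_csv(out_path, index=False, header=False)
```

For a real file, pass its path instead of the StringIO object and set sep to match it (for example "\t" for tab-separated data).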

Answer 3

In numpy, use the argmax function:

import numpy as np
a = np.array([[0, 1, 2],
   [3, 4, 5]])
np.argmax(a, axis=0)
# array([1, 1, 1])
np.argmax(a, axis=1)
# array([2, 2])

In your case, axis should be 1.
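To connect this to the question's genfromtxt call, the sketch below applies argmax with axis=1 to tab-separated data and maps the zero-based indices back to the header letters. The inlined sample (first three rows of the question's table) and the variable names are assumptions made for illustration.

```python
import io

import numpy as np

# Tab-separated copy of the first three data rows from the question.
SAMPLE = (
    "A\tB\tC\tD\tE\tF\tG\n"
    "1\t2\t3\t4\t5\t6\t7\n"
    "9\t8\t7\t6\t5\t4\t3\n"
    "7\t6\t5\t8\t4\t3\t2\n"
)

# Read the header separately, because skip_header=1 discards it.
header = SAMPLE.splitlines()[0].split("\t")

a = np.genfromtxt(io.StringIO(SAMPLE), delimiter="\t", skip_header=1)

idx = np.argmax(a, axis=1)          # zero-based column index of each row's maximum
names = [header[i] for i in idx]    # map indices back to column letters

print(idx)    # [6 0 3]
print(names)  # ['G', 'A', 'D']
```

Add 1 to idx if one-based column numbers are preferred. When a row contains ties, argmax returns the first matching column.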

Answer 4

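import pandas as pd

# mydict (column label -> output value) is assumed to be defined elsewhere; it is not shown in this answer.
df = pd.read_csv('./' + 'FileName', delimiter='\t', usecols=range(1, 21))
amax = df.idxmax(axis=1)  # label of the column holding each row's maximum
lines = []
# Iterate over the labels directly so multi-character column names also work.
for index, c in enumerate(amax):
    if c in mydict:
        # 1-based row number, a tab, then the mapped value
        lines.append(str(index + 1) + '\t' + str(mydict[c]) + '\n')
with open('NewName', 'w') as text_file:
    text_file.write(''.join(lines))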

Identify the column that the highest row value belongs to, Python or R.
