Question

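I want to use Python to open a CSV file from a path given in sys.argv. The file is named 'file.out', and I want to open it from the script location passed in sys.argv[2]. However, I don't know how to specify scriptlocation in the pd.read_csv call. I tried the following, but it doesn't work. What is the problem?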

My code is as follows:

import sys
import pandas as pd

# command-line arguments arrive as plain strings
outputfolder = sys.argv[1]
scriptlocation = sys.argv[2]

df = pd.read_csv(open(scriptlocation('file.out', 'r')), header=None, delim_whitespace=True)

Answer 1

Try this:


If you are using Python 3.4+ and pandas 0.18.1+, you can use pathlib:

Demo:

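A minimal pathlib sketch of this approach, assuming the file is headerless and whitespace-delimited as in the question. The helper name `read_output` is hypothetical. It uses `sep=r"\s+"`, which pandas treats the same as `delim_whitespace=True` (the latter is deprecated since pandas 2.2).

```python
from pathlib import Path

import pandas as pd


def read_output(scriptlocation: str, filename: str = "file.out") -> pd.DataFrame:
    """Read a headerless, whitespace-delimited file from `scriptlocation` (hypothetical helper)."""
    # Path's / operator joins the folder and the file name with the platform's separator.
    path = Path(scriptlocation) / filename
    # read_csv accepts a Path object directly, so no explicit open() call is needed.
    # sep=r"\s+" splits on any run of whitespace, like delim_whitespace=True.
    return pd.read_csv(path, header=None, sep=r"\s+")
```

In the question's script this would be called as `df = read_output(sys.argv[2])`.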

Answer 2

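This is not a pandas problem. What you need is to build the file path from the root folder (scriptlocation, if I understand you correctly) and the file name, and then pass that path to pd.read_csv(). So what you are looking for is os.path.join():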

import os
import pandas as pd
output_fn = os.path.join(scriptlocation, 'file.out')
df = pd.read_csv(output_fn, header=None, delim_whitespace=True)
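A sketch of a complete script that wires this approach to the question's command-line arguments. The helper name `read_output` is hypothetical, the file is assumed to be headerless and whitespace-delimited as in the question, and `sep=r"\s+"` stands in for `delim_whitespace=True`, which pandas deprecated in 2.2.

```python
import os
import sys

import pandas as pd


def read_output(scriptlocation: str, filename: str = "file.out") -> pd.DataFrame:
    """Read a headerless, whitespace-delimited file stored in `scriptlocation` (hypothetical helper)."""
    # os.path.join inserts the platform's separator, e.g. "data/file.out" on POSIX.
    output_fn = os.path.join(scriptlocation, filename)
    # Pass the path string itself; read_csv opens and closes the file.
    return pd.read_csv(output_fn, header=None, sep=r"\s+")


# Only touch sys.argv when both expected arguments were actually supplied.
if __name__ == "__main__" and len(sys.argv) > 2:
    outputfolder = sys.argv[1]  # not used here; kept to mirror the question
    scriptlocation = sys.argv[2]
    print(read_output(scriptlocation))
```

Run it as `python script.py OUTPUTFOLDER SCRIPTLOCATION` (the script name is hypothetical). Because `scriptlocation` stays a plain string, it is passed to `os.path.join` rather than called like a function.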
