Python Pandas - How to limit the number of rows and automatically start a new column

Posted: 2016-07-08 22:23:43

Tags: python csv numpy pandas

Go easy on me! I'm a Python student! :)

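I have a Python program that simulates coin flips. The end result is that every flip is written to a CSV file as either -1 (tails) or 1 (heads). I need pandas to limit each column to 1 million rows and, after every 1 million rows, automatically continue in the next column. How can I do this? I can't find a pandas article that applies, and my knowledge of the topic is still very limited.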

import pandas as pd
import numpy as np

#get the flipcount
flipcount=int(input("How many times should I flip a coin?\n###:"))
samples = np.random.randint(0, 2, size = flipcount)

#create a pandas dataframe
data = pd.DataFrame([1 if i == 1 else -1 for i in samples])

#create a csv file
data.to_csv("data.csv", index=False, header=False)
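The list comprehension above converts NumPy's 0/1 draws into -1/1 one element at a time. The same mapping can be done in a single vectorized step, which matters when `flipcount` is in the millions. A minimal sketch (the function name `flip_coins` is hypothetical, not from the question):

```python
import numpy as np


def flip_coins(flipcount):
    # draw 0 (tails) or 1 (heads) for each flip, as in the original program
    samples = np.random.randint(0, 2, size=flipcount)
    # vectorized equivalent of [1 if i == 1 else -1 for i in samples]
    return np.where(samples == 1, 1, -1)


flips = flip_coins(10)
print(flips)
```

The later attempt in the question gets the same result directly with `np.random.choice([-1, 1], size=flipcount)`.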

Here is my latest attempt:

import pandas as pd
import numpy as np

#get the flipcount
flipcount=int(input("How many times should I flip a coin?\n###:"))

#create the data
samples = np.random.choice([-1, 1], size = flipcount)

# calculate the numbers of columns
n_columns = flipcount//10**6
if flipcount % 10**6 !=0:
    n_columns+=1

# create the DataFrame
mylist = [samples[(i-1)*mybreak:i*mybreak] for i in range(1, n_columns+1)]
data = pd.DataFrame(mylist).T

#create a csv file
data.to_csv("data789.csv", index=False, header=False)

Error from the command prompt:

How many times should I flip a coin?
###:1001
Traceback (most recent call last):
  File "CoinFlipMania.py", line 16, in <module>
    mylist = [samples[(i-1)*mybreak:i*mybreak] for i in range(1, n_columns+1)]
  File "CoinFlipMania.py", line 16, in <listcomp>
    mylist = [samples[(i-1)*mybreak:i*mybreak] for i in range(1, n_columns+1)]
NameError: name 'mybreak' is not defined
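The `NameError` means `mybreak` is used inside the list comprehension but is never assigned anywhere in the script. Defining the chunk size once and reusing it for the column count avoids that, and the three-line round-up can be written as one integer ceiling division. A minimal sketch (the names `count_columns` and `rows_per_column` are hypothetical):

```python
def count_columns(flipcount, rows_per_column=10**6):
    # ceiling division: flipcount // rows_per_column, plus one extra
    # column whenever there is a remainder
    return -(-flipcount // rows_per_column)


print(count_columns(1001))     # 1
print(count_columns(2000000))  # 2
print(count_columns(2000001))  # 3
```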

2 answers:

Answer 0 (score: 1)

I think this will work for you:

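import numpy as np
import pandas as pd

#get the flipcount (same as in the question)
flipcount = int(input("How many times should I flip a coin?\n###:"))

#create the data
samples = np.random.choice([-1, 1], size=flipcount)

# rows per column; must be an int (10**6, not 1e6), because NumPy
# rejects float slice bounds
mybreak = 10**6

# calculate the number of columns, rounding up for a partial last column
n_columns = flipcount // mybreak
if flipcount % mybreak != 0:
    n_columns += 1

# create the DataFrame: each slice becomes a row (a short last slice is
# padded with NaN), then .T turns the rows into columns
mylist = [samples[(i-1)*mybreak:i*mybreak] for i in range(1, n_columns+1)]
data = pd.DataFrame(mylist).T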
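Note that `1e6` is a float, and current NumPy versions refuse float slice bounds with a `TypeError`, so the chunk size has to be an integer such as `10**6` or `int(1e6)`. A small check, using a 5-element array and a chunk size of 2 purely for illustration:

```python
import numpy as np

samples = np.array([1, -1, 1, -1, 1])

# a float bound such as 1e6 is rejected by NumPy slicing
try:
    samples[0:1e6]
except TypeError as exc:
    print("TypeError:", exc)

# an integer chunk size works; 2 rows per column for this tiny demo
mybreak = 2
chunks = [samples[i:i + mybreak] for i in range(0, len(samples), mybreak)]
print([c.tolist() for c in chunks])  # [[1, -1], [1, -1], [1]]
```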

Answer 1 (score: 1)

I think the following approach is the simplest:

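import numpy as np
import pandas as pd

flipcount = 2000001
my_break = 1000000

samples = np.random.choice([1, -1], size=flipcount)

if flipcount > my_break:
    # pad with NaN up to the next multiple of my_break; (-flipcount) % my_break
    # is 0 when flipcount is already a multiple, so no all-NaN column is added
    n_empty = (-flipcount) % my_break
    # reshape into rows of my_break values, then transpose so each row becomes a column
    samples = np.append(samples, [np.nan] * n_empty).reshape((-1, my_break)).T

(pd.DataFrame(samples)
 .to_csv('my_csv.csv', index=False, header=False))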
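Another option, not taken from either answer: when a DataFrame is built from a dict of Series, pandas aligns the Series on their index, so a shorter final chunk is padded with NaN automatically and neither manual padding nor an explicit column count is needed. A minimal sketch (the function name `flips_to_columns` is hypothetical):

```python
import numpy as np
import pandas as pd


def flips_to_columns(samples, rows_per_column):
    """Split a 1-D array into consecutive columns of at most rows_per_column values."""
    # one Series per chunk, each with a default 0..len-1 index;
    # the DataFrame constructor aligns them, padding the short last chunk with NaN
    chunks = {
        col: pd.Series(samples[start:start + rows_per_column])
        for col, start in enumerate(range(0, len(samples), rows_per_column))
    }
    return pd.DataFrame(chunks)


demo = flips_to_columns(np.array([1, -1, 1, 1, -1]), 2)
print(demo.shape)  # (2, 3)
```

For the question's data, `flips_to_columns(samples, 10**6).to_csv("data.csv", index=False, header=False)` would write one column per million flips. A column that needed NaN padding is stored as float, so its values are written as `1.0`/`-1.0` in the CSV.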