Question

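I'm trying to create a CSV dataset with 4 columns, 'name', 'age', 'weight' and 'height', filled with 100 rows of random data. However, the code for the first step gives me one row instead of 100. How can I fix this, and how do I then write it to a CSV file?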

```python
import random
import pandas as pd
import numpy as np


person="person"
personList =[person+str(i) for i in range(100)]

ageList=[random.randint(1,90) for i in range(100)]

weightList=[random.randint(40,150) for i in range(100)]

heightList=[random.randint(140,210) for i in range(100)]

raw_data={'Name':[personList],
          'Age':[ageList],
          'Weight':[weightList],
          'Height':[heightList]}
df = pd.DataFrame([raw_data])

print(df)
```

Answer 1

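Don't pass the values as a "list of lists"; that is, remove the outer `[ ]`. Wrapping each list in another list, and the dict itself in a list, gives pandas a single record whose cells each hold a whole list, which is why you get one row: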

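```python
# Each value is a plain list of 100 items, so pandas builds 100 rows
raw_data={'Name': personList,
          'Age': ageList,
          'Weight': weightList,
          'Height': heightList}
df = pd.DataFrame(raw_data)  # pass the dict itself, not [raw_data]
```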
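To see the difference concretely, here is a small sketch with 3 rows instead of 100 (the names `names`, `ages`, `wrapped` and `flat` are illustrative). The question's form produces one row whose cells hold nested lists; passing the lists directly produces one row per element.

```python
import pandas as pd

# Illustrative data: 3 rows instead of 100
names = ["person0", "person1", "person2"]
ages = [23, 50, 18]

# As in the question: each value wrapped in an extra list, dict wrapped in a list
wrapped = pd.DataFrame([{"Name": [names], "Age": [ages]}])

# As in the answer: plain lists as column values, dict passed directly
flat = pd.DataFrame({"Name": names, "Age": ages})

print(wrapped.shape)  # (1, 2): a single row, each cell holds a nested list
print(flat.shape)     # (3, 2): one row per list element
```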

To write the result to a CSV file, use:

```python
df.to_csv('./filename.csv')  # add index=False to omit the row-index column
```
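By default `to_csv` also writes the row index as an unnamed first column. A minimal round-trip sketch, assuming you don't want that column: `index=False` drops it, and `pd.read_csv` reads the data back. An in-memory `io.StringIO` buffer stands in for the file here so nothing touches the disk; a path like `'./filename.csv'` works the same way.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "Name": ["person0", "person1"],
    "Age": [23, 50],
    "Weight": [59, 66],
    "Height": [158, 199],
})

buffer = io.StringIO()          # stands in for './filename.csv'
df.to_csv(buffer, index=False)  # index=False: don't write the 0, 1, ... row labels
csv_text = buffer.getvalue()

buffer.seek(0)                  # rewind before reading
restored = pd.read_csv(buffer)  # parse the CSV back into a DataFrame
print(restored)
```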

Output:

```
        Name  Age  Weight  Height
0    person0   23      59     158
1    person1   50      66     199
2    person2   18     100     183
3    person3    4      60     144
4    person4   14     123     188
5    person5   12      40     141
6    person6   44      65     171
7    person7   50      96     166
8    person8   82     114     166
9    person9   86     142     178
10  person10   51      93     142
11  person11    1      59     166
12  person12   61     138     152
13  person13   46      92     164
14  person14   25     103     195
15  person15   24      42     150
16  person16   33     123     186
17  person17   44      64     193
18  person18   40     118     159
19  person19   25     134     196
20  person20    5     117     178
...
```

Another approach is to use `numpy.random`, where most functions accept a `size` parameter:

```python
import pandas as pd
import numpy as np


person="person"
n = 100

personList = [person+str(i) for i in range(n)]

# np.random.randint excludes the upper bound, so use high + 1
# to get the same inclusive ranges as random.randint
ageList = np.random.randint(1, 91, size=n)

weightList = np.random.randint(40, 151, size=n)

heightList = np.random.randint(140, 211, size=n)

raw_data={'Name': personList,
          'Age': ageList,
          'Weight': weightList,
          'Height': heightList}
df = pd.DataFrame(raw_data)
```
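On NumPy 1.17 and later, the recommended interface is a `Generator` from `np.random.default_rng`, whose `integers` method also takes `size` and, with `endpoint=True`, includes the upper bound like `random.randint` does. A sketch under that assumption; the helper name `make_people` and the seed value are hypothetical, not taken from the answers. Passing a seed makes the dataset reproducible between runs.

```python
import numpy as np
import pandas as pd


def make_people(n, seed=None):
    """Hypothetical helper: build an n-row DataFrame of random people."""
    rng = np.random.default_rng(seed)  # seeded Generator => reproducible data
    return pd.DataFrame({
        "Name": [f"person{i}" for i in range(n)],
        # endpoint=True makes the upper bound inclusive, matching random.randint
        "Age": rng.integers(1, 90, size=n, endpoint=True),
        "Weight": rng.integers(40, 150, size=n, endpoint=True),
        "Height": rng.integers(140, 210, size=n, endpoint=True),
    })


df = make_people(100, seed=42)
print(df.shape)
print(df.head())
```

Calling `df.to_csv(...)` on the result then writes the 100-row file as before.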

Answer 2

numpy is very good at building random arrays, and pandas uses numpy arrays internally, so my suggestion is:

```python
...
ageList=np.random.randint(1,91,100)       # np.random.randint excludes the upper bound, hence the +1

weightList=np.random.randint(40,151,100)

heightList=np.random.randint(140,211,100)

raw_data={'Name':personList,
          'Age':ageList,
          'Weight':weightList,
          'Height':heightList}
df = pd.DataFrame(raw_data)              # note passing a mapping and not a sequence
```
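The +1 is needed because the two `randint` functions treat the upper bound differently: `random.randint(a, b)` includes `b`, while `np.random.randint(low, high)` excludes `high`. A quick check using a tiny range (1 to 3, chosen only for illustration):

```python
import random
import numpy as np

# Standard library: both ends inclusive, so 3 can appear
py_values = {random.randint(1, 3) for _ in range(1000)}

# NumPy: high is exclusive, so randint(1, 3) can only return 1 or 2
np_values = set(np.random.randint(1, 3, size=1000).tolist())

# Adding 1 to high gives the inclusive range 1..3 again
np_values_plus1 = set(np.random.randint(1, 4, size=1000).tolist())

print(sorted(py_values), sorted(np_values), sorted(np_values_plus1))
```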
