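Create a column with the same tuple value in every row

Time: 2018-11-07 07:16:50

Tags: python pandas

I want to create a column filled with the same value in every row, and I want that value to be a tuple. Unfortunately, pandas thinks I am trying to pass in a whole column of values.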

import pandas as pd

df = pd.DataFrame(index=range(10))
df['foo'] = 9         # ok: a scalar is broadcast to every row
df['bar'] = (10, 12)  # fails: pandas thinks I'm passing in a too-short column

ValueError: Length of values does not match length of index

How can I set every row of the 'bar' column to the tuple?

2 Answers:

Answer 0: (score: 1)

You can use the DataFrame constructor:

df = pd.DataFrame({'foo': 9, 'bar':[(10,12)]}, index=range(10))

Or use a list comprehension, or repeat the tuple once per row of the DataFrame:

df = pd.DataFrame(index=range(10))
df['foo']=9      #ok
df['bar']= [(10,12) for _ in df.index]
#another solution
#df['bar']= [(10,12)] * len(df)

print (df)
   foo       bar
0    9  (10, 12)
1    9  (10, 12)
2    9  (10, 12)
3    9  (10, 12)
4    9  (10, 12)
5    9  (10, 12)
6    9  (10, 12)
7    9  (10, 12)
8    9  (10, 12)
9    9  (10, 12)
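
The list-based assignment above can be checked with a short, self-contained sketch. It assumes the same 10-row frame as the question; the names `value` and `all_tuples` are illustrative only and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame(index=range(10))
df['foo'] = 9

value = (10, 12)  # illustrative name for the tuple to repeat

# A bare tuple is read as a sequence of column values, so supply
# a list holding one tuple per row instead.
df['bar'] = [value] * len(df)

# Confirm that every cell holds a tuple rather than a scalar or a list.
all_tuples = df['bar'].map(lambda v: isinstance(v, tuple)).all()
print(df['bar'].dtype)
print(all_tuples)
```

Because tuples are immutable, it should not matter that `[value] * len(df)` places the same tuple object in every row.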

Performance

df = pd.DataFrame(index=range(1000))

In [99]: %%timeit
    ...: df['bar']='10,12'
    ...: df['bar']=df['bar'].str.split(',').astype(tuple)
    ...: 
977 µs ± 37.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [100]: %%timeit
     ...: df['bar']= [(10,12) for _ in df.index]
     ...: 
218 µs ± 3.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [101]: %%timeit
     ...: df['bar']= [(10,12)] * len(df)
     ...: 
175 µs ± 8.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [105]: %%timeit
     ...: df = pd.DataFrame({'foo': 9, 'bar':[(10,12)]}, index=range(1000))
     ...: 
400 µs ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [106]: %%timeit
     ...: df = pd.DataFrame(index=range(1000))
     ...: df['foo']=9
     ...: df['bar']= [(10,12)] * len(df)
     ...: 
766 µs ± 5.11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
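
Outside IPython, a comparable measurement could be sketched with the standard-library `timeit` module. This is only a rough approximation of the `%%timeit` cells above: the helper functions, the `timings` dictionary, and the repeat count of 200 are assumptions made for illustration, and the numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame(index=range(1000))

def with_comprehension():
    # Build one tuple per row with a list comprehension.
    df['bar'] = [(10, 12) for _ in df.index]

def with_repetition():
    # Repeat a one-element list to the length of the frame.
    df['bar'] = [(10, 12)] * len(df)

runs = 200  # arbitrary repeat count, chosen to keep the run short
timings = {
    'comprehension': timeit.timeit(with_comprehension, number=runs),
    'repetition': timeit.timeit(with_repetition, number=runs),
}

for name, seconds in timings.items():
    print(f"{name}: {seconds / runs * 1e6:.1f} µs per call")
```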

Answer 1: (score: 1)

Alternatively, you can do this:

...
df['bar']='10,12'
df['bar']=df['bar'].str.split(',')

Then:

print(df)

gives:

   foo       bar
0    9  [10, 12]
1    9  [10, 12]
2    9  [10, 12]
3    9  [10, 12]
4    9  [10, 12]
5    9  [10, 12]
6    9  [10, 12]
7    9  [10, 12]
8    9  [10, 12]
9    9  [10, 12]

If you want tuples, do the following:

...
df['bar']='10,12'
df['bar']=df['bar'].str.split(',').map(tuple)

Now:

print(df)

gives:

   foo       bar
0    9  (10, 12)
1    9  (10, 12)
2    9  (10, 12)
3    9  (10, 12)
4    9  (10, 12)
5    9  (10, 12)
6    9  (10, 12)
7    9  (10, 12)
8    9  (10, 12)
9    9  (10, 12)
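
One caveat with the string-splitting approach: `str.split` produces strings, so each tuple holds `'10'` and `'12'` rather than integers, even though the printed frame looks the same. A minimal sketch of the difference, assuming integer elements are wanted; the names `as_str` and `as_int` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame(index=range(10))
df['foo'] = 9
df['bar'] = '10,12'

# Splitting yields lists of strings; map(tuple) keeps the elements as strings.
as_str = df['bar'].str.split(',').map(tuple)

# Convert each piece to int while building the tuple.
as_int = df['bar'].str.split(',').map(lambda parts: tuple(int(p) for p in parts))

first_str = as_str.iloc[0]
first_int = as_int.iloc[0]
print(repr(first_str))
print(repr(first_int))
```

If the tuple is already available as a Python object, the list-based assignment from the other answer avoids the round trip through strings.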