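I want to create a column in which every row holds the same value, and I want that value to be a tuple. Unfortunately, pandas thinks I'm trying to pass in a whole column of values.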
import pandas as pd
df = pd.DataFrame(index=range(10))
df['foo']=9 #ok
df['bar']=(10,12) #thinks I'm passing in a too-short column
ValueError: Length of values does not match length of index
How can I set every row of the "bar" column to the tuple?
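The error occurs because pandas treats any list-like value, including a tuple, as a full column of values rather than as one scalar, so `(10,12)` looks like two values for ten rows. A minimal sketch of the failure and the usual workaround (the exact error text varies between pandas versions):

```python
import pandas as pd

df = pd.DataFrame(index=range(10))

# A tuple is list-like, so pandas tries to use it as the whole column:
# two values for ten rows, which raises ValueError.
try:
    df['bar'] = (10, 12)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Wrapping the tuple in a list with one entry per row gives pandas a
# column of the right length whose every element is the tuple.
df['bar'] = [(10, 12)] * len(df)
print(df['bar'].iloc[0])  # (10, 12)
```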
Answer 0 (score: 1)
You can use the DataFrame constructor:
df = pd.DataFrame({'foo': 9, 'bar':[(10,12)]}, index=range(10))
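Whether the constructor stretches a one-element list such as `[(10,12)]` across a longer index may depend on the pandas version. A variant that does not rely on that behavior passes exactly one tuple per row (`n_rows` is just an illustrative name):

```python
import pandas as pd

n_rows = 10

# The scalar 9 is broadcast to every row; 'bar' gets one tuple per row,
# so its length already matches the index.
df = pd.DataFrame({'foo': 9, 'bar': [(10, 12)] * n_rows}, index=range(n_rows))
print(df.head(3))
```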
Or use a list comprehension, or repeat the tuple as many times as the DataFrame has rows:
import pandas as pd
df = pd.DataFrame(index=range(10))
df['foo']=9 #ok
df['bar']= [(10,12) for _ in df.index]
# another solution
# df['bar']= [(10,12)] * len(df)
print(df)
   foo       bar
0    9  (10, 12)
1    9  (10, 12)
2    9  (10, 12)
3    9  (10, 12)
4    9  (10, 12)
5    9  (10, 12)
6    9  (10, 12)
7    9  (10, 12)
8    9  (10, 12)
9    9  (10, 12)
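Both forms above hand pandas an ordinary Python list with one tuple per row, which is why the lengths match. A short sketch checking that they agree and that the resulting column has object dtype with a tuple in every cell:

```python
import pandas as pd

df = pd.DataFrame(index=range(10))
df['foo'] = 9

# Both forms build a plain list holding one tuple per row.
by_comprehension = [(10, 12) for _ in df.index]
by_repetition = [(10, 12)] * len(df)
assert by_comprehension == by_repetition

df['bar'] = by_repetition

# Tuples are stored as Python objects.
print(df['bar'].dtype)  # object
print(df['bar'].map(type).unique())
```

The repetition form stores the same tuple object `len(df)` times, which is harmless because tuples are immutable. For a mutable value such as a list, `[[10, 12]] * len(df)` would likewise repeat one list object, so the comprehension form is the safer habit there.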
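Performance (note that the first variant uses `.astype(tuple)`, which does not turn the lists into tuples and may raise `TypeError` in recent pandas versions; answer 1 below uses `.map(tuple)` instead):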
df = pd.DataFrame(index=range(1000))
In [99]: %%timeit
...: df['bar']='10,12'
...: df['bar']=df['bar'].str.split(',').astype(tuple)
...:
977 µs ± 37.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [100]: %%timeit
...: df['bar']= [(10,12) for _ in df.index]
...:
218 µs ± 3.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [101]: %%timeit
...: df['bar']= [(10,12)] * len(df)
...:
175 µs ± 8.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [105]: %%timeit
...: df = pd.DataFrame({'foo': 9, 'bar':[(10,12)]}, index=range(1000))
...:
400 µs ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [106]: %%timeit
...: df = pd.DataFrame(index=range(1000))
...: df['foo']=9
...: df['bar']= [(10,12)] * len(df)
...:
766 µs ± 5.11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
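To rerun a comparison like this as a plain script rather than with IPython's `%%timeit`, the standard-library `timeit` module can be used. This is a sketch: the function names and loop counts are illustrative, the split variant uses `.map(tuple)` so it really produces tuples, it reports the best run instead of the mean, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Shared frame, reused across calls like the %%timeit cells above.
df = pd.DataFrame(index=range(1000))


def via_str_split():
    df['bar'] = '10,12'
    # .map(tuple) turns each list into a tuple (of the strings '10', '12')
    df['bar'] = df['bar'].str.split(',').map(tuple)


def via_comprehension():
    df['bar'] = [(10, 12) for _ in df.index]


def via_repetition():
    df['bar'] = [(10, 12)] * len(df)


def via_constructor():
    # Full-length list, so no reliance on broadcasting a one-element list.
    return pd.DataFrame({'foo': 9, 'bar': [(10, 12)] * 1000}, index=range(1000))


def best_time_us(func, number=50, repeat=3):
    """Best per-call time in microseconds over `repeat` runs of `number` calls."""
    runs = timeit.repeat(func, number=number, repeat=repeat)
    return min(runs) / number * 1e6


candidates = (via_str_split, via_comprehension, via_repetition, via_constructor)
results = {f.__name__: best_time_us(f) for f in candidates}
for name, us in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} {us:8.1f} µs")
```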
Answer 1 (score: 1)
Or you can do this:
...
df['bar']='10,12'
df['bar']=df['bar'].str.split(',')
Then:
print(df)
gives:
   foo       bar
0    9  [10, 12]
1    9  [10, 12]
2    9  [10, 12]
3    9  [10, 12]
4    9  [10, 12]
5    9  [10, 12]
6    9  [10, 12]
7    9  [10, 12]
8    9  [10, 12]
9    9  [10, 12]
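The lists produced by `str.split` contain strings (`'10'` and `'12'`), even though the printed frame shows no quotes. If integer lists are wanted, a sketch that builds them directly with a list comprehension, which also gives every row its own list object instead of one shared mutable list:

```python
import pandas as pd

df = pd.DataFrame(index=range(10))
df['bar'] = '10,12'
df['bar'] = df['bar'].str.split(',')

# str.split returns strings; the printed frame just omits the quotes.
split_value = df['bar'].iloc[0]
print(repr(split_value))

# Integer lists built directly: the comprehension creates a new list
# object for every row, so rows do not share one mutable list.
df['bar'] = [[10, 12] for _ in df.index]
print(df['bar'].iloc[0])  # [10, 12]
```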
If you want tuples, do the following:
...
df['bar']='10,12'
df['bar']=df['bar'].str.split(',').map(tuple)
Now:
print(df)
gives:
   foo       bar
0    9  (10, 12)
1    9  (10, 12)
2    9  (10, 12)
3    9  (10, 12)
4    9  (10, 12)
5    9  (10, 12)
6    9  (10, 12)
7    9  (10, 12)
8    9  (10, 12)
9    9  (10, 12)
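Because `str.split` yields strings, each tuple from `.map(tuple)` is really `('10', '12')`, which pandas prints as `(10, 12)`. A sketch that converts the parts to integers while building each tuple:

```python
import pandas as pd

df = pd.DataFrame(index=range(10))
df['foo'] = 9
df['bar'] = '10,12'

# Split each string, then convert every part to int while building the
# tuple, so each cell holds (10, 12) rather than ('10', '12').
df['bar'] = df['bar'].str.split(',').map(lambda parts: tuple(int(p) for p in parts))
print(df['bar'].iloc[0])  # (10, 12)
```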