So I'm deploying a web app that can't use pandas. I'm using Python 3 with Elastic Beanstalk on AWS, and I don't have the various dependencies available at the moment.
I only need pandas in one function, and the usage is simple: build some dataframes, then look values up in them via df.loc.
-> Does anyone know a good way to replicate pandas' df.loc[index, col]
functionality without pandas?
Answer 0 (score: 6)
Your best bet is to use lists inside a dict:
df_eq = {'col1'   : [list, of, column, data],
         'col2'   : [list, of, column, data],
         ...,
         'coln-1' : [list, of, column, data],
         'coln'   : [list, of, column, data]}
Then you can use it like loc:
df_eq['coln'][idx]
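If you also need label-based row lookup rather than just integer positions, a small helper can translate a row label into a list position first. This is a minimal sketch under that assumption; the names row_labels and loc_get are illustrative, not part of the answer above:
# Emulate df.loc[index, col] with a dict of lists plus a list of row labels.
# row_labels and loc_get are hypothetical names for illustration only.
df_eq = {'pokemon': ['gyrados', 'raichu', 'mu'],
         'status':  ['water', 'electric', 'normal']}
row_labels = ['p1', 'p2', 'p3']   # plays the role of the dataframe index

def loc_get(data, labels, index, col):
    # translate the row label to a position, then index into the column list
    return data[col][labels.index(index)]

loc_get(df_eq, row_labels, 'p2', 'status')
# 'electric'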
Answer 1 (score: 0)
I would use numpy. Also, indexing a numpy array is much faster than indexing with pandas:
import numpy as np
import pandas as pd

# One row per pokemon: columns are name, type, and a meta flag
Ar_data = np.array([["gyrados","raichu","mu","dragonair","vaporeon"],
                    ["water","electric","normal","dragon","water"],
                    [0,0,0,1,2]]).T
Ar_data
# array([['gyrados', 'water', '0'],
# ['raichu', 'electric', '0'],
# ['mu', 'normal', '0'],
# ['dragonair', 'dragon', '1'],
# ['vaporeon', 'water', '2']],
# dtype='<U9')
# Index w/ ints `.iloc`
Ar_data[3,1]
# 'dragon'
fields = ["pokemon","status","meta"]
observations = ["p1","p2","p3","p4","p5"]
# Index w/ labels `.loc`
Ar_data[3,fields.index("pokemon")]
# 'dragonair'
Ar_data[observations.index("p4"),fields.index("pokemon")]
# 'dragonair'
# Time it
DF_data = pd.DataFrame(Ar_data, columns=fields, index=observations)
%timeit DF_data.iloc[3,1]
# 10000 loops, best of 3: 129 µs per loop

%timeit Ar_data[3,1]
# The slowest run took 21.69 times longer than the fastest. This could mean that an intermediate result is being cached.
# 1000000 loops, best of 3: 384 ns per loop
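If you want the label-based lookup to read more like df.loc at the call site, you can wrap the label-to-position translation in a closure. This is a minimal sketch only; the names make_loc and loc are illustrative and not part of the answer above:
# Hypothetical wrapper so lookups read like df.loc[row_label, col_label].
def make_loc(array, row_labels, col_labels):
    def loc(row, col):
        # translate both labels to positions, then index the numpy array
        return array[row_labels.index(row), col_labels.index(col)]
    return loc

loc = make_loc(Ar_data, observations, fields)
loc("p4", "pokemon")
# 'dragonair'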