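Forgive me if this is a dupe. I have been searching all morning and only found pieces of the puzzle that I couldn't quite fit together.
I have a simple DataFrame from which I want to extract a view by searching a list, searches, with the rows returned in the same order as that list. Example: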
import numpy as np
import pandas as pd
data = {k: [v+str(i) for i in range(10)] for k, v in zip(('OrderNo','Name', 'Useless','Description'),('1000','Product ', 'Junk ','Short Desc '))}
df = pd.DataFrame(data)
# Mock some NaN data like in my real one.
df.loc[2:6, 'Useless'] = np.nan
Resulting df:
  OrderNo       Name  Useless   Description
0   10000  Product 0   Junk 0  Short Desc 0
1   10001  Product 1   Junk 1  Short Desc 1
2   10002  Product 2      NaN  Short Desc 2
3   10003  Product 3      NaN  Short Desc 3
4   10004  Product 4      NaN  Short Desc 4
5   10005  Product 5      NaN  Short Desc 5
6   10006  Product 6      NaN  Short Desc 6
7   10007  Product 7   Junk 7  Short Desc 7
8   10008  Product 8   Junk 8  Short Desc 8
9   10009  Product 9   Junk 9  Short Desc 9
Now I want to search by a list of OrderNos, like so:
searches = ['10005','10009','10003','10000']
I'm trying to get a view like this:
  OrderNo       Name  Useless   Description
5   10005  Product 5      NaN  Short Desc 5
9   10009  Product 9   Junk 9  Short Desc 9
3   10003  Product 3      NaN  Short Desc 3
0   10000  Product 0   Junk 0  Short Desc 0
so that I can finally transpose the view into this (notice I dropped some useless columns):
                        0             1             2             3
OrderNo             10005         10009         10003         10000
Name            Product 5     Product 9     Product 3     Product 0
Description  Short Desc 5  Short Desc 9  Short Desc 3  Short Desc 0
This great question/answer helped me do the search by searches, but the returned view is not in my order:
found = df.loc[df['OrderNo'].isin(searches)]
  OrderNo       Name  Useless   Description
0   10000  Product 0   Junk 0  Short Desc 0
3   10003  Product 3      NaN  Short Desc 3
5   10005  Product 5      NaN  Short Desc 5
9   10009  Product 9   Junk 9  Short Desc 9
I tried adding a ['my_sort'] column to found so that I could reorder it based on the list:
found['my_sort'] = found['OrderNo'].apply(lambda x: searches.index(x))
found.sort_values(by='my_sort', inplace=True)
# For now assume index will always be matched and ValueError will be handled.
# This detail is not critical
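While this kinda works, pandas throws SettingWithCopyWarning all over the place, telling me to use .loc[row_indexer,col_indexer] = ... instead. I tried that too, and it still threw the same warning. In fact, anything I try to assign under found seems to throw the same warning, so I suspect the problem comes from the search. I ended up wrapping it in a new DataFrame so that I no longer see the warning: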
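# Columns to keep; assumed here from the transposed view shown above.
columns = ['OrderNo', 'Name', 'Description']
found = pd.DataFrame(df.loc[df['OrderNo'].isin(searches)])
found['my_sort'] = found['OrderNo'].apply(lambda x: searches.index(x))
found = found.sort_values(by='my_sort')
found = found[columns].T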
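Although this works, I can't help feeling it is convoluted and inefficient, since I had to introduce a new column just to sort and then drop it again. I looked into a few relevant functions such as reindex, or a combination of where and dropna (which doesn't work because there are other nan objects in my real data), but none of them seemed to work toward my goal.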
Is there a better way to approach this?
Answer 0 (score: 3)
set_index + loc + T
You can take advantage of Pandas indexing capabilities:
df = df.set_index('OrderNo')
searches = ['10005','10009','10003','10000']
df_search = df.loc[searches]
print(df_search)
          Description       Name  Useless
OrderNo
10005    Short Desc 5  Product 5      NaN
10009    Short Desc 9  Product 9   Junk 9
10003    Short Desc 3  Product 3      NaN
10000    Short Desc 0  Product 0   Junk 0
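One caveat that may matter for real data: in pandas 1.0 and later, passing a list to loc raises a KeyError if any label is missing from the index. The sketch below assumes some entries of searches might not exist in df; the value '10042' is a made-up missing order number used purely for illustration. It shows two possible ways to cope, either filtering the list first or using reindex, which keeps a row of NaN for each missing label.

```python
import pandas as pd

# Same mock data as in the question, indexed by OrderNo.
data = {k: [v + str(i) for i in range(10)]
        for k, v in zip(('OrderNo', 'Name', 'Useless', 'Description'),
                        ('1000', 'Product ', 'Junk ', 'Short Desc '))}
df = pd.DataFrame(data).set_index('OrderNo')

# '10042' is a hypothetical order number that is not in the data.
searches = ['10005', '10042', '10009', '10003', '10000']

# Option 1: keep only the labels that exist, preserving the list order.
present = [s for s in searches if s in df.index]
df_search = df.loc[present]

# Option 2: reindex keeps every requested label and fills missing rows with NaN
# (this requires the index to be unique).
df_with_gaps = df.reindex(searches)
```

Which option fits depends on whether a missing order should be silently skipped or remain visible as an empty row.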
res = df_search.T
print(res)
OrderNo             10005         10009         10003         10000
Description  Short Desc 5  Short Desc 9  Short Desc 3  Short Desc 0
Name            Product 5     Product 9     Product 3     Product 0
Useless               NaN        Junk 9           NaN        Junk 0
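The question's target view leaves out the Useless column. As a minimal sketch, assuming the columns to keep are Name and Description (taken from the transposed view in the question), the row order and the column subset can be requested in a single loc call before transposing:

```python
import pandas as pd

# Same mock data as in the question, indexed by OrderNo.
data = {k: [v + str(i) for i in range(10)]
        for k, v in zip(('OrderNo', 'Name', 'Useless', 'Description'),
                        ('1000', 'Product ', 'Junk ', 'Short Desc '))}
df = pd.DataFrame(data).set_index('OrderNo')

searches = ['10005', '10009', '10003', '10000']

# Rows come back in the order of `searches`; columns in the order listed.
res = df.loc[searches, ['Name', 'Description']].T
```

Here res has Name and Description as its rows and the order numbers, in search order, as its column labels.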
If you require numbered column labels:
print(df_search.reset_index().T)
                        0             1             2             3
OrderNo             10005         10009         10003         10000
Description  Short Desc 5  Short Desc 9  Short Desc 3  Short Desc 0
Name            Product 5     Product 9     Product 3     Product 0
Useless               NaN        Junk 9           NaN        Junk 0
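For completeness, here is an alternative sketch for the case where the original integer row labels (5, 9, 3, 0 in the question's expected view) should survive, so setting OrderNo as the index is not wanted. It is not part of the approach above. It assumes pandas 1.1 or later, where sort_values accepts a key argument, and the rank dictionary is a helper name introduced only for this example. Because nothing is assigned into the filtered frame, no helper column is needed and SettingWithCopyWarning should not be triggered.

```python
import numpy as np
import pandas as pd

# Same mock data as in the question, with the default integer index.
data = {k: [v + str(i) for i in range(10)]
        for k, v in zip(('OrderNo', 'Name', 'Useless', 'Description'),
                        ('1000', 'Product ', 'Junk ', 'Short Desc '))}
df = pd.DataFrame(data)
df.loc[2:6, 'Useless'] = np.nan

searches = ['10005', '10009', '10003', '10000']

# Hypothetical helper: position of each order number within `searches`.
rank = {order: pos for pos, order in enumerate(searches)}

# Filter with isin, then sort by each row's position in `searches`.
found = (
    df.loc[df['OrderNo'].isin(searches)]
      .sort_values(by='OrderNo', key=lambda col: col.map(rank))
)

# Drop the unwanted column and transpose, as in the question.
res = found[['OrderNo', 'Name', 'Description']].T
```

The dictionary lookup also avoids calling searches.index for every row, which may help when the search list is long.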