Here is a sample of my JSON data:
id | opened_date        | title                                                   | exposure | state
1  | 06/11/2014 9:28 AM | Device rebooted and crashed with error 0x024            | critical | open
2  | 06/11/2014 7:12 AM | Not able to connect to WiFi                             | High     | open
3  | 07/23/2014 2:11 PM | Sensor failed to recognize movement                     | Low      | open
4  | 07/07/2014 5:20 PM | When sensor activated, device rebooted with error 0x024 | critical | closed
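The answer below assumes the records are already in a pandas DataFrame named df. A minimal sketch of getting there, assuming the JSON is an array of objects whose keys match the column names above (the question does not show the exact JSON layout):

```python
import json

import pandas as pd

# Assumption: the JSON is an array of objects keyed by column name.
raw = """[
  {"id": 1, "opened_date": "06/11/2014 9:28 AM", "title": "Device rebooted and crashed with error 0x024", "exposure": "critical", "state": "open"},
  {"id": 2, "opened_date": "06/11/2014 7:12 AM", "title": "Not able to connect to WiFi", "exposure": "High", "state": "open"},
  {"id": 3, "opened_date": "07/23/2014 2:11 PM", "title": "Sensor failed to recognize movement", "exposure": "Low", "state": "open"},
  {"id": 4, "opened_date": "07/07/2014 5:20 PM", "title": "When sensor activated, device rebooted with error 0x024", "exposure": "critical", "state": "closed"}
]"""

# json.loads returns a list of dicts; DataFrame turns each dict into one row.
df = pd.DataFrame(json.loads(raw))
print(df[["id", "title"]])
```

For a file on disk, json.load on the open file gives the same list of dicts.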
I want to write code that takes a string as input and outputs the IDs of the rows that contain it.
For example:
Input String = Sensor : Output = ID 3 and 4 has 'Sensor' word in it
Input String = 0x024 : Output = ID 1 and 4 has '0x024' in it.
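I guess this needs some kind of groupby, but groupby works on the whole dataset, not on a string. Can this be done with pandas,
or is there a better way to analyze it?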
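No grouping is actually needed: each row can be tested on its own. For comparison, a plain-Python sketch without pandas, over a list of dicts holding the id and title from the sample; the helper name find_ids is hypothetical. It does a case-insensitive substring match on the title.

```python
# Sample rows from the question (only the fields needed for the search).
records = [
    {"id": 1, "title": "Device rebooted and crashed with error 0x024"},
    {"id": 2, "title": "Not able to connect to WiFi"},
    {"id": 3, "title": "Sensor failed to recognize movement"},
    {"id": 4, "title": "When sensor activated, device rebooted with error 0x024"},
]


def find_ids(rows, word):
    """Hypothetical helper: ids whose title contains `word`, ignoring case."""
    needle = word.lower()
    return [row["id"] for row in rows if needle in row["title"].lower()]


print(find_ids(records, "Sensor"))  # [3, 4]
print(find_ids(records, "0x024"))   # [1, 4]
```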
Answer 0 (score: 3)
You can use loc to select the rows with a condition created by str.contains with the parameter case=False (case-insensitive matching). If you need a list, use tolist:
li = ['Sensor', '0x024']
for i in li:
    # case-insensitive match on 'title', then take the matching ids as a list
    print(df.loc[df['title'].str.contains(i, case=False), 'id'].tolist())
[3, 4]
[1, 4]
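One detail worth knowing: str.contains treats the search string as a regular expression by default (regex=True). That is harmless for Sensor or 0x024, but characters such as '.' or '(' are then interpreted as pattern syntax. A small sketch with the sample titles, where the query '0x0.4' is a made-up example; passing regex=False matches the text literally.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "title": [
        "Device rebooted and crashed with error 0x024",
        "Not able to connect to WiFi",
        "Sensor failed to recognize movement",
        "When sensor activated, device rebooted with error 0x024",
    ],
})

query = "0x0.4"  # made-up query containing a regex metacharacter

# Default regex=True: '.' matches any character, so '0x024' matches.
as_regex = df.loc[df["title"].str.contains(query, case=False), "id"].tolist()

# regex=False: the query is matched as literal text, so nothing matches.
literal = df.loc[df["title"].str.contains(query, case=False, regex=False), "id"].tolist()

print(as_regex)  # [1, 4]
print(literal)   # []
```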
To store the results, you can use a dict comprehension:
dfs = { i: df.loc[df['title'].str.contains(i, case=False),'id'].tolist() for i in li }
print (dfs['Sensor'])
[3, 4]
print (dfs['0x024'])
[1, 4]
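If many words are looked up against the same data, a different technique than the one above is to build an inverted index once (word token to ids) instead of rescanning every title per query. A sketch under stated assumptions: matching is on whole lowercase word tokens found by the regex \w+, which differs from the substring match of str.contains, and the name build_index is hypothetical.

```python
import re
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "title": [
        "Device rebooted and crashed with error 0x024",
        "Not able to connect to WiFi",
        "Sensor failed to recognize movement",
        "When sensor activated, device rebooted with error 0x024",
    ],
})


def build_index(frame):
    """Hypothetical helper: map each lowercase word token to the sorted ids containing it."""
    index = defaultdict(set)
    for row_id, title in zip(frame["id"], frame["title"]):
        for token in re.findall(r"\w+", title.lower()):
            index[token].add(int(row_id))  # plain int instead of a NumPy scalar
    return {token: sorted(ids) for token, ids in index.items()}


index = build_index(df)
print(index["sensor"])                # [3, 4]
print(index["0x024"])                 # [1, 4]
print(index.get("Sensor".lower(), []))  # lowercase the query before lookup
```

Because tokens are whole words, a partial query such as "sens" finds nothing here, whereas str.contains would match it.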
If you need a function, try get_id:
def get_id(id):
    ids = df.loc[df['title'].str.contains(id, case=False), 'id'].tolist()
    # parentheses let the expression continue over several lines
    return ("Input String = %s : Output = ID " % id +
            " and ".join(str(x) for x in ids) +
            " has '%s' in it." % id)
print (get_id('Sensor'))
Input String = Sensor : Output = ID 3 and 4 has 'Sensor' in it.
print (get_id('0x024'))
Input String = 0x024 : Output = ID 1 and 4 has '0x024' in it.
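When nothing matches, get_id returns "Output = ID  has ..." with an empty list of ids. A sketch of a variant that handles that case and uses f-strings; the name describe_matches and the "no ID" wording are hypothetical, and it passes regex=False so the input is matched literally.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "title": [
        "Device rebooted and crashed with error 0x024",
        "Not able to connect to WiFi",
        "Sensor failed to recognize movement",
        "When sensor activated, device rebooted with error 0x024",
    ],
})


def describe_matches(frame, word):
    """Hypothetical helper: describe which ids contain `word` (case-insensitive, literal)."""
    mask = frame["title"].str.contains(word, case=False, regex=False)
    ids = frame.loc[mask, "id"].tolist()
    if not ids:
        return f"Input String = {word} : Output = no ID has '{word}' in it."
    joined = " and ".join(str(x) for x in ids)
    return f"Input String = {word} : Output = ID {joined} has '{word}' in it."


print(describe_matches(df, "Sensor"))
print(describe_matches(df, "Bluetooth"))
```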
Edit based on the comments:
With several words that must all appear (logical and), it is more complicated:
import numpy as np

def get_multiple_id(words):
    # split the input into words and create one boolean Series per word
    masks = [df['title'].str.contains(x, case=False) for x in words.split()]
    # combine the masks element-wise with logical AND
    # http://stackoverflow.com/a/20528566/2901002
    cond = np.logical_and.reduce(masks)
    ids = df.loc[cond, 'id'].tolist()
    return ("Input String = '%s' : Output = ID " % words +
            ' and '.join(str(x) for x in ids) +
            " has '%s' in it." % words)
print (get_multiple_id('0x024 Sensor'))
Input String = '0x024 Sensor' : Output = ID 4 has '0x024 Sensor' in it.
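The same AND can be written without NumPy by folding the masks together with the & operator. A sketch using functools.reduce and operator.and_; the helper name ids_with_all_words is hypothetical, and it matches each word literally (regex=False).

```python
import functools
import operator

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "title": [
        "Device rebooted and crashed with error 0x024",
        "Not able to connect to WiFi",
        "Sensor failed to recognize movement",
        "When sensor activated, device rebooted with error 0x024",
    ],
})


def ids_with_all_words(frame, words):
    """Hypothetical helper: ids whose title contains every word (case-insensitive)."""
    parts = words.split()
    if not parts:
        return []
    masks = [frame["title"].str.contains(w, case=False, regex=False) for w in parts]
    # mask1 & mask2 & ... : True only where every word was found
    cond = functools.reduce(operator.and_, masks)
    return frame.loc[cond, "id"].tolist()


print(ids_with_all_words(df, "0x024 Sensor"))  # [4]
```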
With logical or it is easier: in regular expressions (re) or is written as |, so replacing the spaces with | gives a pattern such as 0x024|Sensor:
def get_multiple_id(id):
    # spaces become '|', i.e. regex alternation (logical OR)
    ids = df.loc[df['title'].str.contains(id.replace(' ', '|'), case=False), 'id'].tolist()
    return ("Input String = '%s' : Output = ID " % id +
            ' and '.join(str(x) for x in ids) +
            " has '%s' in it." % id)
print (get_multiple_id('0x024 Sensor'))
Input String = '0x024 Sensor' : Output = ID 1 and 3 and 4 has '0x024 Sensor' in it.
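Since the OR version builds a regex from the input, a word containing a metacharacter such as '.', '+' or '(' changes the meaning of the pattern. A sketch that escapes each word with re.escape before joining with '|'; the helper name ids_with_any_word is hypothetical.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "title": [
        "Device rebooted and crashed with error 0x024",
        "Not able to connect to WiFi",
        "Sensor failed to recognize movement",
        "When sensor activated, device rebooted with error 0x024",
    ],
})


def ids_with_any_word(frame, words):
    """Hypothetical helper: ids whose title contains at least one word (case-insensitive)."""
    parts = words.split()
    if not parts:
        return []  # an empty pattern would otherwise match every row
    pattern = "|".join(re.escape(w) for w in parts)
    return frame.loc[frame["title"].str.contains(pattern, case=False), "id"].tolist()


print(ids_with_any_word(df, "0x024 Sensor"))  # [1, 3, 4]
print(ids_with_any_word(df, "wifi"))          # [2]
```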