I have a large dataset of NFL scenarios, but for the sake of illustration let me reduce it to a list of 2 observations, like this:
data = [[scenario1],[scenario2]]
Here is what the dataset contains:
data[0][0]
>>"It is second down and 3. The ball is on your opponent's 5 yardline. There is 3 seconds left in the fourth quarter. You are down by 3 points."
data[1][0]
>>"It is first down and 10. The ball is on your 20 yardline. There is 7 minutes left in the third quarter. You are down by 10 points."
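I can't build any model with the data in string format like this. So I want to recode these scenarios into new columns (or feature values, if you prefer) holding quantitative values. I figured I should set up the data frame first: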
down = 0
yards = 0
yardline = 0
seconds = 0
quarter = 0
points = 0
data = [[scenario1, down, yards, yardline, seconds, quarter, points], [scenario2, down, yards, yardline, seconds, quarter, points]]
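Now for the tricky part: I have to fill the new columns from the information in the scenario column. It is tricky because, for example, if the word "opponent's" appears in the second sentence, the yardline has to be computed as 100 minus whatever the stated yardline is. For the scenario1 variable above, it should be 100 - 5 = 95.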
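At first I thought I should separate out all the numbers and throw away the words, but as pointed out above, some of the words are actually needed to assign the quantitative values correctly. I have never written a lambda for something this subtle. Perhaps a lambda is not the right approach? I am open to any and all suggestions.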
To reinforce the point, this is what I would like to see (if I input scenario1):
data[0][1:]
>>2,3,95,3,4,-3
Thanks.
Answer 0 (score: 1)
A lambda isn't where you want to go. Python's re module is your friend :)
from re import search

def getScenarioData(scenario):
    data = []
    ordinals_to_nums = {'first':1, 'second':2, 'third':3, 'fourth':4}
    # Lookup for spelled-out numbers; not used below, where int() parses digit strings
    numerals_to_nums = {
        'zero':0, 'one':1, 'two':2, 'three':3, 'four':4,
        'five':5, 'six':6, 'seven':7, 'eight':8, 'nine':9
    }
    # Downs
    match = search(r'(first|second|third|fourth) down and', scenario)
    if match:
        raw_downs = match.group(1)
        downs = ordinals_to_nums[raw_downs]
        data.append(downs)
    # Yards
    match = search(r'down and (\S+)\.', scenario)
    if match:
        raw_yards = match.group(1)
        data.append(int(raw_yards))
    # Yardline (measured from your own goal line: opponent's 5 becomes 95)
    match = search(r"(opponent's)? (\S+) yardline", scenario)
    if match:
        raw_yardline = match.groups()
        yardline = 100-int(raw_yardline[1]) if raw_yardline[0] else int(raw_yardline[1])
        data.append(yardline)
    # Seconds (minutes are converted to seconds)
    match = search(r'(\S+) (seconds|minutes) left', scenario)
    if match:
        raw_secs = match.groups()
        multiplier = 1 if raw_secs[1] == 'seconds' else 60
        data.append(int(raw_secs[0]) * multiplier)
    # Quarter
    match = search(r'(\S+) quarter', scenario)
    if match:
        raw_quarter = match.group(1)
        quarter = ordinals_to_nums[raw_quarter]
        data.append(quarter)
    # Points ("up by N" is positive, "down by N" is negative)
    match = search(r'(up|down) by (\S+) points', scenario)
    if match:
        raw_points = match.groups()
        polarity = 1 if raw_points[0] == 'up' else -1
        points = int(raw_points[1]) * polarity
    else:
        # No up/down phrase found; assume the score is level
        points = 0
    data.append(points)
    return data
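One gap worth noting: numerals_to_nums is defined, yet every matched token goes through int(), so a scenario that spells a number out ("down and three") would raise a ValueError. Both sample scenarios use digits, so this may never come up. If it does, a minimal sketch of a fallback could look like the following, where the helper name to_int is hypothetical and not part of the original answer:

```python
NUMERALS_TO_NUMS = {
    'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4,
    'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9,
}

def to_int(token):
    """Convert '3' or 'three' to 3 (hypothetical helper for illustration)."""
    token = token.lower()
    if token in NUMERALS_TO_NUMS:
        return NUMERALS_TO_NUMS[token]
    # Falls through to int(), which raises ValueError for unrecognised tokens
    return int(token)
```

Swapping int(...) for to_int(...) inside getScenarioData would then cover both spellings for single-digit values.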
Personally, I find storing the data like [[scenario, <scenario_data>], ...] a little odd, but to add the data to each scenario:
for s in data:
    s.extend(getScenarioData(s[0]))
I would suggest using a list of dictionaries instead, because indexes like data[0][3] are likely to get confusing a month or two from now:
def getScenarioData(scenario):
    # instead of data = []
    data = {'scenario':scenario}
    # instead of data.append(downs)
    data['downs'] = downs
    ...

scenarios = ['...', '...']
data = [getScenarioData(s) for s in scenarios]
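For reference, here is one way the dictionary version might look when written out in full. This is a sketch under assumptions rather than the answer's exact code: the function name scenario_to_dict, the key names other than 'scenario' and 'downs', and the slightly tighter patterns (\d+ instead of \S+) are all choices made for illustration.

```python
import re

ORDINALS = {'first': 1, 'second': 2, 'third': 3, 'fourth': 4}
ORDINAL_RE = '(first|second|third|fourth)'

def scenario_to_dict(scenario):
    """Parse one scenario sentence into a dict of numeric features."""
    data = {'scenario': scenario}

    m = re.search(ORDINAL_RE + r' down and (\d+)', scenario)
    if m:
        data['downs'] = ORDINALS[m.group(1)]
        data['yards'] = int(m.group(2))

    # "opponent's 5 yardline" is stored as 100 - 5 = 95
    m = re.search(r"(opponent's )?(\d+) yardline", scenario)
    if m:
        yardline = int(m.group(2))
        data['yardline'] = 100 - yardline if m.group(1) else yardline

    m = re.search(r'(\d+) (seconds|minutes) left in the ' + ORDINAL_RE + ' quarter', scenario)
    if m:
        data['seconds'] = int(m.group(1)) * (60 if m.group(2) == 'minutes' else 1)
        data['quarter'] = ORDINALS[m.group(3)]

    m = re.search(r'(up|down) by (\d+) points', scenario)
    if m:
        data['points'] = int(m.group(2)) * (1 if m.group(1) == 'up' else -1)

    return data

scenario1 = ("It is second down and 3. The ball is on your opponent's 5 yardline. "
             "There is 3 seconds left in the fourth quarter. You are down by 3 points.")
row = scenario_to_dict(scenario1)
print([row[k] for k in ('downs', 'yards', 'yardline', 'seconds', 'quarter', 'points')])
# prints [2, 3, 95, 3, 4, -3]
```

A feature whose phrase is missing from the text is simply left out of the dict, so callers can tell "not found" apart from a genuine zero.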
Edit: if you want to read values from these dictionaries, use the get method to avoid raising a KeyError, because get defaults to None if the key is not found:
for s in data:
    print(s.get('quarter'))
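To make the difference concrete, here is a small sketch with made-up rows (the rows list is hypothetical sample data, not output from the parser). It shows get returning None for a missing key, and the optional second argument supplying a different default:

```python
rows = [
    {'scenario': 'A', 'quarter': 4},
    {'scenario': 'B'},  # no quarter was parsed, so the key is absent
]

# rows[1]['quarter'] would raise KeyError; get() returns None instead
quarters = [row.get('quarter') for row in rows]

# A second argument replaces None as the fallback value
quarters_filled = [row.get('quarter', 0) for row in rows]

print(quarters)         # [4, None]
print(quarters_filled)  # [4, 0]
```

Whether a fallback of 0 is appropriate depends on the model; leaving None in place keeps missing values distinguishable from real ones.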