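My data frame consists of Word (an English word), sentence_ID (the sentence number), and Flag (whether the word is part of the sentence: Flag = 1 means the word lies inside the sentence boundary, and Flag = 0 means the word is at the edge of the sentence).
I want to rank the words by their distance from the center of the sentence. So the input is: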
Word sentence_ID Flag
A 1 1
B 1 1
C 1 1
D 1 1
E 1 1
A 1 0
F 2 1
G 2 1
H 2 1
I 2 1
A 2 0
J 0 0
k 0 0
M 0 0
C 3 1
D 3 1
E 3 1
A 3 1
F 3 1
G 3 1
H 3 1
I 3 1
A 3 1
J 3 1
G 3 0
H 0 0
I 0 0
L 4 1
The output is:
Word sentence_ID Flag Rank
A 1 1 1
B 1 1 2
C 1 1 3
D 1 1 3
E 1 1 2
A 1 0 1
F 2 1 1
G 2 1 2
H 2 1 3
I 2 1 2
A 2 0 1
J 0 0
k 0 0
M 0 0
C 3 1 1
D 3 1 2
E 3 1 3
A 3 1 4
F 3 1 5
G 3 1 6
H 3 1 5
I 3 1 4
A 3 1 3
J 3 1 2
G 3 0 1
H 0 0
I 0 0
L 4 1 1
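Judging only from the expected output above, the rank appears to be the 1-based distance from the nearer edge of the sentence, so it grows toward the center. Rows with sentence_ID 0 are left without a rank. A minimal sketch of that rule follows; the function name `center_rank` is made up for illustration and is not part of any library:

```python
def center_rank(position, length):
    """Rank of the word at 1-based `position` in a sentence of `length` words."""
    # Distance from the nearer edge, counting the edge word itself as 1
    return min(position, length - position + 1)


print([center_rank(p, 5) for p in range(1, 6)])  # [1, 2, 3, 2, 1]
print([center_rank(p, 6) for p in range(1, 7)])  # [1, 2, 3, 3, 2, 1]
```

The two printed lists match sentence 2 (five rows) and sentence 1 (six rows) in the expected output.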
Answer 0 (score: 0)
Try this example:
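# Use lists rather than tuples so the rank slot can be assigned in place
sentence = [["foo", 0], ["bar", 0], ["baz", 0], ["foo", 0], ["bar", 0]]
words = len(sentence)
# Index of the first word of the second half; for an odd length the
# middle word belongs to the first half
center = (words + 1) // 2
# First half: ranks count up from the left edge
for rank, i in enumerate(range(0, center), 1):
    sentence[i][1] = rank
# Second half: ranks count up from the right edge
for rank, i in enumerate(reversed(range(center, words)), 1):
    sentence[i][1] = rank
print(sentence)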
Answer 1 (score: 0)
After six hours of coding, I found a solution:
import pandas as pd

df = pd.read_csv(f_Name, sep=";", index_col=False)  # f_Name: path to the input CSV
# Length of each sentence, indexed by sentence_ID
sentence_length = df.groupby("sentence_ID").size()

j = 0       # running rank inside the current sentence
center = 0  # row index where the second half of the current sentence begins
length = 0  # length of the current sentence
for index in range(len(df)):
    sentence_id = df.loc[index, "sentence_ID"]
    if sentence_id == 0:
        continue  # rows outside any sentence get no rank
    if index == 0 or sentence_id != df.loc[index - 1, "sentence_ID"]:
        # A new sentence starts here: reset the counter and locate its center
        j = 0
        length = sentence_length.loc[sentence_id]
        center = index + (length + 1) // 2
    if index < center:
        j += 1  # first half (including the middle word): count up
    elif index > center or length % 2 == 1:
        j -= 1  # second half: count down
    # For an even length, the row at `center` repeats the peak rank
    df.loc[index, "Gloss_Rank_On_Sentense"] = j
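The row-by-row loop can probably be replaced by a vectorized version. The sketch below is one possible approach, not the answer's own method. It assumes every non-zero sentence_ID occupies a single contiguous block of rows, and it builds a small sample frame inline (modeled on the first rows of the question's data) instead of reading a file:

```python
import numpy as np
import pandas as pd

# Sample modeled on the question: sentences 1 and 2, then rows outside any sentence
df = pd.DataFrame({
    "Word": list("ABCDEA") + list("FGHIA") + list("JkM"),
    "sentence_ID": [1] * 6 + [2] * 5 + [0] * 3,
    "Flag": [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0],
})

# 0-based position of each row within its sentence
pos = df.groupby("sentence_ID").cumcount()
# Sentence length, repeated on every row of that sentence
length = df["sentence_ID"].map(df["sentence_ID"].value_counts())

# Distance to the nearer edge, plus one
rank = np.minimum(pos, length - 1 - pos) + 1
# Rows with sentence_ID 0 belong to no sentence, so their rank stays empty (NaN)
df["Rank"] = rank.where(df["sentence_ID"] != 0)
print(df)
```

Because of the NaN entries the Rank column comes out as floats; converting it with `astype("Int64")` would be one way to get nullable integers instead.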