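It's hard to come up with a short but descriptive title for this. I have a DataFrame in which each row is one line of a character's dialogue, and the whole corpus is the entire show. I created a dictionary whose keys come from a list of the most important characters, and I loop through the DataFrame trying to append each line of dialogue to the matching key's value, which I want to be a list.
I have a column named 'Character' and a column named 'dialogue':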
Character  dialogue
PICARD     'You will agree Data that Starfleets order are...'
DATA       'Difficult? Simply solve the mystery of Farpoint Station.'
PICARD     'As simple as that.'
TROI       'Farpoint Station. Even the name sounds mysterious.'
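And so on and so forth... There are many minor characters, so I only want the top 10 by dialogue count, which I keep in a list called main_chars. I want a final dictionary in which each character is a key and the value is one big list of all of their lines. I don't know how to append to the empty list that is set as each key's value. My code so far: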
char_corpuses = {}
for label, row in df.iterrows():
    for char in main_chars:
        if row['Character'] == char:
            char_corpuses[char] = [row['dialogue']]
But the end result is just the last line each character says in the corpus:
{'PICARD': [' so five card stud nothing wild and the skys the limit'],
'DATA': [' would you care to deal sir'],
'TROI': [' you were always welcome'],
'WORF': [' agreed'],
'Q': [' youll find out in any case ill be watching and if youre very lucky ill drop by to say hello from time to time see you out there'],
'RIKER': [' of course have a seat'],
'WESLEY': [' i will bye mom'],
'CRUSHER': [' you know i was thinking about what the captain told us about the future about how we all changed and drifted apart why would he want to tell us whats to come'],
'LAFORGE': [' sure goes against everything weve heard about not polluting the time line doesnt it'],
'GUINAN': [' thank you doctor this looks like a great racquet but er i dont play tennis never have']}
How do I stop each row from wiping out the earlier ones, so that I don't end up with only each character's last line?
Answer 0 (score: 1)
Try something like this ^^
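char_corpuses = {}
for char in main_chars:
    # Filter on the 'Character' column (the question's DataFrame has no 'name' column)
    # and convert the matching dialogue to a plain Python list.
    char_corpuses[char] = df[df['Character'] == char]['dialogue'].tolist()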
Answer 1 (score: 1)
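Each time the loop runs, the line char_corpuses[char] = [row['dialogue']]
overwrites the list's contents with the current dialogue line. It writes a single element instead of appending one.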
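To make the difference concrete, here is a minimal sketch that needs no pandas. The `rows` list is a hypothetical stand-in for the (Character, dialogue) pairs in the DataFrame, and the names `overwritten` and `appended` are only for illustration.

```python
# Hypothetical sample rows standing in for (Character, dialogue) pairs.
rows = [
    ("PICARD", "You will agree Data that Starfleets order are..."),
    ("DATA", "Difficult? Simply solve the mystery of Farpoint Station."),
    ("PICARD", "As simple as that."),
]

overwritten = {}
appended = {}
for char, line in rows:
    # Assignment rebinds the key to a brand-new one-element list every time,
    # so only the most recent line survives.
    overwritten[char] = [line]
    # setdefault returns the list already stored under the key (or stores a
    # new empty list first), so append keeps accumulating lines.
    appended.setdefault(char, []).append(line)

print(overwritten["PICARD"])  # only PICARD's last line
print(appended["PICARD"])     # both of PICARD's lines, in order
```

`dict.setdefault` is one way to get "create the list if missing, then append" in a single expression; the try/except pattern does the same thing more explicitly.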
For a "vanilla" dictionary, try:
import pandas

d = {'Character': ['PICARD', 'DATA', 'PICARD'], 'dialogue': ['You will agree Data that Starfleets order are...', 'Difficult? Simply solve the mystery of Farpoint Station.', 'As simple as that.']}
df = pandas.DataFrame(data=d)
main_chars = ['PICARD', 'DATA']
char_corpuses = {}
for label, row in df.iterrows():
    for char in main_chars:
        if row['Character'] == char:
            try:
                # Try to append the current dialogue line to the existing list
                char_corpuses[char].append(row['dialogue'])
            except KeyError:
                # The key doesn't exist yet, create empty list for the key [char]
                char_corpuses[char] = []
                char_corpuses[char].append(row['dialogue'])
Output
{'PICARD': ['You will agree Data that Starfleets order are...', 'As simple as that.'], 'DATA': ['Difficult? Simply solve the mystery of Farpoint Station.']}
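As a possible variation on the same idea, `collections.defaultdict(list)` creates the empty list on first access, so the try/except is not needed. The sketch below assumes the two columns have already been pulled out as plain lists (for example with `df['Character'].tolist()`); the `characters` and `dialogues` names are hypothetical and the sample values are taken from the question.

```python
from collections import defaultdict

# Hypothetical stand-ins for df['Character'].tolist() and df['dialogue'].tolist().
characters = ["PICARD", "DATA", "PICARD", "TROI"]
dialogues = [
    "You will agree Data that Starfleets order are...",
    "Difficult? Simply solve the mystery of Farpoint Station.",
    "As simple as that.",
    "Farpoint Station. Even the name sounds mysterious.",
]
main_chars = {"PICARD", "DATA"}  # a set replaces the inner loop with one membership test

char_corpuses = defaultdict(list)
for char, line in zip(characters, dialogues):
    if char in main_chars:
        # A missing key gets a fresh empty list automatically.
        char_corpuses[char].append(line)

# Convert back to a regular dict so later lookups of unknown keys raise KeyError.
char_corpuses = dict(char_corpuses)
print(char_corpuses)
```

Characters outside `main_chars` (TROI in this sample) never get a key, which matches the goal of keeping only the major characters.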
Answer 2 (score: 0)
def atLeastOne[T](p: T => Boolean, as: Seq[T]): Boolean =
  as.foldLeft(false)(_ || p(_))