I have the following dataframe.
The columns are lable, body_text, sentTokenized, lowerCased, stopwordsRemoved, tokenized, lemmatized, bigrams, and bigrams_flattern. The bigrams_flattern column is shown below.
[(ive, searching), (searching, right), (right, word), (word, thank), (thank, breather), (i, promise), (promise, wont), (wont, take), (take, help), (help, granted), (granted, fulfil), (fulfil, promise), (you, wonderful), (wonderful, blessing), (blessing, time)]
[(free, entry), (entry, 2), (2, wkly), (wkly, comp), (comp, win), (win, fa), (fa, cup), (cup, final), (final, tkts), (tkts, 21st), (21st, may), (may, 2005), (text, fa), (fa, 87121), (87121, receive), (receive, entry), (entry, questionstd), (questionstd, txt), (txt, ratetcs), (ratetcs, apply), (apply, 08452810075over18s)]
[(nah, dont), (dont, think), (think, go), (go, usf), (usf, life), (life, around), (around, though)]
[(even, brother), (brother, like), (like, speak), (speak, me), (they, treat), (treat, like), (like, aid), (aid, patent)]
[(i, date), (date, sunday), (sunday, will)]
I want to group the rows by the value of the 'lable' column; the values are 'spam' or 'ham'.
The output should be:
lable corpuses
1 ham [all the ham bigrams]
2 spam [all the spam bigrams]
I referred to pandas groupby and join lists, Specifying column order following groupby aggregation, and http://pandas.pydata.org/pandas-docs/stable/groupby.html, and then tried this:
fullCorpus['corpuses'] = fullCorpus.groupby('lable')
I get the error ValueError: Length of values does not match length of index.
Where am I going wrong? Do I have to apply some function after the groupby?
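For reference, a minimal sketch (toy data, hypothetical values, not the real frame) that reproduces the same error: iterating a GroupBy yields one item per group, so pandas cannot line two group entries up against a three-row index.

import pandas as pd

# Toy stand-in for fullCorpus (hypothetical values).
toy = pd.DataFrame({
    'lable': ['ham', 'spam', 'ham'],
    'bigrams_flattern': [[('a', 'b')], [('c', 'd')], [('e', 'f')]],
})

try:
    # A GroupBy object is not a per-row column: pandas iterates it
    # (one item per group, here 2) against the 3-row index.
    toy['corpuses'] = toy.groupby('lable')
except ValueError as err:
    print(err)  # e.g. "Length of values (2) does not match length of index (3)"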
fullCorpus.head(5).to_dict()
{'lable': {0: 'ham', 1: 'spam', 2: 'ham', 3: 'ham', 4: 'ham'}, 'body_text': {0: "I've been searching for the right words to thank you for this breather. I promise i wont take your help for granted and will fulfil my promise. You have been wonderful and a blessing at all times.", 1: "Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's", 2: "Nah I don't think he goes to usf, he lives around here though", 3: 'Even my brother is not like to speak with me. They treat me like aids patent.', 4: 'I HAVE A DATE ON SUNDAY WITH WILL!!'}, 'sentTokenized': {0: ['Ive been searching for the right words to thank you for this breather', 'I promise i wont take your help for granted and will fulfil my promise', 'You have been wonderful and a blessing at all times'], 1: ['Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005', 'Text FA to 87121 to receive entry questionstd txt rateTCs apply 08452810075over18s'], 2: ['Nah I dont think he goes to usf he lives around here though'], 3: ['Even my brother is not like to speak with me', 'They treat me like aids patent'], 4: ['I HAVE A DATE ON SUNDAY WITH WILL', '']}, 'lowerCased': {0: ['ive been searching for the right words to thank you for this breather', 'i promise i wont take your help for granted and will fulfil my promise', 'you have been wonderful and a blessing at all times'], 1: ['free entry in 2 a wkly comp to win fa cup final tkts 21st may 2005', 'text fa to 87121 to receive entry questionstd txt ratetcs apply 08452810075over18s'], 2: ['nah i dont think he goes to usf he lives around here though'], 3: ['even my brother is not like to speak with me', 'they treat me like aids patent'], 4: ['i have a date on sunday with will', '']}, 'stopwordsRemoved': {0: ['ive searching right words thank breather', 'i promise wont take help granted fulfil promise', 'you wonderful blessing times'], 1: ['free entry 2 wkly comp win fa cup final tkts 21st may 2005', 'text fa 87121 receive entry questionstd txt ratetcs apply 08452810075over18s'], 2: ['nah dont think goes usf lives around though'], 3: ['even brother like speak me', 'they treat like aids patent'], 4: ['i date sunday will', '']}, 'tokenized': {0: [['ive', 'searching', 'right', 'words', 'thank', 'breather'], ['i', 'promise', 'wont', 'take', 'help', 'granted', 'fulfil', 'promise'], ['you', 'wonderful', 'blessing', 'times']], 1: [['free', 'entry', '2', 'wkly', 'comp', 'win', 'fa', 'cup', 'final', 'tkts', '21st', 'may', '2005'], ['text', 'fa', '87121', 'receive', 'entry', 'questionstd', 'txt', 'ratetcs', 'apply', '08452810075over18s']], 2: [['nah', 'dont', 'think', 'goes', 'usf', 'lives', 'around', 'though']], 3: [['even', 'brother', 'like', 'speak', 'me'], ['they', 'treat', 'like', 'aids', 'patent']], 4: [['i', 'date', 'sunday', 'will'], []]}, 'lemmatized': {0: [['ive', 'searching', 'right', 'word', 'thank', 'breather'], ['i', 'promise', 'wont', 'take', 'help', 'granted', 'fulfil', 'promise'], ['you', 'wonderful', 'blessing', 'time']], 1: [['free', 'entry', '2', 'wkly', 'comp', 'win', 'fa', 'cup', 'final', 'tkts', '21st', 'may', '2005'], ['text', 'fa', '87121', 'receive', 'entry', 'questionstd', 'txt', 'ratetcs', 'apply', '08452810075over18s']], 2: [['nah', 'dont', 'think', 'go', 'usf', 'life', 'around', 'though']], 3: [['even', 'brother', 'like', 'speak', 'me'], ['they', 'treat', 'like', 'aid', 'patent']], 4: [['i', 'date', 'sunday', 'will'], []]}, 'bigrams': {0: [[('ive', 'searching'), ('searching', 
'right'), ('right', 'word'), ('word', 'thank'), ('thank', 'breather')], [('i', 'promise'), ('promise', 'wont'), ('wont', 'take'), ('take', 'help'), ('help', 'granted'), ('granted', 'fulfil'), ('fulfil', 'promise')], [('you', 'wonderful'), ('wonderful', 'blessing'), ('blessing', 'time')]], 1: [[('free', 'entry'), ('entry', '2'), ('2', 'wkly'), ('wkly', 'comp'), ('comp', 'win'), ('win', 'fa'), ('fa', 'cup'), ('cup', 'final'), ('final', 'tkts'), ('tkts', '21st'), ('21st', 'may'), ('may', '2005')], [('text', 'fa'), ('fa', '87121'), ('87121', 'receive'), ('receive', 'entry'), ('entry', 'questionstd'), ('questionstd', 'txt'), ('txt', 'ratetcs'), ('ratetcs', 'apply'), ('apply', '08452810075over18s')]], 2: [[('nah', 'dont'), ('dont', 'think'), ('think', 'go'), ('go', 'usf'), ('usf', 'life'), ('life', 'around'), ('around', 'though')]], 3: [[('even', 'brother'), ('brother', 'like'), ('like', 'speak'), ('speak', 'me')], [('they', 'treat'), ('treat', 'like'), ('like', 'aid'), ('aid', 'patent')]], 4: [[('i', 'date'), ('date', 'sunday'), ('sunday', 'will')], []]}, 'bigrams_flattern': {0: [('ive', 'searching'), ('searching', 'right'), ('right', 'word'), ('word', 'thank'), ('thank', 'breather'), ('i', 'promise'), ('promise', 'wont'), ('wont', 'take'), ('take', 'help'), ('help', 'granted'), ('granted', 'fulfil'), ('fulfil', 'promise'), ('you', 'wonderful'), ('wonderful', 'blessing'), ('blessing', 'time')], 1: [('free', 'entry'), ('entry', '2'), ('2', 'wkly'), ('wkly', 'comp'), ('comp', 'win'), ('win', 'fa'), ('fa', 'cup'), ('cup', 'final'), ('final', 'tkts'), ('tkts', '21st'), ('21st', 'may'), ('may', '2005'), ('text', 'fa'), ('fa', '87121'), ('87121', 'receive'), ('receive', 'entry'), ('entry', 'questionstd'), ('questionstd', 'txt'), ('txt', 'ratetcs'), ('ratetcs', 'apply'), ('apply', '08452810075over18s')], 2: [('nah', 'dont'), ('dont', 'think'), ('think', 'go'), ('go', 'usf'), ('usf', 'life'), ('life', 'around'), ('around', 'though')], 3: [('even', 'brother'), ('brother', 'like'), ('like', 'speak'), ('speak', 'me'), ('they', 'treat'), ('treat', 'like'), ('like', 'aid'), ('aid', 'patent')], 4: [('i', 'date'), ('date', 'sunday'), ('sunday', 'will')]}}
Answer 0 (score: 1)
IIUC, you want to use your bigrams according to the lable.
Using the dict you provided, you can do that by aggregating with sum() or just .agg(sum):

df = pd.DataFrame(provided_dict)
df.groupby('lable').bigrams.sum() # or .agg(sum)

yields

lable
ham [[(ive, searching), (searching, right), (right...
spam [[(free, entry), (entry, 2), (2, wkly), (wkly,...
Name: bigrams, dtype: object

You can then assign it to a new column to store it in the df.
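A hedged sketch of that last assignment step, assuming the intent is to broadcast the grouped lists back onto every row by its label (the map call is my addition, not code from the answer):

corpora = fullCorpus.groupby('lable')['bigrams_flattern'].sum()
# Each row looks up its label's aggregated corpus, so every 'ham' row
# gets the full ham bigram list and every 'spam' row the spam list.
fullCorpus['corpuses'] = fullCorpus['lable'].map(corpora)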
Answer 1 (score: 0)
After a lot of searching, this gives me what I need:
fullCorpusAgg = fullCorpus.groupby('lable').agg({'bigrams_flattern': 'sum'})
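To get the exact two-column layout shown in the question (lable alongside corpuses), a small follow-up sketch; the rename and reset_index steps are my additions:

fullCorpusAgg = (
    fullCorpus.groupby('lable')
    .agg({'bigrams_flattern': 'sum'})                   # concatenates the per-row lists
    .rename(columns={'bigrams_flattern': 'corpuses'})   # match the desired column name
    .reset_index()                                      # move 'lable' out of the index
)

One caveat: 'sum' on lists does repeated concatenation, which can get slow on a large corpus; an explicit flattening such as .agg(lambda lists: [bg for row in lists for bg in row]) gives the same result in linear time.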