Pandas not dividing cell lengths

Time: 2017-05-18 02:19:51

Tags: python pandas

I have been struggling with this problem for a long time. I have a dataframe that looks like this:

dataframe pic

I am trying to divide the length of each "counter" by the length of each "content". I thought this would be fairly simple. So far I have tried:

reviews['diversity'] = reviews['counter'].apply(lambda x: 0 if len(x) == 0 else float(len(x)) / float(len(reviews['content'][x])))

as well as using x['content']. I got a massive error message:

KeyError: "None of [['aberfeldy', 'recorded', 'their', 'debut', 'young', 'forever', 'using', 'a', 'single', 'microphone', 'good', 'for', 'them', 'in', 'that', 'spirit', 'i', 'cut', 'short', 'my', 'obligatory', 'introduction', 'and', 'bring', 'you', 'straight', 'to', 'the', 'edinburgh', 'group', 'lovelorn', 'unfortunately', 'still', 'heart', 'exposed', 'by', 'oh', 'production', 'love', 'is', 'verb', 'noun', 'as', 'well', 'find', 'it', 'dictionary', 'under', 'l', 'little', 'witticism', 'comes', 'from', 'an', 'arrow', 'written', 'sung', 'riley', 'briggs', 'based', 'on', 'one', 'photo', 'looks', 'like', 'anthony', 'michael', 'hall', 'though', 'his', 'vocals', 'chart', 'fairly', 'standard', 'indie', 'course', 'borrowing', 'neil', 'friend', 'ben', 'gibbard', 'what', 'do', 'plain', 'sensitive', 'guys', 'everywhere', 'listen', 'some', 'of', 'best', 'friends', 'are', 'favorite', 'albums', 'consist', 'campfire', 'singalongs', 'bands', 'with', 'modest', 'acoustic', 'guitar', 'chops', 'cute', 'names', 'accents', 'but', 'those', 'lyrics', 'no', 'band', 'would', 'sing', 'such', 'words', 'deserves', 'easily', 'made', 'comparisons', 'fellow', 'scots', 'belle', '', 'sebastian', 'or', 'even', 'camera', 'obscura', 'let', 'alone', 'earnest', 'aussies', 'lucksmiths', 'compare', 'twee', 'progenitors', 'pastels', 'talulah', 'gosh', 'owe', 'me', 'your', 'cardigan', 'moniker', 'nipped', 'scottish', 'vacation', 'destination', 'practically', 'beg', 'name', 'there', 'need', 'encourage', 'throughout', 'record', 'shows', 'predisposition', 'toward', 'bungling', 'old', 'english', 'teachers', 'motto', 'show', 'not', 'tell', 'this', 'may', 'be', 'result', 'medical', 'condition', 'dyslexia', 'which', 'case', 'we', 'should', 'hold', 'our', 'snark', 'seems', 'guy', 'can', 'open', 'mouth', 'without', 'saying', 'nothing', 'so', 'sad', 'leaving', 'he', 'sings', 'out', 'lonely', 'now', 'she', 'gone', 'adds', 'tie', 'teems', 'vivid', 'storytelling', 'goes', 'rhyme', 'sacred', 'wasted', 'reasons', 'until', 
'somewhere', 'editor', 'rhyming', 'loses', 'her', 'job', 'often', 'at', 'when', 'they', 'stumble', 'beyond', 'trite', 'infantilism', 'first', 'vegetarian', 'restaurant', 'lopes', 'along', 'winning', 'tangled', 'up', 'blue', 'strums', 'accented', 'subtle', 'fiddles', 'lovely', 'boy', 'harmonies', 'seemingly', 'aiming', 'album', 'cheerful', 'unpretentious', 'look', 'everyday', 'here', 'finally', 'makes', 'interesting', 'way', 'dance', 'kitchen', 'says', 'willing', 'see', 'where', 'takes', 'him', 'then', 'proclaims', 'sometimes', 'believe', 'human', 'duck', 'cover', 'speaking', 'aliens', 'heliopolis', 'night', 'next', 'track', 'incidentally', 'its', 'second', 'whimsical', 'spaceship', 'song', 'complete', 'nose', 'perfect', 'unique', 'yeah', 'was', 'means', 'warm', 'pop', 'heats', 'headphones', 'veritable', 'help', 'root', 'begins', 'everyone', 'because', 'last', 'thing', 'world', 'needs', 'another', 'batch', 'sullen', 'scenesters', 'yet', 'any', 'relationship', 'just', 'someone', 'doesn', 'mean', 'back', 'beautiful', 'gibbs', 'tells', 'us', 'tender', 'moment', 'probably', 'if', 'hope', 'gets', 'laid']] are in the [index]"

I then tried:

def diverse(x):
    if len(x) == 0:
        return 0
    else:
        return float(len(x)) / float(len(reviews['clean'][x]))

reviews['diverse'] = reviews['counter'].apply(diverse)

and got the same thing.

I tried using applymap with

reviews['diversity'] = reviews.applymap(lambda x: 0 if len(x) == 0 else float(len(reviews['counter'][x])) / float(len(reviews['content'][x])))

and got

("object of type 'int' has no len()", 'occurred at index Unnamed: 0')

However, if I just do

float(len(reviews['counter'][4])) / float(len(reviews['clean'][4]))

I get

0.634375

Any help is much appreciated.

Edit: I tried one more variation. With print instead of return, it gives me all the values, but with return it only divides the lengths for the first row, which seems really strange.

1 answer:

Answer 0 (score: 0)

Here is a toy example I built to illustrate how to do what you are asking:

import pandas as pd
from collections import Counter

df = pd.DataFrame([['hello world i am a computer'],
                   ['hello i am a computer too hello computer']], 
                   columns=['content'])

df['counter'] = df.content.str.split().apply(Counter)
df

# returns:
                                 content                                              counter
             hello world i am a computer    {'am': 1, 'hello': 1, 'computer': 1, 'world': ...
hello i am a computer too hello computer    {'am': 1, 'hello': 2, 'computer': 2, 'a': 1, '...

This line answers the question as you stated it:

df['diversity'] = df.content.str.len() / df.counter.apply(len)

But I think what you really want is to break the strings in content into lists of words by splitting on whitespace. In that case, you probably want:

df['diversity'] = df.content.str.split().apply(len) / df.counter.apply(len)
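
For completeness, here is a minimal sketch of the ratio in the direction the question asks for (length of counter divided by the number of words in content), which is the reciprocal of the two lines above. It assumes Python 3, that content holds plain strings, and that counter is a Counter of its words. The empty third row, the diversity helper, and the diversity_vec column are made up for illustration. It also shows why the lookups in the question fail: inside Series.apply the argument is the cell value itself, so it cannot be used as a row label.

```python
import pandas as pd
from collections import Counter

# Toy data shaped like the example above, plus an empty row
# (the empty row is an assumption, added to exercise the zero guard).
reviews = pd.DataFrame({'content': ['hello world i am a computer',
                                    'hello i am a computer too hello computer',
                                    '']})
reviews['counter'] = reviews['content'].str.split().apply(Counter)

# Series.apply passes each cell value (here a Counter) to the function, so
# reviews['content'][x] would try to use the words themselves as index labels.
# DataFrame.apply with axis=1 passes the whole row instead.
def diversity(row):
    n_words = len(row['content'].split())
    return 0.0 if n_words == 0 else len(row['counter']) / n_words

reviews['diversity'] = reviews.apply(diversity, axis=1)

# Vectorised variant: divide the two length Series directly.
# 0 / 0 yields NaN for the empty row, which is then replaced by 0.
n_unique = reviews['counter'].apply(len)
n_words = reviews['content'].str.split().str.len()
reviews['diversity_vec'] = (n_unique / n_words).fillna(0.0)

print(reviews[['diversity', 'diversity_vec']])
```

On this toy data both columns come out as 1.0, 0.75 and 0.0: the first row has 6 distinct words out of 6, the second 6 out of 8, and the empty row falls back to 0.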