Pandas: Merging DataFrame columns into a list

Time: 2016-02-01 12:09:48

Tags: python pandas merge nltk

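I'm doing some text analysis with Python (NLTK, Pandas) and need some help with my DataFrame. I'm still a beginner at programming.

I have a PoS-tagged DataFrame (1000 rows, 5 columns).

Column names: Number (in the index), Id, Title, Question, Answers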

Two example rows for Question:

[('I', 'PRON'), ('am', 'VERB'), ('working', 'VERB'),('website', 'NOUN')]
[('Would', 'VERB'), ('you', 'PRON'), ('recomme...)] 

Two example rows for Answers:

[('This', 'DET'), ('is', 'VERB'), ('not', 'ADV'),('website', 'NOUN')] 
[('There', 'DET'), ('is', 'VERB'), ('a', 'DET'...)] 

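Goals:

1.) A list (not a str) containing all 1000 PoS-tagged questions

2.) A list (not a str) containing all 1000 PoS-tagged answers

3.) A list (not a str) containing all 1000 PoS-tagged answers and questions

What I have tried so far is merging all the rows of the Question column, but the result looks like this: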

[[('I', 'PRON'), ('am', 'VERB'),..],[('Would', 'VERB'), 
('you', 'PRON'), ('recomme...)],[(.....)]]  

I think I made a mistake when joining them. How can I correctly get a flat list like this:

[('I', 'PRON'), ('am', 'VERB'), ('working', 'VERB'),.....]

for the complete column?

Edit after Beneres' answer:

Thanks for the quick answer. .sum() is the approach I used before, but the result is:

print (df['Merged'])
0      [('Does', 'NOUN'), ('anyone', 'NOUN'), ('know'...
1      [('I', 'PRON'), ('am', 'VERB'), ('building', '...
2      [('I', 'PRON'), ('am', 'VERB'), ('wondering', ...
3      [('I', 'PRON'), ('am', 'VERB'), ('working', 'V...

What I need is

print (df['Merged'])
0      [('Does', 'NOUN'), ('anyone', 'NOUN'), ('know'...
        ('I', 'PRON'), ('am', 'VERB'), ('building', '...
        ('I', 'PRON'), ('am', 'VERB'), ('wondering', ...
        ('I', 'PRON'), ('am', 'VERB'), ('working', 'V...]

Edit 2: Solved

2 answers:

Answer 0 (score: 0)

If I understand correctly, you just need to do:

df['Merged'] = df['A'] + df['Q']

to merge the questions and answers row by row, and then

df.sum()

to merge (sum up) all the lists in each column.

Example:

import pandas as pd

df = pd.DataFrame({'Q':[[('I', 'PRON'), ('am', 'VERB')], [('You', 'PRON'), ('are', 'VERB')]], 
              'A':[[('This', 'DET'), ('is', 'VERB')], [('Sparta', 'NOUN'), ('bitch', 'VERB')]]})
df['Merged'] = df['A'] + df['Q']

Then df.sum() gives:

A         [(This, DET), (is, VERB), (Sparta, NOUN), (bit...
Q         [(I, PRON), (am, VERB), (You, PRON), (are, VERB)]
Merged    [(This, DET), (is, VERB), (I, PRON), (am, VERB...
dtype: object
I'm not quite sure about the format for goal 3, though; if this is not what you want, please provide more details.
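As an alternative to summing lists, here is a minimal sketch that flattens each column with `itertools.chain.from_iterable` and returns plain Python lists. It assumes every cell already holds a real list of `(word, tag)` tuples. The toy data and the names `all_questions`, `all_answers` and `all_posts` are made up for illustration, and goal 3 is read here as "all questions followed by all answers", which is only one possible interpretation.

```python
from itertools import chain

import pandas as pd

# Toy data in the same shape as the example above (not the asker's real data).
df = pd.DataFrame({
    "Q": [[("I", "PRON"), ("am", "VERB")], [("You", "PRON"), ("are", "VERB")]],
    "A": [[("This", "DET"), ("is", "VERB")], [("Sparta", "NOUN")]],
})

# Flatten each column of per-row lists into one flat list of (word, tag) tuples.
all_questions = list(chain.from_iterable(df["Q"]))
all_answers = list(chain.from_iterable(df["A"]))

# Goal 3, assuming it means all questions followed by all answers.
all_posts = all_questions + all_answers

print(type(all_questions).__name__, len(all_questions), len(all_posts))
```

Each result is a `list`, not a `str` and not a Series, so it can be passed straight to code that expects a sequence of tagged tokens.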

Answer 1 (score: 0)

I solved this in a weird way. I don't know whether it is a good solution, but it works:

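from ast import literal_eval

# dfQuestion and dfAnswers are the Question and Answers columns.
# Note: this only works when each cell is a str such as "[('I', 'PRON'), ...]".
# Summing a column of strings concatenates them, leaving "][" between rows;
# replace that with ", " and turn the resulting str into a list with literal_eval.
allQuestions = literal_eval(dfQuestion.sum().replace("][", ", "))
allAnswers = literal_eval(dfAnswers.sum().replace("][", ", "))
allPosts = allQuestions + allAnswers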

I hope this helps someone.
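The `replace("][", ...)` trick suggests the cells were strings rather than lists, which may be why `.sum()` did not give a flat list at first. Under that assumption, a slightly more robust sketch is to parse each cell on its own with `literal_eval` and then flatten, so the result does not depend on how the concatenated text happens to look. The `questions` Series below is hypothetical sample data, not the asker's DataFrame.

```python
from ast import literal_eval
from itertools import chain

import pandas as pd

# Hypothetical string-valued column, e.g. as it might look after a CSV round trip.
questions = pd.Series([
    "[('I', 'PRON'), ('am', 'VERB')]",
    "[('Would', 'VERB'), ('you', 'PRON')]",
])

# Parse each cell into a real list first, then flatten the parsed lists.
parsed = questions.apply(literal_eval)
all_questions = list(chain.from_iterable(parsed))

print(all_questions)
```

Parsing cell by cell also means a malformed row raises an error for that row, rather than producing one long string that fails to evaluate as a whole.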