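I am doing some text analysis with Python (NLTK, pandas) and need some help with my DataFrame. I am still a beginner at programming.
I have a PoS-tagged DataFrame (1000 rows, 5 columns).
Column names: Number (the index), Id, Title, Question, Answers
Two example rows for Question: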
[('I', 'PRON'), ('am', 'VERB'), ('working', 'VERB'),('website', 'NOUN')]
[('Would', 'VERB'), ('you', 'PRON'), ('recomme...)]
Two example rows for Answers:
[('This', 'DET'), ('is', 'VERB'), ('not', 'ADV'),('website', 'NOUN')]
[('There', 'DET'), ('is', 'VERB'), ('a', 'DET'...)]
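Goals:
1.) One list (not a str) containing all 1000 PoS-tagged questions
2.) One list (not a str) containing all 1000 PoS-tagged answers
3.) One list (not a str) containing all 1000 PoS-tagged answers and questions
What I have tried so far is to merge all rows of the Question column, but the result is a list of lists: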
[[('I', 'PRON'), ('am', 'VERB'),..],[('Would', 'VERB'),
('you', 'PRON'), ('recomme...)],[(.....)]]
I think I made a mistake when joining them. How can I correctly get a single flat list like this:
[('I', 'PRON'), ('am', 'VERB'), ('working', 'VERB'),.....]
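that represents the complete column?
Edit after Beneres' answer:
Thanks for the quick answer. .sum() is the approach I had tried before, but the result is: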
print (df['Merged'])
0 [('Does', 'NOUN'), ('anyone', 'NOUN'), ('know'...
1 [('I', 'PRON'), ('am', 'VERB'), ('building', '...
2 [('I', 'PRON'), ('am', 'VERB'), ('wondering', ...
3 [('I', 'PRON'), ('am', 'VERB'), ('working', 'V...
What I need is:
print (df['Merged'])
0 [('Does', 'NOUN'), ('anyone', 'NOUN'), ('know'...
('I', 'PRON'), ('am', 'VERB'), ('building', '...
('I', 'PRON'), ('am', 'VERB'), ('wondering', ...
('I', 'PRON'), ('am', 'VERB'), ('working', 'V...]
Edit 2: Solved
Answer 0: (score: 0)
If I understand correctly, you just need to do:
df['Merged'] = df['A'] + df['Q']
to merge the questions and answers, and then
df.sum()
to merge (sum) all the lists.
Example:
import pandas as pd
df = pd.DataFrame({'Q': [[('I', 'PRON'), ('am', 'VERB')], [('You', 'PRON'), ('are', 'VERB')]],
                   'A': [[('This', 'DET'), ('is', 'VERB')], [('Sparta', 'NOUN'), ('bitch', 'VERB')]]})
df['Merged'] = df['A'] + df['Q']
Then:
df.sum()
looks like this:
A [(This, DET), (is, VERB), (Sparta, NOUN), (bit...
Q [(I, PRON), (am, VERB), (You, PRON), (are, VERB)]
Merged [(This, DET), (is, VERB), (I, PRON), (am, VERB...
dtype: object
I am not quite sure about the format you want for goal 3, though. If this is not what you are after, please provide more details.
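When the cells are real Python lists, the same flattening can also be sketched with itertools.chain.from_iterable, which walks each row once instead of building a new list for every `+`. This is only a minimal sketch: the small DataFrame below is made-up sample data, with column names `Question` and `Answers` taken from the question.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data shaped like the PoS-tagged columns in the question.
df = pd.DataFrame({
    "Question": [[("I", "PRON"), ("am", "VERB")],
                 [("Would", "VERB"), ("you", "PRON")]],
    "Answers": [[("This", "DET"), ("is", "VERB")],
                [("There", "DET"), ("is", "VERB")]],
})

# Goal 1 and goal 2: one flat list of (word, tag) tuples per column.
all_questions = list(chain.from_iterable(df["Question"]))
all_answers = list(chain.from_iterable(df["Answers"]))

# Goal 3 (one possible reading): all questions followed by all answers.
all_posts = all_questions + all_answers
```

Each result is a plain list of tuples, not a str and not a nested list.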
Answer 1: (score: 0)
I solved the problem in a somewhat strange way. I do not know whether it is a good solution, but it works:
from ast import literal_eval
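# Each cell is a str such as "[('I', 'PRON'), ...]", so .sum() concatenates the strings.
# Replace the "][" left between two rows with ", " to get the text of one long list,
# then turn that str into a real list with literal_eval.
allQuestions = literal_eval(dfQuestion.sum().replace("][", ", "))
allAnswers = literal_eval(dfAnswers.sum().replace("][", ", "))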
allPosts = allQuestions + allAnswers
I hope this helps someone.
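The string replacement depends on the exact text where two cells meet. An alternative sketch, assuming each cell holds the string form of a list of tuples (for example after a round trip through CSV), parses every cell on its own with ast.literal_eval and then flattens the results. The helper name `flatten_tagged` and the sample strings are hypothetical.

```python
from ast import literal_eval
from itertools import chain


def flatten_tagged(cells):
    """Parse each string cell into a list of (word, tag) tuples, then flatten."""
    return list(chain.from_iterable(literal_eval(cell) for cell in cells))


# Hypothetical sample cells, stored as strings rather than lists.
question_cells = [
    "[('I', 'PRON'), ('am', 'VERB')]",
    "[('Would', 'VERB'), ('you', 'PRON')]",
]

all_questions = flatten_tagged(question_cells)
print(all_questions)
# [('I', 'PRON'), ('am', 'VERB'), ('Would', 'VERB'), ('you', 'PRON')]
```

Any iterable of strings works here, so a pandas column of such strings could be passed in directly.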