Using Python 3 and Anaconda, I imported pandas and os in IPython. I have a very large CSV file. After calling read_csv on the file, I tried using .groupby() on two columns, but that changed the type from DataFrame to DataFrameGroupBy, and I can no longer run DataFrame methods on it.
I can't think of anything else to try. I have very little pandas experience, all from Codecademy, and my code seemed to work there.
I expected that when I run band_gaps.info(), it would give me the DataFrame's info. Instead, it gives me an error. When I check the type of band_gaps, it is no longer a DataFrame but a DataFrameGroupBy.
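Roughly, my code looks like this (the file and column names below are stand-ins, not my real ones):

    import pandas as pd
    import os

    # Read the large CSV into a DataFrame.
    df = pd.read_csv('band_gaps.csv')

    # Grouping by two columns hands back a DataFrameGroupBy, not a DataFrame.
    band_gaps = df.groupby(['material', 'method'])

    print(type(band_gaps))  # <class '...DataFrameGroupBy'>
    band_gaps.info()        # AttributeError: 'DataFrameGroupBy' object has no attribute 'info'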
Answer 0 (score: 0)
If you look at the pandas groupby documentation, you'll see that it returns either a SeriesGroupBy or a DataFrameGroupBy object, depending on whether you called .groupby on a Series or on a DataFrame. So the behavior you observed is not surprising.
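You can see this for yourself with a throwaway DataFrame (names made up):

    import pandas as pd

    df = pd.DataFrame({'a': [1, 1, 2], 'b': [10, 20, 30]})

    # Grouping the whole DataFrame gives a DataFrameGroupBy...
    print(type(df.groupby('a')))
    # ...while grouping a single column (a Series) gives a SeriesGroupBy.
    print(type(df['b'].groupby(df['a'])))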
More importantly, why does pandas do this? Well, in your case you are grouping a bunch of rows together. Pandas can keep some representation of the grouped DataFrame, but it cannot do anything else with it (i.e., hand it back to you as another DataFrame) until you apply an aggregation function such as .sum or .count. An aggregation function takes each group of rows and defines some way of collapsing that group into a single row. Try applying one of these aggregation functions to band_gaps and see what happens.
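Concretely, with the band_gaps grouping from your question (a sketch; I'm assuming it came from a two-column .groupby):

    # An aggregation collapses each group into a single row and
    # gives you back an ordinary DataFrame.
    counts = band_gaps.count()
    print(type(counts))  # <class 'pandas.core.frame.DataFrame'>
    counts.info()        # works again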
For example:

    df.groupby('column1').mean()

returns a DataFrame with the mean of every column, after grouping all rows by column1.

    df.groupby('column1')['column2'].sum()

returns a Series with the sum of the values in column2, after grouping by column1. Note that

    df.groupby('column1').sum()['column2']

is also possible, but in that case you aggregate all the columns and only then select the column of interest, which is slower than slicing before aggregating.
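To see the whole flow end to end with a toy DataFrame (all names here are illustrative):

    import pandas as pd

    df = pd.DataFrame({'column1': ['x', 'x', 'y'],
                       'column2': [1, 2, 3],
                       'column3': [4, 5, 6]})

    # Slice first, then aggregate: only column2 gets summed.
    fast = df.groupby('column1')['column2'].sum()

    # Aggregate everything, then slice: column3 is summed too, then discarded.
    slow = df.groupby('column1').sum()['column2']

    print(fast.equals(slow))  # True; the first form simply does less work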