Question

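I am setting up a new tool to help my professor extract topics from relevant patent data. I have used pandas to create a CSV file from the output of the analysis tool. This is the code I have so far: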

import pandas as pd
import textrazor

textrazor.api_key = 'b033067632dba8a710c57f088115ad4eeff22142629bb1c07c780a10'

csv_contents = open('Patentdaten1.csv').read()

client = textrazor.TextRazor(extractors=['topics', 'entities'])
response = client.analyze(csv_contents)

topics = set()
relevance = set()

topics1 = list(response.topics())
topics1.sort(key=lambda x: x.score, reverse=True)

for topic in response.topics():
    if topic.score > 0.5:
        if topic.label not in topics:
            topics.add(topic.label)
            relevance.add(topic.score)

df = pd.DataFrame({'topic' : [topics]})
df.to_csv('Test.csv', sep=';')

I expect to get a CSV file in which the topic labels are listed under the header "topic". It should look like this:

; topic
0; Machine
1; Stairs
2; xxx
3; yyy
[...]

However, the actual output is a CSV file in which all topics end up together in a single cell of every row, like this:

; topic
0; 'Machine', 'Stairs', 'xxx', 'yyy'
1; 'Machine', 'Stairs', 'xxx', 'yyy'
2; 'Machine', 'Stairs', 'xxx', 'yyy'
3; 'Machine', 'Stairs', 'xxx', 'yyy'
[...]

Thanks for your answers!

Answer 1

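You need to convert the set to a list and remove the surrounding []. With [topics], pandas receives a list containing one element (the whole set), so the entire set lands in a single cell instead of one label per row: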

df = pd.DataFrame({'topic' : list(topics)})
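To make the difference concrete, here is a minimal sketch that runs without a TextRazor account. It assumes the topic objects expose `.label` and `.score` as in the question; the `Topic` namedtuple and the sample values are made up for illustration. It also keeps each score next to its label in a dict, since two separate sets lose that pairing.

```python
import io
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for the objects returned by response.topics();
# only the .label and .score attributes used in the question are modelled.
Topic = namedtuple("Topic", ["label", "score"])
found = [
    Topic("Machine", 0.9),
    Topic("Stairs", 0.7),
    Topic("Machine", 0.9),  # duplicate label, should appear only once
    Topic("Noise", 0.2),    # below the 0.5 threshold, should be dropped
]

# Keep each label once, together with its score (dicts preserve insertion order).
scores = {}
for topic in found:
    if topic.score > 0.5 and topic.label not in scores:
        scores[topic.label] = topic.score

# Wrapping a collection in [...] yields a single cell holding the whole collection.
wrong = pd.DataFrame({"topic": [set(scores)]})

# Passing plain lists yields one row per label.
right = pd.DataFrame({"topic": list(scores), "score": list(scores.values())})

# Write to an in-memory buffer here; pass a file name such as 'Test.csv' instead.
buffer = io.StringIO()
right.to_csv(buffer, sep=";")
csv_text = buffer.getvalue()
print(csv_text)
```

Here `wrong` has a single row whose only cell contains the whole set, while `right` has one row per topic. If only the labels are needed, `pd.DataFrame({'topic': list(topics)})` as shown above is enough.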
