Question

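I'm learning Python. I set myself the goal of building an RSS scraper. I'm trying to collect the author, link, and title of each post, and from there write them to a CSV file.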

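I've run into some problems. I've been searching for an answer since last night but can't seem to find a solution. I have a feeling that I'm missing some knowledge about the step between what feedparser parses and moving it into a CSV, but I don't yet have the vocabulary to Google it.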

How do I remove special characters such as '[' and "'"?
When I create the new file, how do I write the author, link, and title of each post on a new line?

1) Special characters

import feedparser

rssurls = 'http://feeds.feedburner.com/TechCrunch/'

techart = feedparser.parse(rssurls)
# feeds = []

# for url in rssurls:
#     feedparser.parse(url)
# for feed in feeds:
#     for post in feed.entries:
#         print(post.title)

# print(feed.entires)

techdeets = [post.author + " , " + post.title + " , " + post.link  for post in techart.entries]
techdeets = [y.strip() for y in techdeets]
techdeets

Output: I get the information I need, but .strip() does not remove the special characters.

['Darrell Etherington , Spin launches first city-sanctioned dockless bike sharing in Bay Area , http://feedproxy.google.com/~r/Techcrunch/~3/BF74UZWBinI/', 'Ryan Lawler , With $5.3 million in funding, CarDash wants to change how you get your car serviced , http://feedproxy.google.com/~r/Techcrunch/~3/pkamfdPAhhY/', 'Ron Miller , AlienVault plug-in searches for stolen passwords on Dark Web , http://feedproxy.google.com/~r/Techcrunch/~3/VbmdS0ODoSo/', 'Lucas Matney , Firefox for Windows gets native WebVR support, performance bumps in latest update , http://feedproxy.google.com/~r/Techcrunch/~3/j91jQJm-f2E/', ...]

2) Writing to CSV

import csv

savedfile = open('/test1.txt', 'w')
savedfile.write(str(techdeets) + "\n")
savedfile.close()

import pandas as pd
df = pd.read_csv('/test1.txt', encoding='cp1252')
df

Output: The result is a DataFrame with only 1 row and many columns.

Answer 1

You're almost there :-)

How about creating a DataFrame with pandas first and then saving it, like this (continuing from your code):

df = pd.DataFrame(columns=['author', 'title', 'link'])
for i, post in enumerate(techart.entries):
    df.loc[i] = post.author, post.title, post.link

Then you can save it:

df.to_csv('myfilename.csv', index=False)
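If you would rather not go through pandas, the standard-library csv module (which the question already imports) can also write one row per post. This is only a minimal sketch: the `entries` list below is a hypothetical stand-in for `techart.entries` so that it runs without a network connection, and `write_entries` is a helper name made up for illustration.

```python
import csv
import io
from types import SimpleNamespace

# Hypothetical stand-ins for techart.entries, so the sketch runs offline.
entries = [
    SimpleNamespace(author="Author One",
                    title="First title, with a comma",
                    link="http://example.com/1"),
    SimpleNamespace(author="Author Two",
                    title="Second title",
                    link="http://example.com/2"),
]


def write_entries(entries, fileobj):
    """Write a header row, then one author/title/link row per entry."""
    writer = csv.writer(fileobj)
    writer.writerow(["author", "title", "link"])
    for post in entries:
        # csv.writer quotes fields that contain commas, so titles stay intact.
        writer.writerow([post.author, post.title, post.link])


# An in-memory buffer is used here; a real file object works the same way.
buffer = io.StringIO()
write_entries(entries, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write an actual file, something like `with open('myfilename.csv', 'w', newline='', encoding='utf-8') as f: write_entries(techart.entries, f)` should work, assuming every entry has an author, title, and link.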

OR

You can also build the DataFrame columns directly from the feedparser entries:

>>> import feedparser
>>> import pandas as pd
>>>
>>> rssurls = 'http://feeds.feedburner.com/TechCrunch/'
>>> techart = feedparser.parse(rssurls)
>>>
>>> df = pd.DataFrame()
>>>
>>> df['author'] = [post.author for post in techart.entries]
>>> df['title'] = [post.title for post in techart.entries]
>>> df['link'] = [post.link for post in techart.entries]
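On the first question: the '[' and "'" characters are most likely not in your data at all. They appear because a whole list is being displayed or converted with str(), and that adds the brackets, quotes, and separating commas. strip() with no arguments only trims whitespace from the two ends of each string, so it has nothing to remove there. A small sketch of the difference, using hypothetical sample strings in the same "author , title , link" shape as `techdeets`:

```python
# Hypothetical sample in the same "author , title , link" shape as techdeets.
techdeets = [
    "  Author One , First title , http://example.com/1  ",
    "Author Two , Second title , http://example.com/2",
]

# strip() with no arguments only trims whitespace at the ends of each string.
cleaned = [item.strip() for item in techdeets]

# str() of a list adds the brackets, quotes and commas; they are not in the data.
as_repr = str(cleaned)

# Joining with newlines gives one entry per line and no list punctuation.
as_lines = "\n".join(cleaned)

print(as_repr)
print(as_lines)
```

Writing `as_lines` (or the individual fields) to the file instead of `str(techdeets)` avoids the list punctuation entirely.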

FeedParser, removing special characters and writing to CSV
