Question

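I'm fairly new to Python, but here goes...

My program imports a string from a JSON file, for example "#python is #great".

I am trying to parse the string so that every time a "#" occurs, it prints the word that follows, up to the first non-alphanumeric character such as a space or "=". So for this example it would print: #python #great

The code I have so far is: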

with open("tweet.json") as json_file:
    data = json.load(json_file)
#opens my twitter file

def find_all(s, ch):
    return [i for i, letter in enumerate(s) if letter == ch]

tags = find_all(data, "#")
length = len(tags)
#finds all occurrences of the "#" character

Up to here everything runs fine, but unfortunately the following loop does not work.

for x in range (0, length):
    items = data[tags[x]:data.find('^\W+$')]
    print items
    x += 1

It also cuts off the final character. I'm really stuck on this, so any help is appreciated.

Answer 1

re.findall(r'#\w+', data)

\w matches [A-Za-z0-9_], that is, any alphanumeric character or an underscore.
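A minimal sketch of how this could be dropped into the question's program. The sample string is copied from the question and stands in for whatever json.load() returns, and the name hashtags is just an illustrative choice. It also shows why the original loop misbehaved: str.find() looks for a literal substring, not a regex, so it returns -1 here and the slice ends one character before the end of the string.

```python
import re

# Sample string from the question; in the real program this would come from json.load().
data = "#python is #great"

# '#' followed by one or more word characters (letters, digits, underscore).
hashtags = re.findall(r"#\w+", data)
print(" ".join(hashtags))

# str.find() does not understand regex syntax: the literal text '^\W+$' is not
# in the string, so find() returns -1 and data[start:-1] drops the last character.
not_found = data.find(r"^\W+$")
print(not_found)
print(data[0:not_found])
```

For the sample string this prints "#python #great", then -1, then the string with its final "t" missing, which matches the "cuts off the final character" symptom.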

Answer 2

A regular expression seems like the ideal solution:

import re
print(re.findall("#[a-zA-Z]+", data))
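The two patterns in these answers are not equivalent, which matters if tags can contain digits or underscores. A small comparison, using a made-up sample string (not from the question) to show the difference:

```python
import re

# Hypothetical input chosen to contain a digit, an underscore and an '='.
sample = "#python3 is #great_stuff=1"

# \w+ keeps letters, digits and underscores, and stops at '=' or a space.
word_chars = re.findall(r"#\w+", sample)

# [a-zA-Z]+ keeps ASCII letters only, so it stops at the first digit or underscore.
letters_only = re.findall(r"#[a-zA-Z]+", sample)

print(word_chars)
print(letters_only)
```

Here the \w version yields #python3 and #great_stuff, while the letters-only version yields #python and #great. Which one is right depends on what counts as part of a tag in your data.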

Parsing a string for non-alphanumeric characters in Python
