Question

I have a long file:

Jet pack(human, metal)
thin wire, sheet; fat tube,rod
thin girl;
fat boy;
We like to read
They like to write
End

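I want to extract all the words that follow "thin" and "fat". The words are separated by commas, but a word can also appear on its own. In every case, even when thin and fat both occur on the same line, the groups are separated by a semicolon. My array should contain: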

wire, sheet, tube,rod,girl,boy

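I need an array of these words, which I will then use to extend a function's arguments. Since the delimiters are mixed, how can I split on ; and then split again on ,?

Cheers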

Answer 1

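You can use a regular expression here to extract the values you need, then use re.split() to split on commas or semicolons.

This is the regular expression I am using: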

(?:thin|fat)(.*?)(?=thin|fat|\n)

It matches everything after thin/fat, stopping just before the next thin/fat or a newline.
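To make the two stages easier to follow, here is a small sketch that prints only what the regex captures, before any splitting on commas or semicolons. The names text, pattern and captures are chosen for illustration, and the input is a shortened copy of the sample data.

```python
import re

# Shortened copy of the sample data; every line ends with a newline.
text = "thin wire, sheet; fat tube,rod\nthin girl;\nfat boy;\n"

# Same pattern as above: capture lazily after a keyword, stopping
# before the next keyword or the end of the line.
pattern = r'(?:thin|fat)(.*?)(?=thin|fat|\n)'

captures = re.findall(pattern, text)
print(captures)
# [' wire, sheet; ', ' tube,rod', ' girl;', ' boy;']
```

Each capture still carries its separators and surrounding whitespace, which is why the second stage splits on [;,] and strips every piece.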

x = """
Jet pack(human, metal)
thin wire, sheet; fat tube,rod
thin girl;
fat boy;
We like to read
They like to write
End
"""
import re

y = [j.strip() for i in re.findall(r'(?:thin|fat)(.*?)(?=thin|fat|\n)', x) for j in re.split(r'[;,]', i) if j.strip()]
print(y)

Output:

['wire', 'sheet', 'tube', 'rod', 'girl', 'boy']
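Since the question asks about splitting on ; first and then on , here is a sketch of that approach without regular expressions. The function name extract_words and its keywords parameter are hypothetical, and it assumes that each keyword is followed by a space and starts its semicolon-separated segment, as in the sample data.

```python
def extract_words(text, keywords=("thin", "fat")):
    """Collect the comma-separated words that follow any of the keywords."""
    words = []
    for line in text.splitlines():
        # First split: semicolons separate the thin/fat groups.
        for segment in line.split(";"):
            segment = segment.strip()
            for keyword in keywords:
                # Assumption: the keyword opens the segment and is followed by a space.
                if segment.startswith(keyword + " "):
                    rest = segment[len(keyword):]
                    # Second split: commas separate the individual words.
                    words.extend(w.strip() for w in rest.split(",") if w.strip())
                    break
    return words


sample = """Jet pack(human, metal)
thin wire, sheet; fat tube,rod
thin girl;
fat boy;
We like to read
They like to write
End
"""

print(extract_words(sample))
# ['wire', 'sheet', 'tube', 'rod', 'girl', 'boy']
```

This version is longer than the regex one-liner, but each split is visible as its own step, which may be easier to adapt if the delimiters change.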

You mentioned that you had trouble reading this from a file, so here is a working example that reads from a file:

test.txt

Jet pack(human, metal)
thin wire, sheet; fat tube,rod
thin girl;
fat boy;
We like to read
They like to write
End

Code

import re

with open('test.txt') as f:
    # Capture the text after each keyword, then split it on ';' or ','.
    y = [j.strip() for i in re.findall(r'(?:thin|fat)(.*?)(?=thin|fat|\n)', f.read()) for j in re.split(r'[;,]', i) if j.strip()]
    print(y)

Output:

['wire', 'sheet', 'tube', 'rod', 'girl', 'boy']
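Two edge cases are worth keeping in mind with the pattern above: it needs a newline after the last keyword line, and it also matches "thin" or "fat" inside longer words such as "fathers". The sketch below is one possible variant, not part of the original answer, that adds word boundaries and accepts the end of a line or of the input. The names PATTERN, extract and describe are hypothetical, and describe only illustrates one way to pass the list on as function arguments.

```python
import re

# \b keeps "fat" from matching inside "fathers"; with re.MULTILINE,
# $ matches at the end of every line and at the end of the input.
PATTERN = re.compile(r'\b(?:thin|fat)\b(.*?)(?=\bthin\b|\bfat\b|$)', re.MULTILINE)


def extract(text):
    """Return the words that follow 'thin' or 'fat', split on ';' and ','."""
    return [
        word.strip()
        for chunk in PATTERN.findall(text)
        for word in re.split(r'[;,]', chunk)
        if word.strip()
    ]


def describe(*items):
    """Hypothetical consumer that accepts the words as separate arguments."""
    return " | ".join(items)


words = extract("thin wire, sheet; fat tube,rod\nthin girl;\nfat boy")
print(words)             # ['wire', 'sheet', 'tube', 'rod', 'girl', 'boy']
print(describe(*words))  # wire | sheet | tube | rod | girl | boy
```

Note that the last line of the input has no trailing newline and "boy" is still picked up.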

You can try out my solution to see that it works here

Extracting comma-separated words after certain delimiting strings using Python
