I've been searching for a while but haven't found a simple answer yet.
I have a very structured .txt file that contains many records like this:
product/productId: B000GKXY4S
review/userId: A1QA985ULVCQOB
review/profileName: Carleen M. Amadio "Lady Dragonfly"
review/helpfulness: 2/2
review/score: 5.0
review/time: 1314057600
review/summary: Fun for adults too!
review/text: I really enjoy these scissors for my inspiration books that I am making (like collage, but in books) and using these different textures these give is just wonderful, makes a great statement with the pictures and sayings. Want more, perfect for any need you have even for gifts as well. Pretty cool!
product/productId: B000GKXY4S
review/userId: ALCX2ELNHLQA7
review/profileName: Barbara
review/helpfulness: 0/0
review/score: 5.0
review/time: 1328659200
review/summary: Making the cut!
review/text: Looked all over in art supply and other stores for "crazy cutting" scissors for my 4-year old grandson. These are exactly what I was looking for - fun, very well made, metal rather than plastic blades (so they actually do a good job of cutting paper), safe ("blunt") ends, etc. (These really are for age 4 and up, not younger.) Very high quality. Very pleased with the product.
product/productId: B000140KIW
review/userId: A2M2M4R1KG5WOL
review/profileName: L. Heminway
review/helpfulness: 1/1
review/score: 5.0
review/time: 1156636800
review/summary: Fiskars Softouch Multi-Purpose Scissors, 10"
review/text: These are the BEST scissors I have ever owned. I am left-handed and take note that either a left or right-handed person can use these equally well. If you have arthritis, as I do, these scissors are amazing as well. Well worth the price. I now own three pairs of these and have convinced many other people in my quilting group that they NEED a pair as well! They cut through muli layers and difficult to cut items really well. Do buy them, you won't regret it!
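Each record should become a dictionary, and I want a list of these dictionaries. What's the simplest way to do this? I tried csv, but it doesn't seem right: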
import csv

field = ("product/productId", "review/userId", "review/profileName", "review/helpfulness",
         "review/score", "review/time", "review/summary", "review/text")
reader = csv.DictReader(open('../Arts.txt'), fieldnames=field)
Can someone help me with this beginner question? Thanks!
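A small sketch of why csv.DictReader doesn't fit this format: it treats every physical line as a separate row and splits on commas, so each "key: value" line lands whole in the first column and the remaining columns get the default restval of None. The two-line sample and the io.StringIO stand-in for the file are assumptions for illustration; real lines that contain commas (such as review/text) would additionally be split into extra columns.

```python
import csv
import io

# Two lines in the question's format; io.StringIO stands in for the real file
sample = io.StringIO(
    "product/productId: B000GKXY4S\n"
    "review/userId: A1QA985ULVCQOB\n"
)

field = ("product/productId", "review/userId", "review/profileName",
         "review/helpfulness", "review/score", "review/time",
         "review/summary", "review/text")

rows = list(csv.DictReader(sample, fieldnames=field))

# Each physical line becomes its own row: the whole "key: value" text goes
# into the first column and every other column is filled with None
for row in rows:
    print(row["product/productId"], row["review/userId"])
```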
Answer 0 (score: 3)
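In this case you just want to read each line, split it on ": " to get the key and the value, and add that pair to the current dictionary. Since your file is well structured, you can detect the start of a new block simply by its field name (product/productId):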
data = []
current = {}
with open('../Arts.txt') as f:
    for line in f:
        # strip the newline so it doesn't end up inside the value
        pair = line.rstrip('\n').split(': ', 1)
        if len(pair) == 2:
            if pair[0] == 'product/productId' and current:
                # start of a new block
                data.append(current)
                current = {}
            current[pair[0]] = pair[1]
# store the last block, which no following productId line closes
if current:
    data.append(current)
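The same approach can be wrapped in a function that accepts any iterable of lines, which makes it easy to try out on an in-memory sample. This is a minimal sketch: the name parse_reviews is hypothetical, and it assumes (as the answer does) that every record starts with product/productId. Blank separator lines have no ": " and are skipped.

```python
import io

def parse_reviews(lines):
    """Group 'key: value' lines into one dict per review (hypothetical helper).

    A new dict is started whenever 'product/productId' reappears, which
    assumes every record begins with that field.
    """
    data = []
    current = {}
    for line in lines:
        pair = line.rstrip('\n').split(': ', 1)
        if len(pair) == 2:          # blank or malformed lines are ignored
            key, value = pair
            if key == 'product/productId' and current:
                data.append(current)
                current = {}
            current[key] = value
    if current:                     # don't lose the final record
        data.append(current)
    return data

sample = io.StringIO(
    "product/productId: B000GKXY4S\n"
    "review/score: 5.0\n"
    "\n"
    "product/productId: B000140KIW\n"
    "review/score: 4.0\n"
)
reviews = parse_reviews(sample)
print(len(reviews), reviews[1]['review/score'])  # 2 4.0
```

Passing an open file object works the same way, since iterating over a file yields its lines.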
You could use csv if your file stored the data in columns; for example, a csv file with the same data might look like this:
product/productId,review/userId,review/profileName,...
B000GKXY4S,A1QA985ULVCQOB,Carleen M. Amadio "Lady Dragonfly",...
B000GKXY4S,ALCX2ELNHLQA7,Barbara,...
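If you want the data in that columnar form, one option is to write the parsed dictionaries out with csv.DictWriter and read them back with csv.DictReader, where the header row supplies the keys. This is a sketch with a hypothetical, trimmed-down set of fields; io.StringIO stands in for a real file (for a real file, open it with newline='' as the csv docs recommend). DictWriter quotes values that contain the quote character, so a name like Carleen M. Amadio "Lady Dragonfly" survives the round trip.

```python
import csv
import io

# Hypothetical parsed records, in the shape produced by the line-by-line parser
reviews = [
    {"product/productId": "B000GKXY4S",
     "review/profileName": 'Carleen M. Amadio "Lady Dragonfly"',
     "review/summary": "Fun for adults too!"},
    {"product/productId": "B000GKXY4S",
     "review/profileName": "Barbara",
     "review/summary": "Making the cut!"},
]
fieldnames = ["product/productId", "review/profileName", "review/summary"]

buf = io.StringIO(newline='')
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()          # first row holds the column names
writer.writerows(reviews)

# Reading back: no fieldnames argument, the header row provides the keys
buf.seek(0)
rows = list(csv.DictReader(buf))
print(rows[0]["review/profileName"])
```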
Answer 1 (score: 1)
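I'm surprised the csv reader didn't work; perhaps you used the reader in some unexpected way.
Keeping a large number of dictionaries in memory is not a great use of resources. Instead, the collections module has a built-in "immutable dict" called namedtuple, which is cheaper and easy to use.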
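A rough way to see the "cheaper" claim is to compare one record stored as a namedtuple with the same record stored as a dict. This is a sketch: the shortened field names (without '/') and the sample values are assumptions, and sys.getsizeof is shallow, so it measures only the container and not the strings inside it. The exact byte counts depend on the Python version and platform, but the namedtuple, being a plain tuple underneath, comes out smaller.

```python
import sys
from collections import namedtuple

# Field names without '/', since namedtuple requires valid identifiers
fields = ["productId", "userId", "profileName", "helpfulness",
          "score", "time", "summary", "text"]
Review = namedtuple("Review", fields)

values = ["B000GKXY4S", "A1QA985ULVCQOB", "Carleen", "2/2",
          "5.0", "1314057600", "Fun for adults too!", "Pretty cool!"]
as_tuple = Review(*values)
as_dict = dict(zip(fields, values))

# sys.getsizeof is shallow: it counts the container only, not the strings
print(sys.getsizeof(as_tuple), sys.getsizeof(as_dict))
```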
This can actually be solved by simply reading a constant number of lines at a time (in this case, 8 lines plus 1 blank line):
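from collections import namedtuple

# field is the tuple of names from the question; namedtuple rejects '/'
# in field names, so it is swapped for '_' here
data_point = namedtuple('data_point', [name.replace('/', '_') for name in field])
data_lst = list()
with open('some_path/somefile.txt') as f_in:
    while True:
        lines = [f_in.readline() for _ in range(8)]
        if not any(lines):  # readline() returns '' only at end of file
            break
        # keep everything after the first ':' so values containing ':' survive
        data = [line.split(':', 1)[1].strip() for line in lines]
        data_lst.append(data_point(*data))
        f_in.readline()  # skip the blank line that separates records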
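Since the question asked for dictionaries, it may help to see how a namedtuple record is used, and how to turn one back into a dict when needed. A minimal sketch with hypothetical, trimmed-down field names: fields can be read by attribute or by position, _asdict() returns a plain dict (an OrderedDict before Python 3.8), and because the record is immutable, _replace() returns a modified copy instead of changing it in place.

```python
from collections import namedtuple

# Hypothetical trimmed-down record with '/'-free field names
data_point = namedtuple('data_point', ['productId', 'userId', 'score'])
record = data_point('B000GKXY4S', 'A1QA985ULVCQOB', '5.0')

print(record.score)    # access by field name
print(record[0])       # positional access still works

as_dict = record._asdict()              # convert back to a dict when needed
updated = record._replace(score='4.0')  # new record; the original is unchanged
```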
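People are so used to for loops in Python that they forget the while loop exists.
If the layout you show in the question doesn't hold throughout the whole file, the number 8 may vary. In that case you should unroll the for loop that reads the lines and check the condition on each line as you read it. Here I'm taking advantage of a clean dataset.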
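For the case where records don't always have 8 lines, here is one sketch of the "read line by line and check the condition" idea. It assumes, hypothetically, that records are separated by blank lines; read_records and the three-field FIELDS tuple are illustrative names, and fields missing from a record are filled with None (an assumption, not something the file format guarantees). Keys not listed in FIELDS are collected but dropped when the record is built.

```python
import io
from collections import namedtuple

FIELDS = ('product/productId', 'review/userId', 'review/score')
# '/' is not allowed in namedtuple field names, so map it to '_'
Review = namedtuple('Review', [f.replace('/', '_') for f in FIELDS])

def read_records(f_in):
    """Read blank-line-separated records with a varying number of lines."""
    records = []
    current = {}
    while True:
        line = f_in.readline()
        if not line.strip():  # blank separator line, or '' at end of file
            if current:
                # fields missing from this record default to None
                records.append(Review(*(current.get(f) for f in FIELDS)))
                current = {}
            if line == '':    # readline() returns '' only at end of file
                break
            continue
        key, _, value = line.partition(':')
        current[key.strip()] = value.strip()
    return records

sample = io.StringIO(
    "product/productId: B000GKXY4S\n"
    "review/userId: A1QA985ULVCQOB\n"
    "review/score: 5.0\n"
    "\n"
    "product/productId: B000140KIW\n"
    "review/score: 5.0\n"
)
recs = read_records(sample)
```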
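Also, change the field names so they don't contain "/" or other special characters: namedtuple only accepts valid Python identifiers as field names, so "product/productId" raises a ValueError. As long as their order is preserved, the names of the fields don't matter much.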
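Two ways to get usable field names, sketched on a shortened version of the question's field tuple: sanitize the names yourself while keeping their order, or pass rename=True so namedtuple replaces each invalid name with a positional one (_0, _1, ...). The second option keeps the order but loses the meaningful names, so the first is usually more readable.

```python
from collections import namedtuple

field = ("product/productId", "review/userId", "review/score")

try:
    namedtuple('data_point', field)
except ValueError as exc:   # '/' makes these invalid identifiers
    print("rejected:", exc)

# Option 1: sanitize the names yourself, keeping the original order
clean = namedtuple('data_point', [name.replace('/', '_') for name in field])
print(clean._fields)  # ('product_productId', 'review_userId', 'review_score')

# Option 2: let namedtuple replace invalid names with positional ones
auto = namedtuple('data_point', field, rename=True)
print(auto._fields)   # ('_0', '_1', '_2')
```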