I have data in a .txt file:
productname1
7,64
productname2
6,56
4.73
productname3
productname4
12.58
10.33
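To explain the data: the product name is on the first line and the price is on the line after it. For the second product name, however, there are two prices, the original price and the discounted price. Also, the prices sometimes use '.' and sometimes ',' as the separator for the cents. I want to format the data like this: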
Product o_price d_price
productname1 7.64 -
productname2 6.56 4.73
productname3 - -
productname4 12.58 10.33
My current approach is a bit naive, but it works for 98% of the cases:
import pandas as pd

data = {}
tempKey = []
with open("myfile.txt", encoding="utf-8") as file:
    arr_content = file.readlines()
for val in arr_content:
    if not val[0].isdigit():  # check whether the first character is a digit or text
        val = ' '.join(val.split())  # remove extra spaces
        data.update({val: []})  # add the key to the dict and initialize it with a list that will hold the prices
        tempKey.append(val)  # keep track of the last key added because dicts are not sequential
    else:
        data[str(tempKey[-1])].append(val.strip())  # append the price to the last added key
df = pd.DataFrame(list(data.items()), columns=['Product', 'Pricelist'])
df[['o_price', 'd_price']] = pd.DataFrame([x for x in df.Pricelist])
df = df.drop('Pricelist', axis=1)  # drop the helper column created above
The problem is that this technique fails when a product name starts with a digit. Any suggestions for a better approach?
Answer 0 (score: 0)
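Use a regular expression to check whether the line contains only digits and decimal separators (periods or commas, since your prices use both).
import re

if re.match(r"^[0-9.,]+$", val.strip()):
    pass  # This is a product price
else:
    pass  # This is a product name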
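To show how the regex check could fit into the whole parsing loop, here is a minimal sketch rather than a definitive solution. The names `PRICE_RE` and `parse_products` are made up for illustration, and it assumes a price is a run of digits optionally followed by one '.' or ',' and one or two digits. A product name that consists only of digits would still be mistaken for a price under that assumption.

```python
import re

# Assumption: a price is digits, optionally followed by '.' or ',' and 1-2 digits.
PRICE_RE = re.compile(r"\d+(?:[.,]\d{1,2})?")


def parse_products(lines):
    """Group price lines under the most recent product-name line."""
    rows = []  # each entry is [name, [price, ...]]
    for raw in lines:
        line = " ".join(raw.split())  # collapse whitespace and drop the newline
        if not line:
            continue  # skip blank lines
        if rows and PRICE_RE.fullmatch(line):
            # Normalize ',' to '.' so every price uses the same separator
            rows[-1][1].append(line.replace(",", "."))
        else:
            # Anything that is not a whole-line price starts a new product
            rows.append([line, []])
    # Pad with '-' so every row has exactly an original and a discounted price
    return [(name, *(prices + ["-", "-"])[:2]) for name, prices in rows]


sample = ["productname1\n", "7,64\n", "productname2\n", "6,56\n", "4.73\n",
          "productname3\n", "productname4\n", "12.58\n", "10.33\n"]
for row in parse_products(sample):
    print(*row)
```

Because the whole line has to match the price pattern, a name such as "3M tape" is still treated as a product. The returned tuples can be passed to pd.DataFrame(rows, columns=['Product', 'o_price', 'd_price']) if a DataFrame is needed.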