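What I would like to know is how to use a regular expression on the data frame below to put the values of each row in the correct order. As you can see at indices 2 and 4, for example, Quantity and Piece appear in the wrong order. Does anyone have an idea how to solve this?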
import pandas as pd
data = [['Total 8\r\r\nQuantity 2\r\r\nPiece 4'], ['Total 8\r\r\nQuantity 2\r\r\nPiece 4'],['Total 8\r\r\nPiece 2\r\r\nQuantity 4'], ['Total 8\r\r\nQuantity 2\r\r\nPiece 4'], ['Total 8\r\r\nPiece 2\r\r\nQuantity 4'],['Total 8\r\r\nQuantity 2\r\r\nPiece 4'], ['Total 8\r\r\nQuantity 2\r\r\nPiece 4'],['Total 8\r\r\nPiece 2\r\r\nQuantity 4'], ['Total 8\r\r\nQuantity 2\r\r\nPiece 4'], ['Total 8\r\r\nPiece 2\r\r\nQuantity 4']]
df = pd.DataFrame(data, columns = ['Information'])
df
+-------+--------------------------------------+
| index | Information |
+-------+--------------------------------------+
| 0 | Total 8\r\r\nQuantity 2\r\r\nPiece 4 |
| 1 | Total 8\r\r\nQuantity 2\r\r\nPiece 4 |
| 2 | Total 8\r\r\nPiece 2\r\r\nQuantity 4 |
| 3 | Total 8\r\r\nQuantity 2\r\r\nPiece 4 |
| 4 | Total 8\r\r\nPiece 2\r\r\nQuantity 4 |
| 5 | Total 8\r\r\nQuantity 2\r\r\nPiece 4 |
| 6 | Total 8\r\r\nQuantity 2\r\r\nPiece 4 |
| 7 | Total 8\r\r\nPiece 2\r\r\nQuantity 4 |
| 8 | Total 8\r\r\nQuantity 2\r\r\nPiece 4 |
| 9 | Total 8\r\r\nPiece 2\r\r\nQuantity 4 |
+-------+--------------------------------------+
import re
dt = pd.DataFrame(df)
data = []
for item in dt['Information']:
    # Grab the three numbers in the order they appear in the string
    regex = re.findall(r"(\d+)\D+(\d+)\D+(\d+)", item)
    # These two checks detect which label follows Total, but the result is never used
    quantity = re.findall(r"\bTotal\s?\d\D+(\bQuantity)", item)
    piece = re.findall(r"\bTotal\s?\d\D+(\bPiece)", item)
    regex = (map(list, regex))
    data.append(list(map(int, list(regex)[0])))
dftotal = pd.DataFrame(data, columns=['Total', 'Quantity', 'Piece'])
print(dftotal)
With this code, I get the following output:
+-------+----------+-------+
| Total | Quantity | Piece |
+-------+----------+-------+
| 8 | 2 | 4 |
| 8 | 2 | 4 |
| 8 | 2 | 4 |
| 8 | 2 | 4 |
| 8 | 2 | 4 |
| 8 | 2 | 4 |
| 8 | 2 | 4 |
| 8 | 2 | 4 |
| 8 | 2 | 4 |
| 8 | 2 | 4 |
+-------+----------+-------+
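How can I swap the misordered values taken from the data array so that I get a data frame like the one below, with every value in the correct column of a single data frame?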
+-------+----------+-------+
| Total | Quantity | Piece |
+-------+----------+-------+
| 8 | 2 | 4 |
| 8 | 2 | 4 |
| 8 | 4 | 2 |
| 8 | 2 | 4 |
| 8 | 4 | 2 |
| 8 | 2 | 4 |
| 8 | 2 | 4 |
| 8 | 4 | 2 |
| 8 | 2 | 4 |
| 8 | 4 | 2 |
+-------+----------+-------+
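Answer 0: (score: 2)
Here is one approach using str.extract, which looks up each number by its label rather than by its position in the string.
For example: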
import pandas as pd
data = [['Total 8\r\r\nQuantity 2\r\r\nPiece 4'], ['Total 8\r\r\nQuantity 2\r\r\nPiece 4'],['Total 8\r\r\nPiece 2\r\r\nQuantity 4'], ['Total 8\r\r\nQuantity 2\r\r\nPiece 4'], ['Total 8\r\r\nPiece 2\r\r\nQuantity 4'],['Total 8\r\r\nQuantity 2\r\r\nPiece 4'], ['Total 8\r\r\nQuantity 2\r\r\nPiece 4'],['Total 8\r\r\nPiece 2\r\r\nQuantity 4'], ['Total 8\r\r\nQuantity 2\r\r\nPiece 4'], ['Total 8\r\r\nPiece 2\r\r\nQuantity 4']]
df = pd.DataFrame(data, columns = ['Information'])
df["Total"] = df["Information"].str.extract(r"Total (\d+)")
df["Quantity"] = df["Information"].str.extract(r"Quantity (\d+)")
df["Piece"] = df["Information"].str.extract(r"Piece (\d+)")
df.drop("Information", inplace=True, axis=1)
print(df)
Output:
Total Quantity Piece
0 8 2 4
1 8 2 4
2 8 4 2
3 8 2 4
4 8 4 2
5 8 2 4
6 8 2 4
7 8 4 2
8 8 2 4
9 8 4 2
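One detail worth noting: str.extract returns the captured digits as strings, not numbers. The following is a minimal sketch of the same label-based idea with an integer conversion added. It assumes every row contains all three labels followed by whitespace and digits; the shortened sample data and the names `labels` and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

# Shortened sample: one row in the usual order, one with Quantity and Piece swapped
df = pd.DataFrame(
    {
        "Information": [
            "Total 8\r\r\nQuantity 2\r\r\nPiece 4",
            "Total 8\r\r\nPiece 2\r\r\nQuantity 4",
        ]
    }
)

# Assumption: these are the only labels, and each appears once per row
labels = ["Total", "Quantity", "Piece"]

# expand=False makes str.extract return a Series for a single capture group,
# which is then converted from strings to integers
result = pd.DataFrame(
    {
        label: df["Information"]
        .str.extract(rf"{label}\s+(\d+)", expand=False)
        .astype(int)
        for label in labels
    }
)
print(result)
```

If a label could be missing from some rows, the extraction would yield NaN there and the plain astype(int) call would fail, so that case would need separate handling.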
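Answer 1: (score: 1)
In fact, the raw data is close to a CSV file whose delimiter is a space. Once the data has been loaded that way, pivoting it is enough to get the desired result.
So I would do this: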
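import io
import pandas as pd
# Join all records into one space-separated text and parse it like a CSV file
df = pd.read_csv(io.StringIO('\r\r\n'.join(line[0] for line in data)),
                 sep=' ', header=None)
# Every three consecutive rows belong to the same original record
df['n'] = df.index // 3
result = df.pivot(index='n', columns=0, values=1)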
The result is the following data frame:
0 Piece Quantity Total
n
0 4 2 8
1 4 2 8
2 2 4 8
3 4 2 8
4 2 4 8
5 4 2 8
6 4 2 8
7 2 4 8
8 4 2 8
9 2 4 8
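For comparison with the loop in the question, the same label-to-value pairing can also be done with the re module alone. This is only a sketch under the assumption that each record contains exactly the labels Total, Quantity and Piece, each followed by whitespace and digits; the shortened `items` list and the names `pattern` and `rows` are illustrative and do not come from either answer.

```python
import re

import pandas as pd

# Shortened sample: one record in the usual order, one with the labels swapped
items = [
    "Total 8\r\r\nQuantity 2\r\r\nPiece 4",
    "Total 8\r\r\nPiece 2\r\r\nQuantity 4",
]

# Capture each label together with the number that follows it
pattern = re.compile(r"(Total|Quantity|Piece)\s+(\d+)")

rows = []
for item in items:
    # findall returns (label, digits) tuples; dict() keys each number by its label
    found = dict(pattern.findall(item))
    rows.append({label: int(value) for label, value in found.items()})

# The columns argument fixes the column order regardless of the order in the text
dftotal = pd.DataFrame(rows, columns=["Total", "Quantity", "Piece"])
print(dftotal)
```

Because every number is stored under its own label, the order in which Quantity and Piece appear in the text no longer matters.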