Question

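I have a function with a for loop that returns a series of strings, for example:

58, pluto 172, uno 5, peaches

How can I put the first part of each string (the number) into one column of a pandas DataFrame and the second part (the fruit) into a second column? The columns should be named "amount" and "fruit".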

My code so far uses re to filter the data I need out of a large block of text, but for now it only prints the results to the console. I need them in a DataFrame instead.

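Essentially, the last print statement in that code needs to change so that, instead of printing the values, I insert them into a DataFrame.

An example of the final text:

(a) 58 ML/year in the pear region (b) 64 ML/year in the apple region

It is plain text.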

Answer 1

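I had to work a bit to find you a simpler solution. Split the string with the \W regex, which breaks it on every non-word character and so drops the parentheses, slashes, and spaces.

If your string always follows this pattern: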

(x)## ML/Y in the fruit region (y) ## ML/Y in the fruit region

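then use this code. Splitting removes the ( ) and / characters and leaves you with a simpler list. Take the 3rd, 8th, 13th, and 18th items of that list (indices 2, 7, 12, and 17) to get the values you need.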

import pandas as pd
import re

finalText = '(a)58 ML/Y in the pear region (b) 64 ML/Y in the apple region'

df = pd.DataFrame(data=None, columns=['amount','fruit'])

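for line in finalText.splitlines():
    # Split on every non-word character; adjacent separators leave empty strings.
    matches = re.split(r'\W', line)
    # Indices 2 and 7 hold the first amount and fruit, 12 and 17 the second pair.
    df.loc[len(df)] = [matches[2], matches[7]]
    df.loc[len(df)] = [matches[12], matches[17]]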

print(df)

This produces:

  amount  fruit
0     58   pear
1     64  apple
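
To see where the indices 2, 7, 12, and 17 come from, it can help to print every token next to its position. This is only a small sketch using the same sample string; the names final_text and tokens are mine, not from the code above.

```python
import re

final_text = '(a)58 ML/Y in the pear region (b) 64 ML/Y in the apple region'

# re.split(r'\W', ...) cuts at every single non-word character, so two
# adjacent separators (such as " (") leave an empty string in the list.
tokens = re.split(r'\W', final_text)

for position, token in enumerate(tokens):
    print(position, repr(token))
```

Because the empty strings count as positions, the indices shift as soon as the spacing of the input changes (for example "(a) 58" instead of "(a)58"), so this approach is only safe when the pattern never varies.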

Another option is to use findall.

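# Start from an empty DataFrame again so the rows are not appended twice.
df = pd.DataFrame(data=None, columns=['amount','fruit'])

for line in finalText.splitlines():
    # findall keeps only the runs of word characters, so there are no empty strings.
    matches = re.findall(r'\w+', line)
    df.loc[len(df)] = [matches[1], matches[6]]
    df.loc[len(df)] = [matches[9], matches[14]]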

print(df)

The result is the same as above:

  amount  fruit
0     58   pear
1     64  apple
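
Both versions above depend on fixed list positions. If the text always has the "## ML/Y in the fruit region" shape, a pattern with two capture groups can pull out each pair directly, however many pairs a line contains. The following is a sketch under that assumption; the pattern and the names final_text, pattern, and rows are my own and not part of the code above.

```python
import re
import pandas as pd

final_text = '(a)58 ML/Y in the pear region (b) 64 ML/Y in the apple region'

# Assumed shape: "<number> ML/Y in the <fruit> region".
# Group 1 captures the number, group 2 the single-word fruit name.
pattern = re.compile(r'(\d+)\s*ML/Y in the (\w+) region')

rows = []
for line in final_text.splitlines():
    # With two groups, findall returns a list of (amount, fruit) tuples.
    rows.extend(pattern.findall(line))

# Build the DataFrame once from all collected rows.
df = pd.DataFrame(rows, columns=['amount', 'fruit'])
df['amount'] = df['amount'].astype(int)  # the digits arrive as strings

print(df)
```

Collecting the rows in a list and creating the DataFrame once also avoids growing it row by row inside the loop.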

Old code

Try this and let me know whether it works.

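import re
import pandas as pd

# Sample text from the question.
finalText = '(a) 58 ML/year in the pear region (b) 64 ML/year in the apple region'

df = pd.DataFrame(data=None, columns=['amount','fruit'])

pattern = r"(\d+)( ML/year )(in the |the )([\w \/\(\)]+)"
for line in finalText.splitlines():
    matches = re.finditer(pattern, line)

    # Append one row per match: group 1 is the amount, group 4 the fruit text.
    # Group 4 is greedy: it keeps matching word characters, spaces, '/', '('
    # and ')', so it can swallow later entries on the same line.
    for match in matches:
        df.loc[len(df)] = [match.group(1), match.group(4)]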

Answer 2

Here is my solution:

import pandas as pd

s = "58, pluto 172, uno 5, peaches"
temp = s.split()     # ['58,', 'pluto', '172,', 'uno', '5,', 'peaches']
amount = temp[::2]   # ['58,', '172,', '5,']
fruit = temp[1::2]   # ['pluto', 'uno', 'peaches']

df = pd.DataFrame()  # start with an empty DataFrame
df['amount'] = amount
df['fruit'] = fruit

From there you can strip the commas and convert the amounts from strings to integers.
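
That cleanup step could look like the sketch below. It rebuilds the two columns from the same string, strips the trailing comma from each amount with str.rstrip, and converts the column with astype(int). It assumes every amount is a whole number.

```python
import pandas as pd

s = "58, pluto 172, uno 5, peaches"
temp = s.split()

# Even positions are the amounts, odd positions the fruits.
df = pd.DataFrame({'amount': temp[::2], 'fruit': temp[1::2]})

# Drop the trailing comma from each amount, then convert the strings to integers.
df['amount'] = df['amount'].str.rstrip(',').astype(int)

print(df)
```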
