Question

I have a DataFrame with a column named Text. Every row in this column has the following format:

xxx - some sentence

where xxx is a random number. Here is an example of what I have:

      Text
100 - Hello World
200 - Bye World
300 - Good World

I want Python to find only the string part ("some sentence") and replace it with a value I specify. My current approach is:

mapping = {"100 - Hello World":"100 - Bonjour Le Monde"}
df = df.replace({"Text":mapping})

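This works well for small datasets, but this dataset has 15k+ entries and many different numbers. I would rather not have to spell out every number each time. How can I tell Python to find the string and translate only the string?

Many thanks!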

Answer 1

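Use `regex=True`, so that each key is matched as a pattern anywhere inside the cell rather than against the whole value: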

mapping = {"Hello World": "Bonjour Le Monde"}
df.replace({"Text":mapping}, regex=True)

                     Text
0  100 - Bonjour Le Monde
1         200 - Bye World
2        300 - Good World
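
With `regex=True` the dictionary keys are interpreted as regular expressions, so a phrase containing characters such as `?` or `(` would need escaping to match literally. A minimal sketch, assuming a hypothetical `translations` dictionary (the sample row ending in `?` and its French text are illustrative, not from the question), that escapes each key with `re.escape` before replacing:

```python
import re
import pandas as pd

df = pd.DataFrame({"Text": ["100 - Hello World", "200 - Bye World", "300 - Good World?"]})

# Hypothetical translation table; keys are plain phrases, not patterns.
translations = {
    "Hello World": "Bonjour Le Monde",
    "Good World?": "Bon Monde ?",
}

# Escape each key so regex metacharacters such as "?" match literally.
mapping = {re.escape(en): fr for en, fr in translations.items()}

result = df.replace({"Text": mapping}, regex=True)
print(result)
```

Rows whose sentence has no entry in the dictionary are left unchanged.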

Answer 2

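Each value in the column, of the form xxx - some sentence, is a single string. What you need is to translate only the part of the string after the -.

To do that, you can write a custom function that does the work and use apply to run it on each row.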

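def translating(txt):
    # Placeholder translator: show the sentence and read the replacement from the user.
    print(txt)
    return input()

def substituting(x):
    # Split only on the first '-' so hyphens inside the sentence are preserved.
    spv = [el.strip() for el in x['Text'].split('-', 1)]
    tl = translating(spv[1])
    return ' - '.join([spv[0], tl])

# With axis=1, apply calls substituting once per row and returns a Series.
ddf = df.apply(substituting, axis=1)
print(ddf)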

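translating is the translation function. Here I print the string and ask the user for the replacement at run time, just to give you the idea. With 15,000 rows, you will probably want to automate the replacement with a dictionary or a translation tool.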
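
One way the dictionary-based automation could look, as a rough sketch rather than a definitive implementation: the `translations` dictionary below is a hypothetical stand-in for whatever translation source is actually used, and the vectorized `str.split` and `map` calls take the place of the row-by-row `apply`.

```python
import pandas as pd

df = pd.DataFrame({"Text": ["100 - Hello World", "200 - Bye World", "300 - Good World"]})

# Hypothetical lookup table standing in for a real translation source.
translations = {"Hello World": "Bonjour Le Monde", "Bye World": "Au revoir le Monde"}

# Split once on the first " - " so hyphens inside the sentence survive.
parts = df["Text"].str.split(" - ", n=1, expand=True)
number, sentence = parts[0], parts[1]

# Look up each sentence; keep the original where no translation exists.
translated = sentence.map(translations).fillna(sentence)

df["Text"] = number + " - " + translated
print(df)
```

Sentences missing from the dictionary (here "Good World") pass through untranslated, which makes gaps in the table easy to spot.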

Answer 3

So you have your DataFrame:

import pandas as pd

df = pd.DataFrame({'Text': ['100 - Hello World', '200 - Bye World', '300 - Good World']})
df

Text
0   100 - Hello World
1   200 - Bye World
2   300 - Good World

You can use a regular expression to extract the two parts of the column:

df = df['Text'].str.extractall(r'([0-9]+) - (.*)')

        0       1
    match       
0   0   100     Hello World
1   0   200     Bye World
2   0   300     Good World

Create a DataFrame containing all the translations:

df_translate = pd.DataFrame({"en": ["Hello World", "Bye World", "Good World"], "fr": ["Bonjour Monde", "Au revoir le Monde", "Bon Monde"]})

    en              fr
0   Hello World     Bonjour Monde
1   Bye World       Au revoir le Monde
2   Good World      Bon Monde

Merge the two DataFrames, then build the result column:

pd_res = pd.merge(df, df_translate, left_on=1, right_on='en', how='left')
pd_res['res'] = pd_res[0] + ' - ' + pd_res['fr']

    0       1               en              fr                  res
0   100     Hello World     Hello World     Bonjour Monde       100 - Bonjour Monde
1   200     Bye World       Bye World       Au revoir le Monde  200 - Au revoir le Monde
2   300     Good World      Good World      Bon Monde           300 - Bon Monde
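
Because the merge uses `how='left'`, a sentence with no entry in the translation table ends up with `NaN` in `fr`, and therefore in `res`. A minimal sketch of one possible fallback, assuming hypothetical group names `num` and `en` and using `str.extract` (one row per input row, without the `match` index level) in place of `extractall`:

```python
import pandas as pd

df = pd.DataFrame({"Text": ["100 - Hello World", "200 - Bye World", "300 - Good World"]})

# Named groups become column names; "num" and "en" are illustrative names.
parts = df["Text"].str.extract(r"(?P<num>[0-9]+) - (?P<en>.*)")

# Deliberately incomplete translation table, to show the fallback.
df_translate = pd.DataFrame({"en": ["Hello World"], "fr": ["Bonjour Monde"]})

merged = parts.merge(df_translate, on="en", how="left")

# Where no translation was found, "fr" is NaN; reuse the English sentence.
merged["res"] = merged["num"] + " - " + merged["fr"].fillna(merged["en"])
print(merged["res"].tolist())
```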
