Pandas: new column values include the entire row

Time: 2020-05-25 15:07:46

Tags: python pandas dataframe

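I have the following code, which goes row by row and translates a specific column of my DataFrame into English, but when I run it, the new column "translatedv4" contains the entire row instead of just the translation. I'm new to iterating over an entire DataFrame rather than a list, so that may be the problem.

Example of a single value (I only want the column to show "I'm thinking this..."):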

Comments            Ich glaube das...
Translations                                                       DE  
Race / Ethnicity                                                White
Count2                                                             91
translated          I'm thinking this because I'm nearing retireme...

Current code:

from googletrans import Translator
import pandas as pd
import xlsxwriter
import xlrd
import copy

##################TRANSLATION

translator = Translator()
file = r"xxxx"
#dt2 = translator.detect(text2)

df = pd.read_excel(file, sheet_name = 'Sheet1', converters={'Comments':str}).fillna(0)

df = df[df['Comments'] != 0]


translatedList = []
for index, row in df.iterrows():
    # REINITIALIZE THE API
    translator = Translator()
    newrow = copy.deepcopy(row)
    try:
        # translate the 'text' column
        translated = translator.translate(row['Comments'], dest='en')
        newrow['translated'] = translated.text
    except Exception as e:
        print(str(e))
        continue
    translatedList.append(newrow)
df = df.assign(translatedv4 = translatedList) 

2 Answers:

Answer 0 (score: 0)

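I'm not entirely sure what you're asking, so I hope this is what you want. I do think you're not approaching it in the best way. With pandas you generally want to vectorize your solution, or write a function to pass to apply (here Series.apply). Below are three solutions of increasing complexity. The first uses a lambda function, which works but cannot handle exceptions. The second uses a regular function, which makes exception handling easy. The last adds ratelimit and tqdm, which work well when calling an API on a DataFrame.

Solution 1, without an exception handler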

from googletrans import Translator
import pandas as pd

df = pd.DataFrame({
    'German': ['ich glaube das', 'schadenfreude', 'schnappsidee']
})

translator = Translator()

df['English'] = df['German'].apply(
    lambda sent: translator.translate(sent, dest='en', src='de').text
)

print(df)

           German         English
0  ich glaube das  I believe that
1   schadenfreude   malicious joy
2    schnappsidee   snapping idea
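If the same comment appears in many rows, a variation on Solution 1 can translate each distinct value once and broadcast the results back with Series.map, which cuts down on API calls. The sketch below assumes a hypothetical fake_translate stand-in (with a made-up lookup table) in place of translator.translate(...).text, so it runs offline:

```python
import pandas as pd

# Hypothetical stand-in for translator.translate(text, dest='en', src='de').text
# so the example runs without network access. It records every call.
calls = []

def fake_translate(text):
    calls.append(text)
    lookup = {"ich glaube das": "I believe that", "schadenfreude": "malicious joy"}
    return lookup.get(text, text)

df = pd.DataFrame({"German": ["ich glaube das", "schadenfreude", "ich glaube das"]})

# Translate each distinct non-missing value once...
unique_values = df["German"].dropna().unique()
mapping = {value: fake_translate(value) for value in unique_values}

# ...then broadcast the translations back to every row.
df["English"] = df["German"].map(mapping)

print(df)
print(len(calls))  # 2
```

With the real API, replace fake_translate(value) with translator.translate(value, dest='en', src='de').text. Missing values are skipped by dropna() and come back as NaN from map.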

Solution 2, with an exception handler

from googletrans import Translator
import numpy as np
import pandas as pd

def get_trans(sent):
    try:
        return translator.translate(sent, dest='en', src='de').text
    except Exception as e:
        print(e)
        return np.nan

df = pd.DataFrame({
    'German': ['ich glaube das', 'schadenfreude', 'schnappsidee', np.nan]
})

translator = Translator()

df['English'] = df['German'].apply(get_trans)

print(df)

'float' object is not iterable
           German         English
0  ich glaube das  I believe that
1   schadenfreude   malicious joy
2    schnappsidee   snapping idea
3             NaN             NaN

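Solution 3, with rate limiting and tqdm

When working with APIs, I can really recommend the excellent ratelimit library. It helps you avoid sending too many requests, and with sleep_and_retry a call that would exceed the limit waits and retries instead of raising an exception. I also added tqdm for a progress bar, which is nice if you have a lot of data.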

from googletrans import Translator
import numpy as np
import pandas as pd
from ratelimit import limits, sleep_and_retry
from tqdm.autonotebook import tqdm
# from tqdm import tqdm  <- use this instead if you're not using jupyter

FIFTEEN_MINUTES = 900  # ratelimit periods are given in seconds

tqdm.pandas()

@sleep_and_retry
@limits(calls=15, period=FIFTEEN_MINUTES)
def get_trans(sent):
    try:
        return translator.translate(sent, dest='en', src='de').text
    except Exception as e:
        print(e)
        return np.nan

df = pd.DataFrame({
    'German': ['ich glaube das', 'schadenfreude', 'schnappsidee', np.nan]
})

translator = Translator()

df['English'] = df['German'].progress_apply(get_trans)

print(df)

           German         English
0  ich glaube das  I believe that
1   schadenfreude   malicious joy
2    schnappsidee   snapping idea
3             NaN             NaN
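For intuition about what @limits plus @sleep_and_retry achieve together, here is a simplified, hand-rolled sliding-window throttle. It is an illustrative substitute, not the ratelimit library's code (the library counts calls in fixed windows), and the throttle name and its clock/sleep parameters are hypothetical; they are injectable so the demo can use a fake clock instead of actually waiting:

```python
import time
from collections import deque
from functools import wraps


def throttle(calls, period, clock=time.monotonic, sleep=time.sleep):
    """Hypothetical decorator: allow at most `calls` invocations per `period`
    seconds, sleeping (instead of raising) when the budget is used up."""
    def decorator(func):
        stamps = deque()  # start times of calls still inside the window

        @wraps(func)
        def wrapper(*args, **kwargs):
            while True:
                now = clock()
                # Forget calls that have left the sliding window.
                while stamps and now - stamps[0] >= period:
                    stamps.popleft()
                if len(stamps) < calls:
                    break
                # Budget exhausted: wait until the oldest call expires.
                sleep(period - (now - stamps[0]))
            stamps.append(now)
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Demo with a fake clock so nothing actually waits.
fake_now = [0.0]
waits = []

def fake_sleep(seconds):
    waits.append(seconds)
    fake_now[0] += seconds

@throttle(calls=2, period=10, clock=lambda: fake_now[0], sleep=fake_sleep)
def echo(x):
    return x

print([echo(i) for i in range(3)], waits)  # [0, 1, 2] [10.0]
```

In real code you would keep the default clock and sleep, e.g. @throttle(calls=15, period=FIFTEEN_MINUTES) on get_trans. This sketch is not thread-safe.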

Answer 1 (score: 0)

I think your code has a small bug, here:

translatedList.append(newrow)

You are appending the whole row to the list, when you want to append only the new value, i.e.

translatedList.append(translated.text)
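A network-free way to see the difference: the sketch below uses a hypothetical fake_translate function in place of translator.translate(...).text and builds both lists the loop could build. Passing the list of rows to assign stores a whole Series in each cell, which matches the output shown in the question:

```python
import pandas as pd

# Hypothetical stand-in for translator.translate(text, dest='en').text
def fake_translate(text):
    return {"Ich glaube das": "I believe that"}.get(text, text)

df = pd.DataFrame({"Comments": ["Ich glaube das", "schadenfreude"]})

rows, texts = [], []
for _, row in df.iterrows():
    newrow = row.copy()
    newrow["translated"] = fake_translate(row["Comments"])
    rows.append(newrow)                 # what the question's loop collects
    texts.append(newrow["translated"])  # what the new column should hold

bad = df.assign(translatedv4=rows)    # each cell is an entire row
good = df.assign(translatedv4=texts)  # each cell is just the translation

print(type(bad.loc[0, "translatedv4"]).__name__)  # Series
print(good.loc[0, "translatedv4"])                # I believe that
```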

But be careful: if an exception occurs for any row, translatedList will end up shorter than your DataFrame's index. You should probably do something like this:

try:
    # translate the 'Comments' column
    translated = translator.translate(row['Comments'], dest='en')
    translatedList.append(translated.text)
except Exception as e:
    print(str(e))
    # append a placeholder so translatedList stays as long as the DataFrame
    translatedList.append('ERROR')
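Put together, a self-contained version of the corrected loop might look like this. fake_translate is again a hypothetical stand-in for the googletrans call, rigged to fail on one made-up input ("kaputt") so the error branch runs:

```python
import pandas as pd

# Hypothetical stand-in for googletrans: raises for one input so the
# error path is exercised without any network access.
def fake_translate(text):
    if text == "kaputt":
        raise RuntimeError("simulated API failure")
    return {"Ich glaube das": "I believe that"}.get(text, text)

df = pd.DataFrame({"Comments": ["Ich glaube das", "kaputt", "schadenfreude"]})

translatedList = []
for _, row in df.iterrows():
    try:
        translatedList.append(fake_translate(row["Comments"]))
    except Exception as e:
        print(e)
        translatedList.append("ERROR")  # placeholder keeps lengths aligned

# One entry per row, so assign lines up with the index.
assert len(translatedList) == len(df)
df = df.assign(translatedv4=translatedList)
print(df)
```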