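I have the code below, which goes row by row and translates a specific column of my dataframe into English, but when I run it, each value in the new column "translatedv4" is the whole row with the translation added at the end, rather than just the translation. I'm new to looping over a dataframe as opposed to a list, so that may be the issue.
Example of a single value (I only want the column to show "I'm thinking..."):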
Comments            Ich glaube das...
Translations        DE
Race / Ethnicity    White
Count2              91
translated          I'm thinking this because I'm nearing retireme...
Current code:
from googletrans import Translator
import pandas as pd
import xlsxwriter
import xlrd
import copy
##################TRANSLATION
translator = Translator()
file = r"xxxx"
#dt2 = translator.detect(text2)
df = pd.read_excel(file, sheet_name = 'Sheet1', converters={'Comments':str}).fillna(0)
df = df[df['Comments'] != 0]
translatedList = []
for index, row in df.iterrows():
    # REINITIALIZE THE API
    translator = Translator()
    newrow = copy.deepcopy(row)
    try:
        # translate the 'text' column
        translated = translator.translate(row['Comments'], dest='en')
        newrow['translated'] = translated.text
    except Exception as e:
        print(str(e))
        continue
    translatedList.append(newrow)
df = df.assign(translatedv4 = translatedList)
Answer 0 (score: 0)
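I'm not entirely sure what you're asking, so I hope this is what you want. I do think you're not approaching it in the best way. Generally, with pandas you'll want to try to vectorize your solution or create a function to pass to df.apply.
Here are three solutions of increasing complexity. The first uses a lambda function, which works but can't handle exceptions. The second creates a normal function, which lets us do that easily. The last solution adds ratelimit and tqdm, which are nice to have when working with APIs and dataframes.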
from googletrans import Translator
import pandas as pd

df = pd.DataFrame({
    'German': ['ich glaube das', 'schadenfreude', 'schnappsidee']
})
translator = Translator()
df['English'] = df['German'].apply(
    lambda sent: translator.translate(sent, dest='en', src='de').text
)
print(df)
           German         English
0  ich glaube das  I believe that
1   schadenfreude   malicious joy
2    schnappsidee   snapping idea
from googletrans import Translator
import numpy as np
import pandas as pd

def get_trans(sent):
    # Return the English translation, or NaN if the request fails
    try:
        return translator.translate(sent, dest='en', src='de').text
    except Exception as e:
        print(e)
        return np.nan

df = pd.DataFrame({
    'German': ['ich glaube das', 'schadenfreude', 'schnappsidee', np.nan]
})
translator = Translator()
df['English'] = df['German'].apply(get_trans)
print(df)
'float' object is not iterable
           German         English
0  ich glaube das  I believe that
1   schadenfreude   malicious joy
2    schnappsidee   snapping idea
3             NaN             NaN
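The "'float' object is not iterable" message above is printed because the NaN in the last row is passed to the translator. If you would rather not send missing values to the API at all, one option is Series.map with na_action='ignore'. This is only a sketch: fake_translate is a hypothetical stand-in for translator.translate(...).text so the example runs without a network call.

```python
import numpy as np
import pandas as pd

calls = []  # records what actually reaches the translator stand-in

def fake_translate(sent):
    # Hypothetical stand-in for translator.translate(sent, dest='en', src='de').text
    calls.append(sent)
    return sent.upper()

df = pd.DataFrame({'German': ['ich glaube das', 'schadenfreude', np.nan]})

# na_action='ignore' keeps NaN as NaN and never passes it to the function
df['English'] = df['German'].map(fake_translate, na_action='ignore')
print(df)
```

The try/except in get_trans is still worth keeping for real requests, since the API can fail for reasons other than missing values.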
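When working with APIs, I can really recommend the excellent ratelimit library. It keeps you from making too many requests, and handles the resulting exceptions by sleeping and retrying. I also added tqdm for a progress bar, which is nice if you have a lot of data.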
from googletrans import Translator
import numpy as np
import pandas as pd
from ratelimit import limits, sleep_and_retry
from tqdm.autonotebook import tqdm
# from tqdm import tqdm <- use this instead if you're not using jupyter

FIFTEEN_MINUTES = 900
tqdm.pandas()  # adds progress_apply to pandas objects

@sleep_and_retry
@limits(calls=15, period=FIFTEEN_MINUTES)
def get_trans(sent):
    # Return the English translation, or NaN if the request fails
    try:
        return translator.translate(sent, dest='en', src='de').text
    except Exception as e:
        print(e)
        return np.nan

df = pd.DataFrame({
    'German': ['ich glaube das', 'schadenfreude', 'schnappsidee', np.nan]
})
translator = Translator()
df['English'] = df['German'].progress_apply(get_trans)
print(df)
           German         English
0  ich glaube das  I believe that
1   schadenfreude   malicious joy
2    schnappsidee   snapping idea
3             NaN             NaN
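With a limit as tight as 15 calls per 15 minutes, repeated comments are expensive. One possible addition, not part of the solution above, is to cache results with functools.lru_cache so each distinct sentence is only requested once. Again, fake_translate is a hypothetical stand-in for the real API call, and get_trans_cached is a name made up for this sketch.

```python
from functools import lru_cache

api_calls = []  # records each simulated request

def fake_translate(sent):
    # Hypothetical stand-in for translator.translate(sent, dest='en', src='de').text
    api_calls.append(sent)
    return sent.upper()

@lru_cache(maxsize=None)
def get_trans_cached(sent):
    # Only the first occurrence of a sentence reaches the API stand-in;
    # repeats are answered from the in-memory cache
    return fake_translate(sent)

comments = ['ich glaube das', 'schadenfreude', 'ich glaube das', 'schadenfreude']
translated = [get_trans_cached(c) for c in comments]
print(translated)
print(len(api_calls))
```

One caveat: if the wrapped function swallows exceptions and returns NaN, that failure is cached too, so a retry would need the cache cleared with get_trans_cached.cache_clear().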
Answer 1 (score: 0)
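I think your code has a small bug, here:
translatedList.append(newrow)
You are appending the whole row to the list, whereas you want to append only the new value, i.e.
translatedList.append(translated.text)
But be careful: if there is any exception, the continue skips the append, so translatedList will end up shorter than your DataFrame's index and the assignment will no longer line up. You should probably do something like this: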
    try:
        # translate the 'text' column
        translated = translator.translate(row['Comments'], dest='en')
        translatedList.append(translated.text)
    except Exception as e:
        print(str(e))
        # append a placeholder so the list stays as long as the DataFrame
        translatedList.append('ERROR')
        continue
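Put together, the loop could look roughly like the sketch below. To keep it runnable without network access, fake_translate is a hypothetical stand-in for translator.translate(text, dest='en').text, and the small DataFrame (including the deliberately bad value 42) is made up for illustration.

```python
import pandas as pd

def fake_translate(text):
    # Hypothetical stand-in for translator.translate(text, dest='en').text.
    # It raises on non-string input to simulate a failed request.
    if not isinstance(text, str):
        raise TypeError('expected a string')
    return text.upper()

df = pd.DataFrame({'Comments': ['ich glaube das', 42, 'schadenfreude']})

translatedList = []
for index, row in df.iterrows():
    try:
        # append only the translated text, not the whole row
        translatedList.append(fake_translate(row['Comments']))
    except Exception as e:
        print(str(e))
        # append a placeholder so the list stays aligned with the rows
        translatedList.append('ERROR')

df = df.assign(translatedv4=translatedList)
print(df)
```

Because exactly one value is appended per row, the list always has the same length as the DataFrame and assign works even when some translations fail.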