How do I reverse a Hebrew string in Python?

Time: 2012-09-19 23:19:39

Tags: python

I'm trying to reverse a Hebrew string in Python:

line = 'אבגד'
reversed = line[::-1]
print reversed

But I get:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 0: ordinal not in range(128)

Care to explain what I'm doing wrong?

EDIT: The answers are great, thanks! I also tried to save the string to a file using:

w1 = open('~/fileName', 'w')
w1.write(reverseLine)

But now I get:

return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1-3: character maps to <undefined>

Any ideas on how to fix this?

EDIT: Found a solution, see my answer below. In short, I used

codecs.open('~/fileName', 'w', encoding='utf-8')

instead of

open('~/fileName', 'w')
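The traceback above points at a charmap codec, which suggests the file was being encoded with a single-byte code page that has no Hebrew letters. A minimal sketch of that failure follows; cp1252 is an assumption here (the question does not say which code page was in use), and can_encode is a hypothetical helper name.

```python
hebrew = u'\u05d0\u05d1\u05d2\u05d3'  # alef, bet, gimel, dalet


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp1252 is a Western European code page (assumed for illustration).
fits_cp1252 = can_encode(hebrew, 'cp1252')
# UTF-8 can represent every Unicode character.
fits_utf8 = can_encode(hebrew, 'utf-8')
print(fits_cp1252, fits_utf8)
```

Choosing utf-8 explicitly when opening the file avoids depending on whatever default encoding the platform picks.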

6 answers:

Answer 0 (score: 7)

Adding u in front of the Hebrew string works for me:

In [1]: line = u'אבגד'

In [2]: reversed = line[::-1]

In [3]: print reversed
דגבא
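Why the u prefix matters can be sketched as follows, assuming the source file was saved as UTF-8: each Hebrew letter then occupies two bytes, so reversing the byte string splits every letter in half, while reversing the unicode string reverses whole characters. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
text = u'\u05d0\u05d1\u05d2\u05d3'  # alef, bet, gimel, dalet as a unicode string
raw = text.encode('utf-8')          # the same letters as UTF-8 bytes

print(len(text))  # number of characters
print(len(raw))   # number of bytes (two per Hebrew letter in UTF-8)

# Reversing characters gives dalet, gimel, bet, alef.
reversed_text = text[::-1]

# Reversing bytes puts continuation bytes first, which is not valid UTF-8.
try:
    raw[::-1].decode('utf-8')
    reversed_bytes_valid = True
except UnicodeDecodeError:
    reversed_bytes_valid = False
```

The reversed byte string begins with 0x93, the second byte of dalet, which is the byte named in the error message from the question.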

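Regarding your second question, you can use:

import codecs

# Open for writing; codecs encodes the unicode string as UTF-8.
w1 = codecs.open("~/fileName", "w", "utf-8")
w1.write(reversed)

to write the unicode string to the file fileName.

Alternatively, without codecs, you need to encode the reversed string as utf-8 when writing it to the file: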

with open('~/fileName', 'w') as f:
    f.write(reversed.encode('utf-8'))
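One detail worth checking with either approach: open() and codecs.open() do not expand "~" themselves, so a path such as '~/fileName' is looked up relative to a directory literally named "~". A minimal sketch that expands the path and writes UTF-8 could look like this; write_utf8 and read_utf8 are hypothetical helper names, and the demo writes into a temporary directory rather than the home directory.

```python
import io
import os
import tempfile


def write_utf8(path, text):
    """Write a unicode string to path as UTF-8 and return the expanded path."""
    # Expand a leading "~" to the user's home directory explicitly.
    full_path = os.path.expanduser(path)
    with io.open(full_path, 'w', encoding='utf-8') as f:
        f.write(text)
    return full_path


def read_utf8(path):
    """Read the whole file at path back as a unicode string."""
    with io.open(os.path.expanduser(path), 'r', encoding='utf-8') as f:
        return f.read()


# Demo in a temporary directory so no real file is touched.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'fileName')
reversed_line = u'\u05d0\u05d1\u05d2\u05d3'[::-1]
write_utf8(demo_path, reversed_line)
round_trip = read_utf8(demo_path)
```

io.open is used here because it accepts an encoding argument on both Python 2 and Python 3, and the with block closes the file even if the write fails.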

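Answer 1 (score: 4)

You need more than a plain string reversal to flip Hebrew text, because numbers and other left-to-right runs keep their own order.

The algorithm is much more complicated;

All answers on this page (as of this date) will most likely garble your numbers and non-Hebrew text.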

For most cases, you should use an implementation of the Unicode bidirectional algorithm rather than a plain slice reversal.
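To illustrate why a plain reversal garbles numbers, here is a minimal sketch that reverses the string and then flips each left-to-right run (Latin letters and digits) back into reading order. It is a deliberate simplification and not the full Unicode bidirectional algorithm: it ignores brackets, nesting, explicit direction marks, and several rules about numbers next to Latin text. visual_rtl and _kind are hypothetical helper names.

```python
import unicodedata


def _kind(ch):
    """Classify a character as 'rtl', 'ltr', or 'neutral' by its bidi class."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in ('R', 'AL'):
        return 'rtl'
    if bidi_class in ('L', 'EN', 'AN'):
        return 'ltr'
    return 'neutral'


def visual_rtl(text):
    """Reverse text, keeping left-to-right runs such as numbers readable."""
    chars = list(text[::-1])
    kinds = [_kind(ch) for ch in chars]
    i = 0
    while i < len(chars):
        if kinds[i] != 'ltr':
            i += 1
            continue
        # Extend the run up to the last ltr character before the next rtl one,
        # so that separators inside "12.5" or "abc def" stay inside the run.
        last = i
        j = i
        while j < len(chars) and kinds[j] != 'rtl':
            if kinds[j] == 'ltr':
                last = j
            j += 1
        chars[i:last + 1] = chars[i:last + 1][::-1]
        i = last + 1
    return u''.join(chars)


sample = u'\u05d0\u05d1 12.5 \u05d2\u05d3'  # two Hebrew words around a number
plain = sample[::-1]        # the number comes out as 5.21
fixed = visual_rtl(sample)  # the number stays 12.5
```

For real text, a library that implements the complete algorithm is the safer choice, since this sketch only covers the simple case of isolated numbers or Latin phrases between Hebrew words.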

Answer 2 (score: 2)

You need to use a unicode string constant:

line = u'אבגד'
reversed = line[::-1]
print reversed

Answer 3 (score: 1)

In Python 2, plain string literals are byte strings and are decoded as ascii by default. Use u'' for unicode:

line = u'אבגד'
reversed = line[::-1]
print reversed

Answer 4 (score: 1)

Make sure you are using unicode objects:

line = unicode('אבגד', 'utf-8')
reversed = line[::-1]
print reversed

Answer 5 (score: 0)

Found how to write to a file:

w1 = codecs.open('~/fileName', 'w', encoding='utf-8')
w1.write(reverseLine)