Question

[Updated with some data and trial code]

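I have some numbers in Bengali Unicode format and want to build some charts from the data. But R reads the data as "character" instead of "numeric". How can I make R read the data as "numeric"? Thanks.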

My data table looks like this:

"সংখ্যা"    "বছর"
৩৪,৭০৮    ২০১১
৩২,৮১০    ২০১২
৩২,৮৯৪    ২০১৪

I used the as.numeric function to convert both columns to numeric:

mb$`"সংখ্যা"` <- as.numeric(mb$`"সংখ্যা"`)
mb$`"বছর"` <- as.numeric(mb$`"বছর"`)

The class was converted, but with a warning:

Warning message:
NAs introduced by coercion

Then I tried to draw a bar chart:

ggplot(mb, aes("বছর", "সংখ্যা"))+
geom_bar(stat = "identity", width=0.3)

The result was as follows: [image]

Following Rohit's code, I then tried to draw the bar chart again:

ggplot(mb, aes(x="বছর", y="সংখ্যা"))+
geom_bar(stat = "identity")

It did not work. Image link: [image]

Then I plotted the same kind of data in English, which worked fine:

ggplot(mbe, aes(x=year, y=number))+
geom_bar(stat = "identity")

The resulting plot: [image]

Any observations or suggestions?

Answer 1

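You first need to convert the digits to their English/Latin equivalents. You can do this with the stringi library. You can then use as.numeric() to convert the result to numeric: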

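n <- '১০৫'
library(purrr)    # purrr re-exports the %>% pipe
library(stringi)  # provides stri_trans_general()
# Transliterate the Bengali digits to Latin digits, then parse as numeric
n %>% stri_trans_general('Bengali-Latin') %>% as.numeric()
# [1] 105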
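
If you would rather not depend on stringi, the same digit mapping can be sketched in base R with chartr(). This is only an alternative under a few assumptions: the session uses UTF-8, and only the ten Bengali digits (U+09E6 to U+09EF) need translating. The variable name bengali_digits is made up for the illustration.

```r
# Base R sketch (an alternative, not the stringi approach shown above).
# Build the ten Bengali digits, zero through nine, from their code points.
bengali_digits <- intToUtf8(0x09E6:0x09EF)

n <- "১০৫"
# chartr() swaps each Bengali digit for the ASCII digit in the same position
as.numeric(chartr(bengali_digits, "0123456789", n))
# [1] 105
```

chartr() only maps characters one to one, so separators such as commas still have to be removed separately before calling as.numeric().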

Edit: for the data you provided, you can do the following:

    mb
    # সংখ্যা  বছর
    # 1 ৩৪,৭০৮ ২০১১
    # 2 ৩২,৮১০ ২০১২
    # 3 ৩২,৮৯৪ ২০১৪
    library(dplyr)
    library(stringi)
    mb <- mb %>%
      mutate_all(function(x){ # mutate_all will apply the function to all columns of mb
        x %>%
          stri_trans_general('Bengali-Latin') %>% # convert to latin charset
          gsub(pattern = ',',replacement = '')%>% # Commas need to be removed
          as.numeric()
      })
    # সংখ্যা  বছর
    # 1 34708 2011
    # 2 32810 2012
    # 3 32894 2014
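
For comparison, here is a rough base R sketch of the same column-wise conversion without dplyr or stringi. It assumes a UTF-8 session and that every column holds character strings. The helper name to_ascii_number and the English column names number and year are hypothetical stand-ins, not part of the original data.

```r
# Hypothetical helper: Bengali digits -> ASCII digits, drop commas, parse as numeric
to_ascii_number <- function(x) {
  bengali_digits <- intToUtf8(0x09E6:0x09EF)      # the ten Bengali digits
  x <- chartr(bengali_digits, "0123456789", x)    # one-to-one digit mapping
  as.numeric(gsub(",", "", x, fixed = TRUE))      # remove thousands separators
}

# Stand-in for the asker's table (column names are assumptions)
mb <- data.frame(
  number = c("৩৪,৭০৮", "৩২,৮১০", "৩২,৮৯৪"),
  year   = c("২০১১", "২০১২", "২০১৪"),
  stringsAsFactors = FALSE
)

# Apply the helper to every column; mb[] keeps the data frame structure
mb[] <- lapply(mb, to_ascii_number)
str(mb)
```

Assigning into mb[] rather than mb keeps the result a data frame instead of a plain list.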

Edit: for the plot, your column names contain Bengali characters and quotation marks, so they need to be wrapped in backticks:

ggplot(mb,aes(`"বছর"`,`"সংখ্যা"`))+
  geom_col()
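
Another option, sketched here as an assumption about what you might prefer, is to strip the literal quote characters from the column names once, so that later code does not have to carry them around. The data frame below is a made-up stand-in for the converted data.

```r
# Stand-in data whose column names contain literal double quotes (an assumption)
mb <- data.frame(a = c(34708, 32810, 32894), b = c(2011, 2012, 2014))
names(mb) <- c('"সংখ্যা"', '"বছর"')

# Remove the quote characters from the names themselves
names(mb) <- gsub('"', "", names(mb), fixed = TRUE)
names(mb)

# The columns can then be referenced with plain backticks, for example:
# ggplot(mb, aes(`বছর`, `সংখ্যা`)) + geom_col()
```

Note that passing "বছর" in double quotes to aes() maps a constant string rather than the column, so the column has to be named bare or in backticks.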

Bengali data input in R
