I suspect my question is as much about best practice as anything else, since it concerns tidying messy data, so here goes.
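Below is an excerpt of the data frame lang.df, a school-wide dataset of students. The column Language.Home holds the parents' answers to the question:
"What language do you speak at home?"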
> lang.df
Nationality Language.Home
1 HK Mandarin
2 German Mandarin/English/German
3 Saudi Arabic
4 Norwegian Norwegian
5 UK English
6 HK Mandarin/ Min Nan dialect
7 Australian Mandarin
8 HK Mandarin
9 Brazilian Portuguese/English
10 Indian Hindi/English
It is obvious to me that this is a poor way of collecting this information and a poor way of storing it, but my job is to work with the data I have.
Outcome
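I want to explore the effect that particular home languages may have on attainment. What I need is to be able to group by a single language spoken at home (e.g. students who speak English at home).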
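To do this, it seems I have to split the Language.Home column into three columns ("language.home1", "language.home2", "language.home3") with tidyr's separate(), and then create a new column for each unique value (i.e. each language) found in the newly created columns.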
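As a rough sketch of the grouping goal itself, assuming base R only and using the row number as a hypothetical student id (the excerpt has no id column), one can reshape to one row per student per language and then select on a single language:

```r
# Excerpt of the question's data, as plain character columns
lang.df <- data.frame(
  Nationality = c("HK", "German", "Saudi", "Norwegian", "UK",
                  "HK", "Australian", "HK", "Brazilian", "Indian"),
  Language.Home = c("Mandarin", "Mandarin/English/German", "Arabic",
                    "Norwegian", "English", "Mandarin/ Min Nan dialect",
                    "Mandarin", "Mandarin", "Portuguese/English",
                    "Hindi/English"),
  stringsAsFactors = FALSE
)

# Hypothetical student id: the row number stands in for a real id
lang.df$student <- seq_len(nrow(lang.df))

# Split each answer on "/" and any spaces around it
pieces <- strsplit(lang.df$Language.Home, "\\s*/\\s*")

# Long format: one row per student per home language
long <- data.frame(
  student  = rep(lang.df$student, lengths(pieces)),
  language = unlist(pieces),
  stringsAsFactors = FALSE
)

# Students who speak English at home, alone or alongside other languages
english.speakers <- unique(long$student[long$language == "English"])
print(english.speakers)
```

Grouping on the long table avoids deciding in advance how many language columns to create.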
Process
Here is my attempt at doing the above:
library(dplyr)
library(tidyr)
# separate Language.Home into three new columns; split on "/" plus any
# surrounding spaces, and pad answers with fewer languages with NA
lang.df <- lang.df %>% separate(Language.Home,
                                c("language.home1", "language.home2", "language.home3"),
                                sep = "\\s*/\\s*",
                                fill = "right",
                                remove = FALSE)
# find distinct languages & remove NAs
langs <- unique(c(lang.df$language.home1,
                  lang.df$language.home2,
                  lang.df$language.home3))
langs <- langs[!is.na(langs)]
# create a logical column for each unique language
for (i in langs) {
  lang.df[[i]] <- grepl(i, lang.df$Language.Home, fixed = TRUE)
}
Question
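I have read the tidyr documentation and searched here, but found nothing relevant. Thanks in advance for your help. I have only been using R for about a year, and this is my first SO post, so please give me as much feedback as you can!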
Data
lang.df <- structure(list(Nationality = structure(c(4L, 3L, 7L, 6L, 8L,
4L, 1L, 4L, 2L, 5L), .Label = c("Australian", "Brazilian", "German",
"HK", "Indian", "Norwegian", "Saudi", "UK"), class = "factor"),
`Language.Home` = structure(c(4L, 6L, 1L, 7L, 2L, 5L, 4L,
4L, 8L, 3L), .Label = c("Arabic", "English", "Hindi/English",
"Mandarin", "Mandarin/ Min Nan dialect", "Mandarin/English/German",
"Norwegian", "Portuguese/English"), class = "factor")), row.names = c(NA,
10L), .Names = c("Nationality", "Language.Home"), class = "data.frame")
Answer 0 (score: 5)
We can use cSplit from splitstackshape to split 'Language.Home' on the delimiter / and convert it to 'long' format.
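The code for this step did not survive on the page. As a stand-in, here is a base R sketch of the same split-to-long reshape; it uses strsplit() rather than cSplit(), and the delimiter pattern "\\s*/\\s*" (an assumption) also trims the stray space in "Mandarin/ Min Nan dialect":

```r
lang.df <- data.frame(
  Nationality = c("HK", "German", "Saudi", "Norwegian", "UK",
                  "HK", "Australian", "HK", "Brazilian", "Indian"),
  Language.Home = c("Mandarin", "Mandarin/English/German", "Arabic",
                    "Norwegian", "English", "Mandarin/ Min Nan dialect",
                    "Mandarin", "Mandarin", "Portuguese/English",
                    "Hindi/English"),
  stringsAsFactors = FALSE
)

# Split on "/" (plus surrounding spaces) into a list of character vectors
pieces <- strsplit(lang.df$Language.Home, "\\s*/\\s*")

# Repeat each Nationality once per language, giving the 'long' format
long <- data.frame(
  Nationality   = rep(lang.df$Nationality, lengths(pieces)),
  Language.Home = unlist(pieces),
  stringsAsFactors = FALSE
)
print(long)
```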
Then, convert from 'long' to 'wide' format.
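One base R way to sketch the 'long' to 'wide' step (not necessarily the function the answer used) is to cross-tabulate Nationality against language with table() and turn the counts into TRUE/FALSE. Because it is keyed on Nationality, the three HK rows collapse into a single row:

```r
lang.df <- data.frame(
  Nationality = c("HK", "German", "Saudi", "Norwegian", "UK",
                  "HK", "Australian", "HK", "Brazilian", "Indian"),
  Language.Home = c("Mandarin", "Mandarin/English/German", "Arabic",
                    "Norwegian", "English", "Mandarin/ Min Nan dialect",
                    "Mandarin", "Mandarin", "Portuguese/English",
                    "Hindi/English"),
  stringsAsFactors = FALSE
)
pieces <- strsplit(lang.df$Language.Home, "\\s*/\\s*")
long <- data.frame(
  Nationality   = rep(lang.df$Nationality, lengths(pieces)),
  Language.Home = unlist(pieces),
  stringsAsFactors = FALSE
)

# Count how often each Nationality reports each language
counts <- table(long$Nationality, long$Language.Home)

# Drop the table class and turn counts into logical indicators
wide <- unclass(counts) > 0
print(wide)
```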
NB: There are duplicate 'Nationality' rows, so the above combines their common elements. It may be better to keep them combined.
If we need the logical columns on a per-row basis (regardless of shared 'Nationality' values):
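For the per-row version, a minimal base R sketch (an assumption, not the answer's missing code) tabulates a row index against language instead of Nationality, so duplicate nationalities stay separate:

```r
lang.df <- data.frame(
  Nationality = c("HK", "German", "Saudi", "Norwegian", "UK",
                  "HK", "Australian", "HK", "Brazilian", "Indian"),
  Language.Home = c("Mandarin", "Mandarin/English/German", "Arabic",
                    "Norwegian", "English", "Mandarin/ Min Nan dialect",
                    "Mandarin", "Mandarin", "Portuguese/English",
                    "Hindi/English"),
  stringsAsFactors = FALSE
)
pieces <- strsplit(lang.df$Language.Home, "\\s*/\\s*")

# Row index repeated once per language given in that row
row.id <- rep(seq_len(nrow(lang.df)), lengths(pieces))

# Rows x languages logical matrix; rows come out in order 1..n
ind <- unclass(table(row.id, unlist(pieces))) > 0

# Attach the indicators; cbind() keeps names such as "Min Nan dialect"
result <- cbind(lang.df, ind)
print(result)
```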
Answer 1 (score: 3)
A simple way to get the long form is tidyr::unnest():
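library(dplyr)
library(tidyr)
library(stringr)
lang.df %>%
  # split on "/" and any surrounding spaces into a list-column
  mutate(Language.Home = str_split(Language.Home, "\\s*/\\s*")) %>%
  # one row per language; tidyr >= 1.0 expects the column(s) to unnest
  unnest(cols = Language.Home)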
#> Nationality Language.Home
#> 1 HK Mandarin
#> 2 German Mandarin
#> 3 German English
#> 4 German German
#> 5 Saudi Arabic
#> 6 Norwegian Norwegian
#> 7 UK English
#> 8 HK Mandarin
#> 9 HK Min Nan dialect
#> 10 Australian Mandarin
#> 11 HK Mandarin
#> 12 Brazilian Portuguese
#> 13 Brazilian English
#> 14 Indian Hindi
#> 15 Indian English
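Once the data are in long form, grouping by a single language is straightforward. A base R sketch (it rebuilds the long form with strsplit() so it runs without tidyr) that counts rows, i.e. students, per home language and then subsets one language:

```r
lang.df <- data.frame(
  Nationality = c("HK", "German", "Saudi", "Norwegian", "UK",
                  "HK", "Australian", "HK", "Brazilian", "Indian"),
  Language.Home = c("Mandarin", "Mandarin/English/German", "Arabic",
                    "Norwegian", "English", "Mandarin/ Min Nan dialect",
                    "Mandarin", "Mandarin", "Portuguese/English",
                    "Hindi/English"),
  stringsAsFactors = FALSE
)
pieces <- strsplit(lang.df$Language.Home, "\\s*/\\s*")
long <- data.frame(
  Nationality   = rep(lang.df$Nationality, lengths(pieces)),
  Language.Home = unlist(pieces),
  stringsAsFactors = FALSE
)

# Number of students reporting each home language
counts <- table(long$Language.Home)
print(sort(counts, decreasing = TRUE))

# Subset to one language, e.g. students who speak English at home
english <- long[long$Language.Home == "English", ]
print(english)
```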
Answer 2 (score: 2)
Here is a base R approach that takes only a few lines in total.
Part one: the data
lang.df <- structure(list(Nationality = structure(c(4L, 3L, 7L, 6L, 8L, 4L, 1L, 4L, 2L, 5L), .Label = c("Australian", "Brazilian", "German", "HK", "Indian", "Norwegian", "Saudi", "UK"), class = "factor"), `Language.Home` = structure(c(4L, 6L, 1L, 7L, 2L, 5L, 4L, 4L, 8L, 3L), .Label = c("Arabic", "English", "Hindi/English", "Mandarin", "Mandarin/ Min Nan dialect", "Mandarin/English/German", "Norwegian", "Portuguese/English"), class = "factor")), row.names = c(NA, 10L), .Names = c("Nationality", "Language.Home"), class = "data.frame")
Part two: a new data frame, with each language split into its own column and numbered in order
dd <- read.table(text = gsub('/\\s*', ';', lang.df$Language.Home),
sep = ';', na.strings = '', fill = TRUE, as.is = TRUE,
col.names = paste0('lang.home', 1:3))
# lang.home1 lang.home2 lang.home3
# 1 Mandarin <NA> <NA>
# 2 Mandarin English German
# 3 Arabic <NA> <NA>
# 4 Norwegian <NA> <NA>
# 5 English <NA> <NA>
# 6 Mandarin Min Nan dialect <NA>
# 7 Mandarin <NA> <NA>
# 8 Mandarin <NA> <NA>
# 9 Portuguese English <NA>
# 10 Hindi English <NA>
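If the read.table() trick feels indirect, a base R alternative (a sketch, not part of the original answer) splits with strsplit() and pads each row with NA by indexing past the end of the vector; the number of columns comes from the longest answer rather than a hard-coded 3:

```r
lang.df <- data.frame(
  Nationality = c("HK", "German", "Saudi", "Norwegian", "UK",
                  "HK", "Australian", "HK", "Brazilian", "Indian"),
  Language.Home = c("Mandarin", "Mandarin/English/German", "Arabic",
                    "Norwegian", "English", "Mandarin/ Min Nan dialect",
                    "Mandarin", "Mandarin", "Portuguese/English",
                    "Hindi/English"),
  stringsAsFactors = FALSE
)

pieces <- strsplit(lang.df$Language.Home, "\\s*/\\s*")
n <- max(lengths(pieces))  # most languages given in any single answer

# x[seq_len(n)] returns NA for positions beyond length(x), padding each row
dd <- as.data.frame(do.call(rbind, lapply(pieces, `[`, seq_len(n))),
                    stringsAsFactors = FALSE)
names(dd) <- paste0("lang.home", seq_len(n))
print(dd)
```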
Part three: a logical indicator for each unique language
lang <- na.omit(sort(unique(unlist(dd))))
idx <- `colnames<-`(t(apply(dd, 1, function(x) lang %in% x)), lang)
# Arabic English German Hindi Mandarin Min Nan dialect Norwegian Portuguese
# [1,] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
# [2,] FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE
# [3,] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# [4,] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
# [5,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
# [6,] FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
# [7,] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
# [8,] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
# [9,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE
# [10,] FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
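An apply()-free variant of the indicator step (a sketch; it rebuilds dd with strsplit() so it runs on its own) compares the whole data frame against each language with vapply(), and checks that the result agrees with the apply() version in the answer:

```r
lang.df <- data.frame(
  Nationality = c("HK", "German", "Saudi", "Norwegian", "UK",
                  "HK", "Australian", "HK", "Brazilian", "Indian"),
  Language.Home = c("Mandarin", "Mandarin/English/German", "Arabic",
                    "Norwegian", "English", "Mandarin/ Min Nan dialect",
                    "Mandarin", "Mandarin", "Portuguese/English",
                    "Hindi/English"),
  stringsAsFactors = FALSE
)
pieces <- strsplit(lang.df$Language.Home, "\\s*/\\s*")
dd <- as.data.frame(do.call(rbind, lapply(pieces, `[`, 1:3)),
                    stringsAsFactors = FALSE)

# sort() drops NA by default, so na.omit() is not needed here
lang <- sort(unique(unlist(dd)))

# For each language, flag rows where any column equals it (NA cells ignored)
idx <- vapply(lang, function(l) rowSums(dd == l, na.rm = TRUE) > 0,
              logical(nrow(dd)))

# Same values as the apply() version used in the answer
idx.apply <- `colnames<-`(t(apply(dd, 1, function(x) lang %in% x)), lang)
stopifnot(all(idx == idx.apply))
print(idx)
```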
Combining the three parts:
cbind(lang.df, dd, idx)
# Nationality Language.Home lang.home1 lang.home2 lang.home3 Arabic English German Hindi Mandarin Min Nan dialect Norwegian Portuguese
# 1 HK Mandarin Mandarin <NA> <NA> FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
# 2 German Mandarin/English/German Mandarin English German FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE
# 3 Saudi Arabic Arabic <NA> <NA> TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# 4 Norwegian Norwegian Norwegian <NA> <NA> FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
# 5 UK English English <NA> <NA> FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
# 6 HK Mandarin/ Min Nan dialect Mandarin Min Nan dialect <NA> FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
# 7 Australian Mandarin Mandarin <NA> <NA> FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
# 8 HK Mandarin Mandarin <NA> <NA> FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
# 9 Brazilian Portuguese/English Portuguese English <NA> FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE
# 10 Indian Hindi/English Hindi English <NA> FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE