Question

I need some help with data wrangling. I downloaded a conversation with someone from Facebook Messenger, but the output looks like this:

apply plugin: 'com.android.application'

android {
    compileSdkVersion 26
    defaultConfig {
        applicationId "com.example.asus.apptest"
        minSdkVersion 19
        targetSdkVersion 26
        versionCode 1
        versionName "1.0"
        testInstrumentationRunner "android.support.test.runner.AndroidJUnitRunner"
    }
    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.pro'
        }
    }
}

dependencies {
    implementation fileTree(dir: 'libs', include: ['*.jar'])

    implementation 'com.android.support:appcompat-v7:26.1.0'
    implementation 'com.android.support.constraint:constraint-layout:1.1.2'

    implementation 'com.android.support:support-v4:26.1.0'



    implementation 'com.google.firebase:firebase-core:16.0.1'
    implementation 'com.google.firebase:firebase-database:16.0.1'
    implementation 'com.google.firebase:firebase-storage:16.0.1'
    implementation 'com.google.firebase:firebase-auth:16.0.1'
    implementation 'com.firebaseui:firebase-ui-database:4.0.1'
    implementation 'com.firebaseui:firebase-ui-auth:4.0.1'


    testImplementation 'junit:junit:4.12'
    androidTestImplementation 'com.android.support.test:runner:1.0.2'
    androidTestImplementation 'com.android.support.test.espresso:espresso-core:3.0.2'

    implementation 'de.hdodenhof:circleimageview:2.2.0'
    implementation 'com.theartofdev.edmodo:android-image-cropper:2.7.+'
    implementation 'com.squareup.picasso:picasso:2.71828'


}
apply plugin: 'com.google.gms.google-services'

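Everything sits in a single column, but I am trying to build a data frame with the speaker in one column, the message in another, and the date in a third. The problem is that a message sometimes spans two lines, so I cannot simply split the whole column into three. What would be the best solution? Any help is appreciated :)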

Answer 1

Since each entry in your input starts with "Person A" (or "Person B") and ends with a date in the format YYYY-MM-DD HH:MM, I would use a regular expression:

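library(stringr)

# chat_messenger is assumed to be a single string holding the whole export.
# Timestamp that closes every message (YYYY-MM-DD HH:MM).
date_match <- "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}"

# Messages from Person A; column 2 of the match matrix is the capture group.
col_a <- str_match_all(chat_messenger,
                       paste0("(?<=\n|^)Person A\\s*\n([\\s\\S]*?)\n", date_match)
                       )[[1]][, 2]
# Messages from Person B, extracted the same way.
col_b <- str_match_all(chat_messenger,
                       paste0("(?<=\n)Person B\\s*\n([\\s\\S]*?)\n", date_match)
                       )[[1]][, 2]
col_a
col_b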

This gives the following result:

> col_a
[1] "Coolcool  "                                      "You called Person B   \nDuration: 30 seconds   "
[3] "Hey!   "                                        
> col_b
[1] "See you later  \n:D  " ".  \nWhat's up?   "
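
The question asks for speaker, message, and date side by side, so the same idea can be extended with three capture groups in a single pattern. The following is only a sketch under assumptions: the sample chat, its timestamps, and the name chat_df are made up for illustration, and the layout (speaker line, one or more message lines, timestamp line) is inferred from the description above rather than from the real export.

```r
library(stringr)

# Hypothetical sample standing in for the real export (assumed layout:
# speaker line, one or more message lines, then a timestamp line).
chat_messenger <- paste(
  "Person A", "Coolcool", "2018-07-01 10:15",
  "Person B", "See you later", ":D", "2018-07-01 10:16",
  sep = "\n"
)

# Group 1: speaker, group 2: message (may span lines), group 3: timestamp.
pattern <- regex(
  "^(Person [AB])\\s*\\n([\\s\\S]*?)\\n(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2})",
  multiline = TRUE  # let ^ match at the start of every line
)

m <- str_match_all(chat_messenger, pattern)[[1]]

chat_df <- data.frame(
  speaker = m[, 2],
  message = str_squish(m[, 3]),  # collapse line breaks and extra spaces
  date    = as.POSIXct(m[, 4], format = "%Y-%m-%d %H:%M", tz = "UTC"),
  stringsAsFactors = FALSE
)
chat_df
```

Matching both speakers in one pass keeps the rows in their original order, which two separate per-speaker vectors do not. The time zone is set to UTC here only as a placeholder.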

To make the regular expression easier to follow, I will break down this part: (?<=\n|^)Person A\\s*\n([\\s\\S]*?)\n

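(?<=\n|^): requires the match to be preceded by a newline or the start of the document, in case the words "Person A" appear inside a chat message.
Person A\\s*\n: matches the name, followed by zero or more whitespace characters and a newline.
([\\s\\S]*?): captures everything, including line breaks; the lazy quantifier *? stops at the earliest point where the rest of the pattern can match.
\n: ends the capture at the newline that precedes the date.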
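
To see why the lazy quantifier matters, here is a small illustration on a made-up two-message string (the string toy and its timestamps are hypothetical). With *? each message is captured on its own; with a greedy * the capture runs on to the last timestamp and swallows everything in between.

```r
library(stringr)

# Hypothetical input: two messages from Person A, each closed by a timestamp.
toy <- "Person A\nHi\n2018-07-01 10:15\nPerson A\nBye\n2018-07-01 10:20"
date_match <- "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}"

# Lazy capture: stops at the first newline followed by a timestamp.
lazy <- str_match_all(
  toy, paste0("(?<=\n|^)Person A\\s*\n([\\s\\S]*?)\n", date_match)
)[[1]][, 2]

# Greedy capture: extends to the last newline followed by a timestamp.
greedy <- str_match_all(
  toy, paste0("(?<=\n|^)Person A\\s*\n([\\s\\S]*)\n", date_match)
)[[1]][, 2]

lazy    # two separate messages
greedy  # one element containing both messages and the first timestamp
```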
