Using tidyr to split rows into columns in R

Time: 2016-02-07 23:25:18

Tags: r tidyr

I have a dataset that looks like this:

                                                                             col1
1 ATOM      1  N   ILE A  12      67.611  47.640  52.312  1.00 12.44           N
2 ATOM      2  CA  ILE A  12      66.381  47.660  51.520  1.00 25.25           C

It has a single column named col1. I want to split it into 12 columns, and I am using the following command:

try=separate(subset,col1,c("name","S.No","Atom Name","Residue Name","Symbol","Residue Number","X-cor","Y-cor","Z-cor","Uk1","Uk2","Symbol"), sep= " ")

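But I keep getting the following warning, which I don't understand:

Warning message:
Too many values at 3929 locations: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...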

It gives me the following output:

name S.No Atom Name Residue Name Symbol Residue Number X-cor Y-cor Z-cor Uk1 Uk2 Symbol
1 ATOM                                                       1           N            ILE
2 ATOM                                                       2          CA     ILE      A

Any help solving this is much appreciated. Thanks!

2 Answers

Answer 0 (score: 4)

There should be a more elegant solution, but without that library (tidyr), this is what I have:

data.frame(do.call(rbind, unlist(apply(subset, 1, function(x) strsplit(x, "\\s+")), recursive = FALSE)))

Logic

I assume your dataset is named subset. For each row of the data.frame, you split it on whitespace, which is this part: strsplit(x, "\\s+"). The rest basically stacks the pieces and puts them into a data.frame.
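For reference, here is a minimal, self-contained sketch of the same base-R idea, slightly simplified: it calls strsplit() on the column directly instead of going through apply() and unlist(). The two sample rows come from the question; the column names are illustrative assumptions (the question's list uses "Symbol" twice, so the last one is renamed here).

```r
# Sample data modelled on the two rows shown in the question
subset <- data.frame(
  col1 = c(
    "ATOM      1  N   ILE A  12      67.611  47.640  52.312  1.00 12.44           N",
    "ATOM      2  CA  ILE A  12      66.381  47.660  51.520  1.00 25.25           C"
  ),
  stringsAsFactors = FALSE
)

# Split every string on runs of whitespace; each element of `pieces`
# is a character vector holding the 12 fields of one row
pieces <- strsplit(subset$col1, "\\s+")

# Stack the vectors row-wise into a character matrix, then convert it
result <- as.data.frame(do.call(rbind, pieces), stringsAsFactors = FALSE)

# Illustrative column names (an assumption, not taken from any spec)
names(result) <- c("name", "S.No", "Atom.Name", "Residue.Name", "Chain",
                   "Residue.Number", "X.cor", "Y.cor", "Z.cor",
                   "Uk1", "Uk2", "Element")
print(result)
```

All columns come out as character; numeric ones such as X.cor can be converted afterwards with as.numeric().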

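Update

Just figured it out: in your code, simply replace sep= " " with sep= "\\s+". The pattern "\\s+" matches one or more whitespace characters, whereas your " " matches exactly one space, so every additional space between two fields produces an extra (empty) value.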
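To see why the single-space separator produces "too many values", compare the two patterns on one line from the question. This sketch uses base R's strsplit(), which, like separate(), treats a character separator as a regular expression; with the fix, the separate() call itself would simply pass sep = "\\s+".

```r
line <- "ATOM      1  N   ILE A  12      67.611  47.640  52.312  1.00 12.44           N"

# A single literal space as separator: each extra space between two
# fields yields an empty string, so there are far more than 12 pieces
by_space <- strsplit(line, " ")[[1]]
cat("pieces with \" \":", length(by_space), "\n")

# "\\s+" treats a whole run of whitespace as one separator,
# leaving exactly the 12 fields
by_run <- strsplit(line, "\\s+")[[1]]
cat("pieces with \"\\\\s+\":", length(by_run), "\n")
print(by_run)
```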

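Answer 1 (score: 0)

I ran into the same problem.

Solution: don't pass sep at all if you want to split two strings (or anything) joined by ".".

Reference: check the example provided in the separate() documentation: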
> library(tidyr)
> df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
> df %>% separate(x, c("A", "B"))
     A    B
1 <NA> <NA>
2    a    b
3    a    d
4    b    c
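The documentation example works without sep because separate()'s default separator is the regular expression "[^[:alnum:]]+", i.e. any run of non-alphanumeric characters. Below is a rough base-R illustration of what that default pattern does; it is an analogue using strsplit(), not tidyr's internal code.

```r
x <- c("a.b", "a.d", "b.c")

# The default sep documented for tidyr::separate()
default_sep <- "[^[:alnum:]]+"

# Split on any run of non-alphanumeric characters and stack the results
parts <- strsplit(x, default_sep)
m <- do.call(rbind, parts)
print(m)
```

Note that the same default would also split a coordinate such as 67.611 at its decimal point, which is why the whitespace-separated data in the question needs an explicit pattern like "\\s+" instead.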

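Reason for the warning: sep is a regular expression, and in a regular expression "." matches any character, so splitting on it consumes every character of the string:

> x="Sepal.Width"
> strsplit(x,split=".")
[[1]]
[1] "" "" "" "" "" "" "" "" "" "" ""

> library(stringr)
> str_detect(x,".")
[1] TRUE
> str_replace(x,".","_")
[1] "_epal.Width"
> str_replace_all(x,".","_")
[1] "___________"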
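If you do want an explicit dot separator, it has to be matched literally. A minimal base-R sketch of the two usual ways: escape it as "\\." or turn off regex matching with fixed = TRUE. In separate() the equivalent would be sep = "\\.", and in stringr you can wrap the pattern in fixed(), e.g. str_replace(x, fixed("."), "_").

```r
x <- "Sepal.Width"

# Escape the dot so the regular expression matches a literal "."
escaped <- strsplit(x, "\\.")[[1]]

# Or skip regular-expression interpretation entirely
literal <- strsplit(x, ".", fixed = TRUE)[[1]]

print(escaped)
print(literal)
```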