How can I use the scrapy re() selector?

Time: 2015-09-15 03:54:57

Tags: python regex web-scraping scrapy

Here is my regex:

.*\/(.*)\?ref

Here is my test string:

https://regex101.com/#python

I can get:

go-with-me

I did try it there, but I don't know how to write this with scrapy; it gets nothing.

Here is my code:

for site in sites:
    title = sel.css("a::text").re(r".*\/(.*)\?ref")
    print title
    break

1 Answer:

Answer 0 (score: 0)

It's hard to say without seeing the real HTML input data, but you probably just need to look at the href attribute value rather than the text:

for site in sites:
    title = site.xpath(".//a/@href").re(r".*\/(.*)\?ref")
    print title
    break
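The difference can be checked with Python's built-in `re` module alone: the pattern only matches a URL-shaped string, so running it against the link's visible text finds nothing. This is a minimal sketch — the sample href and text values below are hypothetical stand-ins for what the two selectors would return, not data from the question:

```python
import re

# The pattern from the question: capture the last path segment before "?ref".
pattern = r".*\/(.*)\?ref"

# Hypothetical values standing in for the selector results:
href = "http://example.com/go-with-me?ref=nav"  # what .//a/@href might yield
text = "Go with me"                             # what a::text might yield

print(re.findall(pattern, href))  # ['go-with-me']
print(re.findall(pattern, text))  # [] -- no slash or "?ref" in the text node
```

Scrapy's `.re()` applies the pattern to each extracted string in much the same way, which is why switching the selector from the text node to the `@href` attribute makes it match.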