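I am using pandas.read_csv(path, low_memory=False) to read a large CSV file into memory. I want to extract certain groups of rows, row by row, and insert them into a database. I know that rows 11 to 62 go into one table and rows 65 to 10000 go into another table. Is there a way to get a subset of rows from the DataFrame so that I can loop over each subset separately? I also only need to process a row in the subset if its element 2 is not NaN. Thanks for your help.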
Answer 0 (score: 0)
There are two solutions to your problem. From the pandas read_csv documentation:
skiprows
Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file. If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].
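As a sketch of the callable form, assuming the file has no header row and that "rows 11 to 62" means 0-indexed line numbers 11 through 62 inclusive, the lambda below skips every other line so only that block is parsed. The in-memory csv_text is a hypothetical stand-in for the real file at path.

```python
import io

import pandas as pd

# Hypothetical sample: 100 lines of the form "i,value_i", no header row.
csv_text = "\n".join(f"{i},value_{i}" for i in range(100))

# skiprows as a callable: it receives each 0-indexed line number and
# returns True for lines to skip, so only lines 11..62 are parsed.
first_block = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    skiprows=lambda idx: not (11 <= idx <= 62),
)

print(len(first_block))        # 52
print(first_block.iloc[0, 0])  # 11
```

Because the unwanted lines are skipped while parsing, only the requested block is ever held in memory.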
skipfooter
Number of lines at bottom of file to skip (Unsupported with engine='c').
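A minimal sketch combining skiprows and skipfooter to keep only a middle block, assuming a hypothetical file with 10 data lines followed by 2 trailer lines. Since the C engine does not support skipfooter, engine="python" is passed explicitly; the Python engine is slower on large files.

```python
import io

import pandas as pd

# Hypothetical sample: data lines 0..9, then two trailer lines.
csv_text = "\n".join(
    [f"{i},{i * 10}" for i in range(10)] + ["TOTAL,450", "END,0"]
)

# Skip the first 3 lines and drop the last 2, leaving lines 3..9.
body = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    skiprows=3,
    skipfooter=2,
    engine="python",
)

print(len(body))         # 7
print(body.iloc[0, 0])   # 3
```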
The most intuitive solution is nrows:
Number of rows of file to read. Useful for reading pieces of large files.
But of course you can also combine skiprows and nrows:
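import pandas as pd
df1 = pd.read_csv(path, low_memory=False, skiprows=65, nrows=10000-65, header=None)  # header=None: the header line is among the skipped lines, so don't parse the first line read as column names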
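If the whole file is already loaded as in the question, the subsets can also be taken from the DataFrame with iloc and filtered with notna. This is a sketch under several assumptions: the row ranges are 0-indexed DataFrame positions with inclusive ends, "element 2" means the column at position 2 (the third column; use iloc[:, 1] if the second column is meant), and the database is a hypothetical in-memory SQLite database with made-up table names table_a and table_b.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv(path, low_memory=False):
# 200 rows, 3 columns, NaN in column position 2 on every fifth row.
df = pd.DataFrame({
    "a": range(200),
    "b": [f"name_{i}" for i in range(200)],
    "c": [np.nan if i % 5 == 0 else float(i) for i in range(200)],
})


def block(frame, start, stop):
    """Rows at positions start..stop (inclusive) whose column at
    position 2 is not NaN."""
    part = frame.iloc[start:stop + 1]  # iloc excludes the end, hence +1
    return part[part.iloc[:, 2].notna()]


first = block(df, 11, 62)
second = block(df, 65, 10000)  # iloc silently clips an end past the last row

# Hypothetical target database with two tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (a INTEGER, b TEXT, c REAL)")
conn.execute("CREATE TABLE table_b (a INTEGER, b TEXT, c REAL)")

# Row-by-row loop as asked; name=None makes itertuples yield plain tuples.
for row in first.itertuples(index=False, name=None):
    conn.execute("INSERT INTO table_a VALUES (?, ?, ?)", row)

# The same rows can be sent in one call with executemany.
conn.executemany(
    "INSERT INTO table_b VALUES (?, ?, ?)",
    second.itertuples(index=False, name=None),
)
conn.commit()
```

itertuples is considerably faster than iterrows for this kind of loop. For bulk loads, DataFrame.to_sql(name, con, if_exists="append", index=False) also accepts a sqlite3 connection and avoids the explicit loop.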
Answer 1 (score: 0)
You can simply use boolean indexing:
dataframe_name[dataframe_name['column_name'] <condition> <value>]
Example:
dataframe[dataframe['row_num'] > 200]
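A comparison on its own only produces a boolean Series. Indexing the DataFrame with that Series is what returns the matching rows. The sketch below assumes a hypothetical frame with a row_num column, as in the example, plus a value column containing NaN. It also shows how the question's two requirements (a row range and "not NaN") combine with &, with each condition wrapped in parentheses when written as a comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "row_num" as in the answer, "value" with some NaN.
dataframe = pd.DataFrame({
    "row_num": range(1, 11),
    "value": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0, np.nan, 8.0, 9.0, 10.0],
})

mask = dataframe["row_num"] > 5   # boolean Series, one entry per row
print(mask.dtype)                 # bool

subset = dataframe[mask]          # only the rows where mask is True
print(subset["row_num"].tolist())  # [6, 7, 8, 9, 10]

# Range (inclusive on both ends by default) AND value is not NaN.
filtered = dataframe[
    dataframe["row_num"].between(3, 8) & dataframe["value"].notna()
]
print(filtered["row_num"].tolist())  # [3, 5, 6, 8]
```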