How to remove rows that contain unrelated values

Time: 2019-06-13 01:17:40

Tags: pyspark

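I have a DataFrame, and I want to remove the two rows in it that contain unrelated values (rows 3 and 4 below, whose url column does not hold a URL):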

+--------------------+--------------------+--------------------+--------------------+--------------------+-------------+--------------------+
|                 url|             address|                name|        online_order|          book_table|         rate|               votes|
+--------------------+--------------------+--------------------+--------------------+--------------------+-------------+--------------------+
|https://www.zomat...|27th Main, 2nd Se...|Rock View Family ...|                 Yes|                  No|        3.3/5|                   8|
|https://www.zomat...|1152, 22nd Cross,...|           OMY Grill|                 Yes|                  No|        3.8/5|                  34|
|xperience this pl...|        ('Rated 3.0'| 'RATED\n  Yummy ...| I got choco chip...|strangely they di...| ('Rated 3.0'|" """"RATED\n  Th...|
|ing new to Bangalore|Chip����\x83����\...| ����\x83����\x83...| I was really hap...| Service was quic...| ('Rated 4.0'| 'RATED\n  Visite...|
|https://www.zomat...|1086/A, Twin Tuli...|          Wings Mama|                 Yes|                  No|         null|                   0|
+--------------------+--------------------+--------------------+--------------------+--------------------+-------------+--------------------+

After removing those rows, my DataFrame should look like this:

+--------------------+--------------------+--------------------+------------+----------+-----+-----+
|                 url|             address|                name|online_order|book_table| rate|votes|
+--------------------+--------------------+--------------------+------------+----------+-----+-----+
|https://www.zomat...|27th Main, 2nd Se...|Rock View Family ...|         Yes|        No|3.3/5|    8|
|https://www.zomat...|1152, 22nd Cross,...|           OMY Grill|         Yes|        No|3.8/5|   34|
|https://www.zomat...|1086/A, Twin Tuli...|          Wings Mama|         Yes|        No| null|    0|
+--------------------+--------------------+--------------------+------------+----------+-----+-----+

1 Answer:

Answer 0: (score: 0)

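The best way to filter out these rows is with a regular expression on the url column, keeping only the rows whose url starts with "http":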

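# Keep only rows whose url starts with "http"; rows with a null url are dropped as well.
url_pattern = "^http.*"
df = df.filter(df["url"].rlike(url_pattern))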
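To see which rows the pattern keeps without starting a Spark session, here is a minimal plain-Python sketch. It applies the same regular expression to the url values exactly as they are displayed (truncated) in the question's table. The names `URL_PATTERN`, `sample_urls`, and `keeps_row` are made up for this illustration and are not part of PySpark; `re.search` is used here as a stand-in for `rlike`, which also accepts a match anywhere in the string unless the pattern is anchored.

```python
import re

# Same pattern as in the answer: the value must start with "http".
URL_PATTERN = re.compile(r"^http.*")

# `url` values as displayed (truncated) in the question's table, not full URLs.
sample_urls = [
    "https://www.zomat...",
    "https://www.zomat...",
    "xperience this pl...",
    "ing new to Bangalore",
    "https://www.zomat...",
]


def keeps_row(url):
    """Return True when a url value would pass the filter (hypothetical helper)."""
    # A missing value never matches, so such rows are dropped too.
    return url is not None and URL_PATTERN.search(url) is not None


kept = [u for u in sample_urls if keeps_row(u)]
print(len(kept), "of", len(sample_urls), "rows kept")
```

The two rows whose url column holds review text are rejected and the other three are kept, which matches the desired output in the question. If rows with a missing url should survive the filter, that case would need to be handled separately, since a null value does not match the pattern.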