Check whether the data in a database is correct


I have a database of emails, like the one below, and I want to filter out the emails that are not correct.
For example:

  1. if the email does not contain “.”
  2. if the email has more than one “@”
  3. if the email has more than one “.” before and after “@”
  4. if the email has spaces inside or around it
  5. if the email has a domain other than “” like (,

Please help me set this up so that if I find anything else to amend in the future, I can add more conditions.
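One way to keep such checks extensible is to store each condition as a named function in a list, so a new condition is just one more entry. The following is a minimal base R sketch, not a full validator. The names `count_char`, `email_rules`, and `check_emails` are hypothetical, `example.com` stands in for the allowed domain, and rule 3 is read here as "more than one dot on either side of the @", which is only one possible interpretation.

```r
# Count fixed occurrences of `ch` in each element of `x`
count_char <- function(x, ch) {
  lengths(regmatches(x, gregexpr(ch, x, fixed = TRUE)))
}

# Each rule returns TRUE when an address VIOLATES it (hypothetical rule names)
email_rules <- list(
  no_dot        = function(x) !grepl(".", x, fixed = TRUE),
  multiple_at   = function(x) count_char(x, "@") > 1,
  too_many_dots = function(x) {
    local  <- sub("@.*$", "", x)       # text before the first "@"
    domain <- sub("^[^@]*@?", "", x)   # text after the first "@"
    count_char(local, ".") > 1 | count_char(domain, ".") > 1
  },
  has_space     = function(x) grepl("[[:space:]]", x),
  # Assumption: "example.com" is a placeholder for the allowed domain
  wrong_domain  = function(x) !grepl("@example\\.com$", x)
)

# Apply every rule; one logical column per rule, plus an overall `valid` flag
check_emails <- function(emails, rules = email_rules) {
  flags <- as.data.frame(lapply(rules, function(rule) rule(emails)))
  data.frame(email = emails, flags, valid = !Reduce(`|`, flags),
             stringsAsFactors = FALSE)
}

sample_emails <- c("alice@example.com", "bob@@example.com", "carol@examplecom",
                   " dave@example.com", "erin@other.org")
result <- check_emails(sample_emails)
print(result)

# Adding a condition later means appending one more function to the list
more_rules <- c(email_rules, list(too_long = function(x) nchar(x) > 254))
result_more <- check_emails(sample_emails, more_rules)
```

The per-rule columns show why an address was rejected, which may help when deciding whether a rule is too strict.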

df <- data.frame(email=c("[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]",
                   "[email protected]","[email protected]","[email protected]","[email protected]","[email protected]"))

For example, the output should look like:

Parsing email addresses in a way that accounts for almost all possible cases requires complex regular expressions.


See RFC 5322; see also this Stack Overflow post.

Starting at step 5 in the OP, however, reduces the complexity and makes the other tests in the OP unnecessary.


library(dplyr)    # provides %>% and filter()
library(stringr)  # provides str_detect()

df <- data.frame(email=c("[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]"))

is_gmail  <- ""

df %>% filter(str_detect(email,is_gmail))
#>            email
#> 1  [email protected]
#> 2  [email protected]
#> 3  [email protected]
#> 4  [email protected]
#> 5  [email protected]
#> 6  [email protected]
#> 7  [email protected]
#> 8  [email protected]
#> 9  [email protected]
#> 10 [email protected]
#> 11 [email protected]

Created on 2020-08-27 by the reprex package (v0.3.0)
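As an illustration of the domain-first idea, here is a minimal base R sketch using a single anchored pattern. It assumes a hypothetical allowed domain `example.com`, and `domain_pattern` is an illustrative name, not the pattern used in the reprex above. Because the pattern is anchored at both ends and the local part may not contain "@" or whitespace, an address that passes also has exactly one "@", no spaces, and a dot in the domain.

```r
# Assumption: "example.com" is a placeholder for the one allowed domain.
# ^ and $ anchor the whole string; the local part may not contain
# "@" or whitespace, so stray spaces and extra "@" signs fail the match.
domain_pattern <- "^[^@[:space:]]+@example\\.com$"

emails <- c("alice@example.com",       # passes
            "bob@@example.com",        # two "@"
            "carol@example.com ",      # trailing space
            "dave@other.org",          # different domain
            "erin smith@example.com")  # space inside

df_sketch <- data.frame(email = emails, stringsAsFactors = FALSE)

# Keep only the rows whose address matches the anchored pattern
valid <- df_sketch[grepl(domain_pattern, df_sketch$email), , drop = FALSE]
print(valid)
```

With dplyr and stringr loaded, the same filter could presumably be written as `df_sketch %>% filter(str_detect(email, domain_pattern))`. Note that this sketch does not limit the number of dots in the local part, so that condition would still need its own check if it matters.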
