Clean up parameter list, update docs

See also discussion at
https://github.com/karolyi/py3-validate-email/discussions/57
Reinhard Müller 2021-03-12 01:06:13 +01:00
parent 1b9b0682cd
commit da540d8db2
10 changed files with 182 additions and 67 deletions

@ -1,12 +1,21 @@
1.0.0:
- New major release with breaking changes! They are:
- Parameter names for validate_email() and validate_email_or_fail() have changed:
- check_regex -> check_format
- use_blacklist -> check_blacklist
- check_mx -> check_dns
- skip_smtp -> check_smtp (with inverted logic)
- helo_host -> smtp_helo_host
- from_address -> smtp_from_address
- debug -> smtp_debug
- All parameters except for the first one (the email address to check) are now keyword-only.
- Ambiguous results and the possibility of more of them, to reflect a real world SMTP delivery process:
- The module will keep trying probing through all MX hosts for validation and emit errors in the end of the full probing procedure.
- Any acceptance of the email delivery will be marked as valid, despite any other ambigious or negative result(s).
- The validate_email_or_fail() function will raise an SMTPCommunicationError() on a denied email address only in the end.
- The validate_email_or_fail() function will now raise an SMTPTemporaryError() on an ambiguous result. That is, greylisting or no servers providing a definitive negative or positive.
- A server that bails out with a 4xx code at any part of the SMTP conversation, will be marked as ambiguous.
- Both of the aforementioned exceptions will contain the occurred communication results in their error_messages class variables.
- The module tries all MX hosts in order of priority.
- An acceptance of the email address will yield a positive verification result, no further MX hosts will be tried.
- Any permanent SMTP error (5xx) will yield a negative verification result, no further MX hosts will be tried.
- Any temporary SMTP error (4xx) or any connection issue will cause the next MX host to be tried. Only if all MX hosts yield these kinds of errors will the overall verification result be ambiguous, for example because of greylisting or because no server provides a definitive positive or negative answer.
- The validate_email_or_fail() function will now raise an SMTPTemporaryError() on an ambiguous result.
- All exceptions raised by the SMTP check will contain the occurred communication results in their error_messages class variables.
- Internal API changes (refactorings)
- Check results are now logged with info level, instead of emitting warnings when debug is turned on.
- Props to @reinhard-mueller for coming up with the new proposals and helping in refining the idea.
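The parameter renames listed above can be sketched as a small translation helper for callers that still build keyword arguments in the pre-1.0.0 style. This is only an illustration: `migrate_kwargs` and `_RENAMED` are hypothetical names, not part of the module. It assumes the old behaviour in which `check_mx=False` also skipped the SMTP conversation, so it turns off `check_smtp` in that case (in 1.0.0, `check_smtp` implies `check_dns`).

```python
from typing import Any, Dict

# Old name -> new name, taken from the changelog entries above.
_RENAMED = {
    'check_regex': 'check_format',
    'use_blacklist': 'check_blacklist',
    'check_mx': 'check_dns',
    'helo_host': 'smtp_helo_host',
    'from_address': 'smtp_from_address',
    'debug': 'smtp_debug',
}


def migrate_kwargs(old_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Translate pre-1.0.0 keyword arguments (hypothetical helper)."""
    new_kwargs: Dict[str, Any] = {}
    for name, value in old_kwargs.items():
        if name == 'skip_smtp':
            # skip_smtp became check_smtp with inverted logic.
            new_kwargs['check_smtp'] = not value
        else:
            # Names that were not renamed (e.g. smtp_timeout) pass through.
            new_kwargs[_RENAMED.get(name, name)] = value
    # Previously check_mx=False skipped SMTP as well; now check_smtp
    # implies check_dns, so SMTP has to be disabled explicitly.
    if old_kwargs.get('check_mx') is False:
        new_kwargs['check_smtp'] = False
    return new_kwargs
```

The resulting dictionary would then be passed as keyword arguments after the email address, since everything but the first parameter is keyword-only.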

FAQ.md
@ -36,7 +36,7 @@ Run this code with the module installed (use your parameters within),
and see the output:
```python
python -c 'import logging, sys; logging.basicConfig(stream=sys.stderr, level=logging.DEBUG); from validate_email import validate_email; print(validate_email(\'your.email@address.com\', check_mx=True, debug=True))'
python -c 'import logging, sys; logging.basicConfig(stream=sys.stderr, level=logging.DEBUG); from validate_email import validate_email; print(validate_email(\'your.email@address.com\', smtp_debug=True))'
```
If you still don't understand why your code doesn't work as expected by

@ -25,32 +25,135 @@ USAGE
Basic usage::
from validate_email import validate_email
is_valid = validate_email(email_address='example@example.com', check_regex=True, check_mx=True, from_address='my@from.addr.ess', helo_host='my.host.name', smtp_timeout=10, dns_timeout=10, use_blacklist=True, debug=False)
is_valid = validate_email(email_address='example@example.com', check_format=True, check_blacklist=True, check_dns=True, dns_timeout=10, check_smtp=True, smtp_timeout=10, smtp_helo_host='my.host.name', smtp_from_address='my@from.addr.ess', smtp_debug=False)
:code:`check_regex` will check will the email address has a valid structure and defaults to True
Parameters
----------------------------
:code:`check_mx`: check the mx-records and check whether the email actually exists
:code:`email_address`: the email address to check
:code:`from_address`: the email address the probe will be sent from
:code:`check_format`: check whether the email address has a valid structure; defaults to :code:`True`
:code:`helo_host`: the host to use in SMTP HELO when checking for an email
:code:`check_blacklist`: check the email against the blacklist of domains downloaded from https://github.com/martenson/disposable-email-domains; defaults to :code:`True`
:code:`smtp_timeout`: seconds until SMTP timeout
:code:`check_dns`: check the DNS MX records; defaults to :code:`True`
:code:`dns_timeout`: seconds until DNS timeout; defaults to 10 seconds
:code:`dns_timeout`: seconds until DNS timeout
:code:`check_smtp`: check whether the email actually exists by initiating an SMTP conversation; defaults to :code:`True`
:code:`use_blacklist`: use the blacklist of domains downloaded from https://github.com/martenson/disposable-email-domains
:code:`smtp_timeout`: seconds until SMTP timeout; defaults to 10 seconds
:code:`debug`: emit debug/warning messages while checking email
:code:`smtp_helo_host`: the hostname to use in SMTP HELO/EHLO; if set to :code:`None` (the default), the fully qualified domain name of the local host is used
:code:`skip_smtp`: (default :code:`False`) skip the SMTP conversation with the server, after MX checks. Will automatically be set to :code:`True` when :code:`check_mx` is :code:`False`!
:code:`smtp_from_address`: the email address used for the sender in the SMTP conversation; if set to :code:`None` (the default), the :code:`email_address` parameter is used as the sender as well
:code:`smtp_debug`: activate :code:`smtplib`'s debug output, which always goes to stderr; defaults to :code:`False`
Result
----------------------------
The function :code:`validate_email()` returns the following results:
:code:`True`
All requested checks were successful for the given email address.
:code:`False`
At least one of the requested checks failed for the given email address.
:code:`None`
None of the requested checks failed, but at least one of them yielded an ambiguous result. Currently, the SMTP check is the only check which can actually yield an ambiguous result.
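Because the result is tri-state, a plain truthiness test would treat an ambiguous :code:`None` the same as a failed :code:`False`. A minimal sketch of telling the three cases apart follows; :code:`describe_result` is a hypothetical helper that takes whatever :code:`validate_email()` returned and is not part of the module.

```python
from typing import Optional


def describe_result(result: Optional[bool]) -> str:
    """Map the tri-state return value to a label (hypothetical helper)."""
    # Compare by identity: `not result` would lump None together with False.
    if result is True:
        return 'valid'
    if result is False:
        return 'invalid'
    return 'ambiguous'
```

A caller might, for instance, retry later on :code:`'ambiguous'` instead of rejecting the address outright.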
Getting more information
----------------------------
The function :code:`validate_email_or_fail()` works exactly like :code:`validate_email()`, except that it raises an exception on validation failure or on an ambiguous result instead of returning :code:`False` or :code:`None`, respectively.
All these exceptions descend from :code:`EmailValidationError`. Please see below for the exact exceptions raised by the various checks. Note that all exception classes are defined in the module :code:`validate_email.exceptions`.
Please note that :code:`SMTPTemporaryError` indicates an ambiguous check result rather than a check failure, so if you use :code:`validate_email_or_fail()`, you probably want to catch this exception.
The checks
============================
By default, all checks are enabled, but each of them can be disabled by one of the :code:`check_...` parameters. Note, however, that :code:`check_smtp` implies :code:`check_dns`.
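How the four flags interact can be sketched as follows. The function :code:`checks_to_run` is a hypothetical illustration that only mirrors the order and short-circuiting described here; it performs no validation itself.

```python
from typing import List


def checks_to_run(
        *, check_format: bool = True, check_blacklist: bool = True,
        check_dns: bool = True, check_smtp: bool = True) -> List[str]:
    """List the checks that would run, in order (hypothetical helper)."""
    checks = []
    if check_format:
        checks.append('format')
    if check_blacklist:
        checks.append('blacklist')
    # check_smtp implies check_dns: the SMTP check needs the MX records.
    if check_dns or check_smtp:
        checks.append('dns')
    if check_smtp:
        checks.append('smtp')
    return checks
```

Under this sketch, passing :code:`check_dns=False` alone changes nothing, because the default :code:`check_smtp=True` still requires the DNS lookup; both have to be disabled to skip it.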
:code:`check_format`
----------------------------
Check whether the given email address conforms to the general format requirements of valid email addresses.
:code:`validate_email_or_fail()` raises :code:`AddressFormatError` on any failure of this test.
:code:`check_blacklist`
----------------------------
Check whether the domain part of the given email address (the part behind the "@") is known as a disposable and temporary email address domain. These are often used to register dummy users in order to spam or abuse some services.
A list of such domains is maintained at https://github.com/martenson/disposable-email-domains, and this module uses that list.
:code:`validate_email_or_fail()` raises :code:`DomainBlacklistedError` if the email address belongs to a blacklisted domain.
:code:`check_dns`
----------------------------
Check whether there is a valid list of servers responsible for delivering emails to the given email address.
First, a DNS query is issued for the email address' domain to retrieve a list of all MX records. That list is then stripped of duplicates and malformed entries. If at least one valid MX record remains at the end of this procedure, the check is considered successful.
On failure of this check, :code:`validate_email_or_fail()` raises one of the following exceptions, all of which descend from :code:`DNSError`:
:code:`DomainNotFoundError`
The domain of the email address cannot be found at all.
:code:`NoNameserverError`
There is no nameserver for the domain.
:code:`DNSTimeoutError`
A timeout occurred when querying the nameserver. Note that the timeout period can be changed with the :code:`dns_timeout` parameter.
:code:`DNSConfigurationError`
The nameserver is misconfigured.
:code:`NoMXError`
The nameserver does not list any MX records for the domain.
:code:`NoValidMXError`
The nameserver lists MX records for the domain, but none of them is valid.
:code:`check_smtp`
----------------------------
Check whether the given email address exists by simulating an actual email delivery.
A connection to the SMTP server identified through the domain's MX record is established, and an SMTP conversation is initiated up to the point where the server confirms the existence of the email address. After that, instead of actually sending an email, the conversation is cancelled.
The module will try to negotiate a TLS connection with STARTTLS, and silently fall back to an unencrypted SMTP connection if the server doesn't support it.
If the SMTP server replies to the :code:`RCPT TO` command with a code 250 (success) response, the check is considered successful.
If the SMTP server replies with a code 5xx (permanent error) response at any point in the conversation, the check is considered failed.
If the SMTP server cannot be connected, unexpectedly closes the connection, or replies with a code 4xx (temporary error) at any stage of the conversation, the check is considered ambiguous.
If there is more than one valid MX record for the domain, they are tried in order of priority until the check is either successful or failed for the first time. The next server is tried only if the check result is ambiguous, and the overall check is considered ambiguous only if the result is ambiguous for all servers.
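The decision rule across several MX hosts can be sketched as below. This is a simplified model, not the module's implementation: the per-host outcomes are assumed to be already known and are given as the hypothetical labels :code:`'accepted'`, :code:`'rejected'` and :code:`'temporary'`, listed in order of MX priority.

```python
from typing import Iterable, Optional

# Hypothetical labels for the outcome of the conversation with one host.
ACCEPTED, REJECTED, TEMPORARY = 'accepted', 'rejected', 'temporary'


def overall_result(host_outcomes: Iterable[str]) -> Optional[bool]:
    """Combine per-host outcomes into True, False or None (sketch)."""
    for outcome in host_outcomes:
        if outcome == ACCEPTED:
            return True   # first acceptance wins, later hosts are not tried
        if outcome == REJECTED:
            return False  # first permanent error wins as well
        # Temporary error or connection problem: move on to the next host.
    return None  # every host was ambiguous
```

Note that the order matters: a rejection by a higher-priority host ends the check even if a lower-priority host would have accepted the address.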
On failure of this check or on ambiguous result, :code:`validate_email_or_fail()` raises one of the following exceptions, all of which descend from :code:`SMTPError`:
:code:`AddressNotDeliverableError`
The SMTP server permanently refused the email address. Technically, this means that the server replied to the :code:`RCPT TO` command with a code 5xx response.
:code:`SMTPCommunicationError`
The SMTP server refused to even let us get to the point where we could ask it about the email address. Technically, this means that the server sent a code 5xx response either immediately after connection, or as a reply to the :code:`EHLO` (or :code:`HELO`) or :code:`MAIL FROM` commands.
:code:`SMTPTemporaryError`
A temporary error occurred during the check for all available MX servers. This is considered an ambiguous check result. For example, greylisting is a frequent cause for this.
All of the above three exceptions provide further detail about the error response(s) in the exception's instance variable :code:`error_messages`.
Auto-updater
============================
The package contains an auto-updater for downloading and updating the built-in blacklist.txt. It will run on each module load (and installation), but will try to update the content only if the file is older than 5 days, and only if the content differs from what has already been downloaded.
The update can be triggered manually::
@ -68,4 +171,5 @@ The update can be triggered manually::
Read the FAQ_!
============================
.. _FAQ: https://github.com/karolyi/py3-validate-email/blob/master/FAQ.md

@ -20,20 +20,20 @@ class BlacklistCheckTestCase(TestCase):
domainlist_check(EmailAddress('pm2@mailinator.com'))
with self.assertRaises(DomainBlacklistedError):
validate_email_or_fail(
email_address='pm2@mailinator.com', check_regex=False,
use_blacklist=True)
email_address='pm2@mailinator.com', check_format=False,
check_blacklist=True)
with self.assertRaises(DomainBlacklistedError):
validate_email_or_fail(
email_address='pm2@mailinator.com', check_regex=True,
use_blacklist=True)
email_address='pm2@mailinator.com', check_format=True,
check_blacklist=True)
with self.assertLogs():
self.assertFalse(expr=validate_email(
email_address='pm2@mailinator.com', check_regex=False,
use_blacklist=True, debug=True))
email_address='pm2@mailinator.com', check_format=False,
check_blacklist=True))
with self.assertLogs():
self.assertFalse(expr=validate_email(
email_address='pm2@mailinator.com', check_regex=True,
use_blacklist=True, debug=True))
email_address='pm2@mailinator.com', check_format=True,
check_blacklist=True))
def test_blacklist_negative(self):
'Allows a domain not in the blacklist.'

@ -49,7 +49,7 @@ def _get_cleaned_mx_records(domain: str, timeout: int) -> list:
return result
def dns_check(email_address: EmailAddress, dns_timeout: int = 10) -> list:
def dns_check(email_address: EmailAddress, timeout: int = 10) -> list:
"""
Check whether there are any responsible SMTP servers for the email
address by looking up the DNS MX records.
@ -62,4 +62,4 @@ def dns_check(email_address: EmailAddress, dns_timeout: int = 10) -> list:
return [email_address.domain_literal_ip]
else:
return _get_cleaned_mx_records(
domain=email_address.domain, timeout=dns_timeout)
domain=email_address.domain, timeout=timeout)

@ -56,11 +56,11 @@ class DomainListValidator(object):
self.domain_blacklist = set(
x.strip().lower() for x in lines if x.strip())
def __call__(self, address: EmailAddress) -> bool:
def __call__(self, email_address: EmailAddress) -> bool:
'Do the checking here.'
if address.domain in self.domain_whitelist:
if email_address.domain in self.domain_whitelist:
return True
if address.domain in self.domain_blacklist:
if email_address.domain in self.domain_blacklist:
raise DomainBlacklistedError
return True

@ -44,41 +44,41 @@ class DomainBlacklistedError(EmailValidationError):
message = 'Domain blacklisted.'
class MXError(EmailValidationError):
class DNSError(EmailValidationError):
"""
Base class of all exceptions that indicate failure to determine a
valid MX for the domain of email address.
"""
class DomainNotFoundError(MXError):
class DomainNotFoundError(DNSError):
'Raised when the domain is not found.'
message = 'Domain not found.'
class NoNameserverError(MXError):
class NoNameserverError(DNSError):
'Raised when the domain does not resolve by nameservers in time.'
message = 'No nameserver found for domain.'
class DNSTimeoutError(MXError):
class DNSTimeoutError(DNSError):
'Raised when the domain lookup times out.'
message = 'Domain lookup timed out.'
class DNSConfigurationError(MXError):
class DNSConfigurationError(DNSError):
"""
Raised when the DNS entries for this domain are falsely configured.
"""
message = 'Misconfigurated DNS entries for domain.'
class NoMXError(MXError):
class NoMXError(DNSError):
'Raised when the domain has no MX records configured.'
message = 'No MX record for domain found.'
class NoValidMXError(MXError):
class NoValidMXError(DNSError):
"""
Raised when the domain has MX records configured, but none of them
has a valid format.

@ -28,22 +28,22 @@ def _validate_ipv46_address(value: str) -> bool:
return _validate_ipv4_address(value) or _validate_ipv6_address(value)
def regex_check(address: EmailAddress) -> bool:
def regex_check(email_address: EmailAddress) -> bool:
'Slightly adjusted email regex checker from the Django project.'
# Validate user part.
if not USER_REGEX.match(address.user):
if not USER_REGEX.match(email_address.user):
raise AddressFormatError
# Validate domain part.
if address.domain_literal_ip:
literal_match = LITERAL_REGEX.match(address.ace_domain)
if email_address.domain_literal_ip:
literal_match = LITERAL_REGEX.match(email_address.ace_domain)
if literal_match is None:
raise AddressFormatError
if not _validate_ipv46_address(literal_match[1]):
raise AddressFormatError
else:
if HOST_REGEX.match(address.ace_domain) is None:
if HOST_REGEX.match(email_address.ace_domain) is None:
raise AddressFormatError
# All validations successful.

@ -175,9 +175,10 @@ class _SMTPChecker(SMTP):
def smtp_check(
email_address: EmailAddress, mx_records: list, debug: bool,
from_address: Optional[EmailAddress] = None,
helo_host: Optional[str] = None, smtp_timeout: int = 10) -> bool:
email_address: EmailAddress, mx_records: List[str],
timeout: float = 10, helo_host: Optional[str] = None,
from_address: Optional[EmailAddress] = None, debug: bool = False
) -> bool:
"""
Returns `True` as soon as any of the given servers accepts the
recipient address.
@ -196,6 +197,6 @@ def smtp_check(
determined either.
"""
smtp_checker = _SMTPChecker(
local_hostname=helo_host, timeout=smtp_timeout, debug=debug,
local_hostname=helo_host, timeout=timeout, debug=debug,
sender=from_address or email_address, recip=email_address)
return smtp_checker.check(hosts=mx_records)

@ -12,6 +12,7 @@ from .smtp_check import smtp_check
LOGGER = getLogger(name=__name__)
__all__ = ['validate_email', 'validate_email_or_fail']
__doc__ = """\
Verify the given email address by determining the SMTP servers
responsible for the domain and then asking them to deliver an email to
@ -26,39 +27,39 @@ simply accept everything and send a bounce notification later. Hence, a
def validate_email_or_fail(
email_address: str, check_regex: bool = True, check_mx: bool = True,
from_address: Optional[str] = None, helo_host: Optional[str] = None,
smtp_timeout: int = 10, dns_timeout: int = 10,
use_blacklist: bool = True, debug: bool = False,
skip_smtp: bool = False) -> Optional[bool]:
email_address: str, *, check_format: bool = True,
check_blacklist: bool = True, check_dns: bool = True,
dns_timeout: float = 10, check_smtp: bool = True,
smtp_timeout: float = 10, smtp_helo_host: Optional[str] = None,
smtp_from_address: Optional[str] = None, smtp_debug: bool = False
) -> Optional[bool]:
"""
Return `True` if the email address validation is successful, `None`
if the validation result is ambiguous, and raise an exception if the
validation fails.
"""
email_address = EmailAddress(address=email_address)
if from_address is not None:
if check_format:
regex_check(email_address=email_address)
if check_blacklist:
domainlist_check(email_address=email_address)
if not (check_dns or check_smtp): # check_smtp implies check_dns.
return True
mx_records = dns_check(email_address=email_address, timeout=dns_timeout)
if not check_smtp:
return True
if smtp_from_address is not None:
try:
from_address = EmailAddress(address=from_address)
smtp_from_address = EmailAddress(address=smtp_from_address)
except AddressFormatError:
raise FromAddressFormatError
if check_regex:
regex_check(address=email_address)
if use_blacklist:
domainlist_check(address=email_address)
if not check_mx:
return True
mx_records = dns_check(
email_address=email_address, dns_timeout=dns_timeout)
if skip_smtp:
return True
return smtp_check(
email_address=email_address, mx_records=mx_records,
from_address=from_address, helo_host=helo_host,
smtp_timeout=smtp_timeout, debug=debug)
timeout=smtp_timeout, helo_host=smtp_helo_host,
from_address=smtp_from_address, debug=smtp_debug)
def validate_email(email_address: str, *args, **kwargs):
def validate_email(email_address: str, **kwargs):
"""
Return `True` or `False` depending if the email address exists
or/and can be delivered.
@ -66,7 +67,7 @@ def validate_email(email_address: str, *args, **kwargs):
Return `None` if the result is ambiguous.
"""
try:
return validate_email_or_fail(email_address, *args, **kwargs)
return validate_email_or_fail(email_address, **kwargs)
except SMTPTemporaryError as error:
LOGGER.info(msg=f'Validation for {email_address!r} ambigious: {error}')
return