Clean up parameter list, update docs

See also discussion at
https://github.com/karolyi/py3-validate-email/discussions/57
Reinhard Müller 2021-03-12 01:06:13 +01:00
parent 1b9b0682cd
commit da540d8db2
10 changed files with 182 additions and 67 deletions


@ -1,12 +1,21 @@
1.0.0: 1.0.0:
- New major release with breaking changes! They are: - New major release with breaking changes! They are:
- Parameter names for validate_email() and validate_email_or_fail() have changed:
- check_regex -> check_format
- use_blacklist -> check_blacklist
- check_mx -> check_dns
- skip_smtp -> check_smtp (with inverted logic)
- helo_host -> smtp_helo_host
- from_address -> smtp_from_address
- debug -> smtp_debug
- All parameters except for the first one (the email address to check) are now keyword-only.
- Ambiguous results and the possibility of more of them, to reflect a real world SMTP delivery process: - Ambiguous results and the possibility of more of them, to reflect a real world SMTP delivery process:
- The module will keep trying probing through all MX hosts for validation and emit errors in the end of the full probing procedure. - The module tries all MX hosts in order of priority.
- Any acceptance of the email delivery will be marked as valid, despite any other ambiguous or negative result(s). - An acceptance of the email address will yield a positive verification result, no further MX hosts will be tried.
- The validate_email_or_fail() function will raise an SMTPCommunicationError() on a denied email address only in the end. - Any permanent SMTP error (5xx) will yield a negative verification result, no further MX hosts will be tried.
- The validate_email_or_fail() function will now raise an SMTPTemporaryError() on an ambiguous result. That is, greylisting or no servers providing a definitive negative or positive. - Any temporary SMTP error (4xx) or any connection issue will cause the next MX host to be tried. Only if all MX hosts yield these kinds of errors, the overall verification result will be ambiguous. That is, greylisting or no servers providing a definitive negative or positive.
- A server that bails out with a 4xx code at any part of the SMTP conversation, will be marked as ambiguous. - The validate_email_or_fail() function will now raise an SMTPTemporaryError() on an ambiguous result.
- Both of the aforementioned exceptions will contain the occurred communication results in their error_messages class variables. - All exceptions raised by the SMTP check will contain the occurred communication results in their error_messages class variables.
- Internal API changes (refactorings) - Internal API changes (refactorings)
- Check results are now logged with info level, instead of emitting warnings when debug is turned on. - Check results are now logged with info level, instead of emitting warnings when debug is turned on.
- Props to @reinhard-mueller for coming up with the new proposals and helping in refining the idea. - Props to @reinhard-mueller for coming up with the new proposals and helping in refining the idea.
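The parameter renames listed above can be applied mechanically to existing call sites. The following is only a sketch: `migrate_kwargs` and `RENAMED` are hypothetical names that are not part of the package. It assumes, based on the old README text, that `check_mx=False` used to switch off the SMTP conversation as well, so `check_smtp=False` is set explicitly in that case (in 1.0.0, `check_smtp` implies `check_dns`).

```python
from typing import Any, Dict

# Old keyword -> new keyword, as listed in the changelog above.
RENAMED = {
    'check_regex': 'check_format',
    'use_blacklist': 'check_blacklist',
    'check_mx': 'check_dns',
    'helo_host': 'smtp_helo_host',
    'from_address': 'smtp_from_address',
    'debug': 'smtp_debug',
}


def migrate_kwargs(old_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Translate pre-1.0.0 keyword arguments (hypothetical helper)."""
    new_kwargs: Dict[str, Any] = {}
    for name, value in old_kwargs.items():
        if name == 'skip_smtp':
            # skip_smtp became check_smtp with inverted logic.
            new_kwargs['check_smtp'] = not value
        else:
            # Unchanged names (e.g. dns_timeout) pass through as-is.
            new_kwargs[RENAMED.get(name, name)] = value
    if not old_kwargs.get('check_mx', True):
        # Assumption: the old check_mx=False also skipped SMTP, while the
        # new check_smtp=True would force the DNS check back on.
        new_kwargs['check_smtp'] = False
    return new_kwargs
```

Since all parameters after the email address are now keyword-only, the result would be passed as `validate_email(address, **migrate_kwargs(old_kwargs))`.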

2
FAQ.md

@ -36,7 +36,7 @@ Run this code with the module installed (use your parameters within),
and see the output: and see the output:
```python ```python
python -c 'import logging, sys; logging.basicConfig(stream=sys.stderr, level=logging.DEBUG); from validate_email import validate_email; print(validate_email("your.email@address.com", check_mx=True, debug=True))' python -c 'import logging, sys; logging.basicConfig(stream=sys.stderr, level=logging.DEBUG); from validate_email import validate_email; print(validate_email("your.email@address.com", smtp_debug=True))'
``` ```
If you still don't understand why your code doesn't work as expected by If you still don't understand why your code doesn't work as expected by


@ -25,32 +25,135 @@ USAGE
Basic usage:: Basic usage::
from validate_email import validate_email from validate_email import validate_email
is_valid = validate_email(email_address='example@example.com', check_regex=True, check_mx=True, from_address='my@from.addr.ess', helo_host='my.host.name', smtp_timeout=10, dns_timeout=10, use_blacklist=True, debug=False) is_valid = validate_email(email_address='example@example.com', check_format=True, check_blacklist=True, check_dns=True, dns_timeout=10, check_smtp=True, smtp_timeout=10, smtp_helo_host='my.host.name', smtp_from_address='my@from.addr.ess', smtp_debug=False)
:code:`check_regex` will check will the email address has a valid structure and defaults to True Parameters
----------------------------
:code:`check_mx`: check the mx-records and check whether the email actually exists :code:`email_address`: the email address to check
:code:`from_address`: the email address the probe will be sent from :code:`check_format`: check whether the email address has a valid structure; defaults to :code:`True`
:code:`helo_host`: the host to use in SMTP HELO when checking for an email :code:`check_blacklist`: check the email against the blacklist of domains downloaded from https://github.com/martenson/disposable-email-domains; defaults to :code:`True`
:code:`smtp_timeout`: seconds until SMTP timeout :code:`check_dns`: check the DNS mx-records, defaults to :code:`True`
:code:`dns_timeout`: seconds until DNS timeout; defaults to 10 seconds
:code:`dns_timeout`: seconds until DNS timeout :code:`check_smtp`: check whether the email actually exists by initiating an SMTP conversation; defaults to :code:`True`
:code:`use_blacklist`: use the blacklist of domains downloaded from https://github.com/martenson/disposable-email-domains :code:`smtp_timeout`: seconds until SMTP timeout; defaults to 10 seconds
:code:`debug`: emit debug/warning messages while checking email :code:`smtp_helo_host`: the hostname to use in SMTP HELO/EHLO; if set to :code:`None` (the default), the fully qualified domain name of the local host is used
:code:`skip_smtp`: (default :code:`False`) skip the SMTP conversation with the server, after MX checks. Will automatically be set to :code:`True` when :code:`check_mx` is :code:`False`! :code:`smtp_from_address`: the email address used for the sender in the SMTP conversation; if set to :code:`None` (the default), the :code:`email_address` parameter is used as the sender as well
:code:`smtp_debug`: activate :code:`smtplib`'s debug output which always goes to stderr; defaults to :code:`False`
Result
----------------------------
The function :code:`validate_email()` returns the following results:
:code:`True`
All requested checks were successful for the given email address.
:code:`False`
At least one of the requested checks failed for the given email address.
:code:`None`
None of the requested checks failed, but at least one of them yielded an ambiguous result. Currently, the SMTP check is the only check which can actually yield an ambiguous result.
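Because the result has three states, a plain truthiness test would lump :code:`None` together with :code:`False`. A minimal sketch of how a caller might tell them apart follows; `describe_result` is a hypothetical helper and not part of the package.

```python
from typing import Optional


def describe_result(result: Optional[bool]) -> str:
    """Map the tri-state return value to a label (hypothetical helper)."""
    if result is None:
        # Test for None first: None is falsy, so `if not result` would
        # treat an ambiguous result like a failed check.
        return 'ambiguous'
    return 'valid' if result else 'invalid'
```

A caller would then write something like `describe_result(validate_email('example@example.com'))` and decide separately what to do with ambiguous addresses, for example retrying later.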
Getting more information
----------------------------
The function :code:`validate_email_or_fail()` works exactly like :code:`validate_email`, except that it raises an exception in the case of validation failure and ambiguous result instead of returning :code:`False` or :code:`None`, respectively. The function :code:`validate_email_or_fail()` works exactly like :code:`validate_email`, except that it raises an exception in the case of validation failure and ambiguous result instead of returning :code:`False` or :code:`None`, respectively.
All these exceptions descend from :code:`EmailValidationError`. Please see below for the exact exceptions raised by the various checks. Note that all exception classes are defined in the module :code:`validate_email.exceptions`.
Please note that :code:`SMTPTemporaryError` indicates an ambiguous check result rather than a check failure, so if you use :code:`validate_email_or_fail()`, you probably want to catch this exception.
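One way to separate the ambiguous case from real failures is to catch :code:`SMTPTemporaryError` before the common base class. The sketch below uses stand-in exception classes so that it runs without the package installed; in real code they would be imported from :code:`validate_email.exceptions`. The helper `classify` and its `validator` parameter are hypothetical.

```python
from typing import Callable, Optional


# Stand-ins mirroring the hierarchy described in this document; real code
# would import these from validate_email.exceptions instead.
class EmailValidationError(Exception):
    pass


class SMTPError(EmailValidationError):
    pass


class SMTPTemporaryError(SMTPError):
    pass


def classify(validator: Callable[[str], Optional[bool]], address: str) -> str:
    """Run a validate_email_or_fail-like callable and label the outcome."""
    try:
        validator(address)
    except SMTPTemporaryError:
        # Must come first: it is a subclass of EmailValidationError.
        return 'ambiguous'
    except EmailValidationError as error:
        return f'invalid ({type(error).__name__})'
    return 'valid'
```

With the package installed, `validator` would be :code:`validate_email_or_fail`, possibly wrapped with `functools.partial` to fix the keyword arguments.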
The checks
============================
By default, all checks are enabled, but each of them can be disabled by one of the :code:`check_...` parameters. Note, however, that :code:`check_smtp` implies :code:`check_dns`.
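The effect of that implication can be spelled out with a small sketch that lists which checks end up running for a given combination of flags. `checks_to_run` is a hypothetical helper, not part of the package; it only mirrors the order and conditions described here.

```python
from typing import List


def checks_to_run(
    check_format: bool = True, check_blacklist: bool = True,
    check_dns: bool = True, check_smtp: bool = True
) -> List[str]:
    """Return the names of the checks that would run (hypothetical)."""
    checks = []
    if check_format:
        checks.append('format')
    if check_blacklist:
        checks.append('blacklist')
    if check_dns or check_smtp:
        # The SMTP check needs the MX records, so it forces the DNS check.
        checks.append('dns')
    if check_smtp:
        checks.append('smtp')
    return checks
```

In particular, passing only :code:`check_dns=False` does not skip the DNS lookup; :code:`check_smtp=False` has to be passed as well.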
:code:`check_format`
----------------------------
Check whether the given email address conforms to the general format requirements of valid email addresses.
:code:`validate_email_or_fail()` raises :code:`AddressFormatError` on any failure of this test.
:code:`check_blacklist`
----------------------------
Check whether the domain part of the given email address (the part behind the "@") is known as a disposable and temporary email address domain. These are often used to register dummy users in order to spam or abuse some services.
A list of such domains is maintained at https://github.com/martenson/disposable-email-domains, and this module uses that list.
:code:`validate_email_or_fail()` raises :code:`DomainBlacklistedError` if the email address belongs to a blacklisted domain.
:code:`check_dns`
----------------------------
Check whether there is a valid list of servers responsible for delivering emails to the given email address.
First, a DNS query is issued for the email address's domain to retrieve a list of all MX records. That list is then stripped of duplicates and malformed entries. If at least one valid MX record remains at the end of this procedure, the check is considered successful.
On failure of this check, :code:`validate_email_or_fail()` raises one of the following exceptions, all of which descend from :code:`DNSError`:
:code:`DomainNotFoundError`
The domain of the email address cannot be found at all.
:code:`NoNameserverError`
There is no nameserver for the domain.
:code:`DNSTimeoutError`
A timeout occurred when querying the nameserver. Note that the timeout period can be changed with the :code:`dns_timeout` parameter.
:code:`DNSConfigurationError`
The nameserver is misconfigured.
:code:`NoMXError`
The nameserver does not list any MX records for the domain.
:code:`NoValidMXError`
The nameserver lists MX records for the domain, but none of them is valid.
:code:`check_smtp`
----------------------------
Check whether the given email address exists by simulating an actual email delivery.
A connection to the SMTP server identified through the domain's MX record is established, and an SMTP conversation is initiated up to the point where the server confirms the existence of the email address. After that, instead of actually sending an email, the conversation is cancelled.
The module will try to negotiate a TLS connection with STARTTLS, and silently fall back to an unencrypted SMTP connection if the server doesn't support it. The module will try to negotiate a TLS connection with STARTTLS, and silently fall back to an unencrypted SMTP connection if the server doesn't support it.
If the SMTP server replies to the :code:`RCPT TO` command with a code 250 (success) response, the check is considered successful.
If the SMTP server replies with a code 5xx (permanent error) response at any point in the conversation, the check is considered failed.
If the SMTP server cannot be connected, unexpectedly closes the connection, or replies with a code 4xx (temporary error) at any stage of the conversation, the check is considered ambiguous.
If there is more than one valid MX record for the domain, they are tried in order of priority until the check either succeeds or fails for the first time. The next server is tried only after an ambiguous check result, and the overall check is considered ambiguous only if the result is ambiguous for all servers.
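The host iteration described above can be sketched as a short loop. This is an illustration under assumptions, not the module's actual implementation: `check_hosts` and the `probe` callable are hypothetical, and `probe` is assumed to return :code:`True` for an accepted address, :code:`False` for a permanent (5xx) error, and :code:`None` for a temporary error or connection problem.

```python
from typing import Callable, Iterable, Optional


def check_hosts(
    hosts: Iterable[str], probe: Callable[[str], Optional[bool]]
) -> Optional[bool]:
    """Try hosts in the given (priority) order; hypothetical sketch."""
    for host in hosts:
        outcome = probe(host)
        if outcome is not None:
            # First definitive answer wins; later hosts are not tried.
            return outcome
    # Every host was ambiguous, so the overall result is ambiguous too.
    return None
```

The design choice worth noting is that a definitive answer from a higher-priority host is never overridden by a lower-priority one, which matches how a sending mail server would proceed.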
On failure of this check or on ambiguous result, :code:`validate_email_or_fail()` raises one of the following exceptions, all of which descend from :code:`SMTPError`:
:code:`AddressNotDeliverableError`
The SMTP server permanently refused the email address. Technically, this means that the server replied to the :code:`RCPT TO` command with a code 5xx response.
:code:`SMTPCommunicationError`
The SMTP server refused to even let us get to the point where we could ask it about the email address. Technically, this means that the server sent a code 5xx response either immediately after connection, or as a reply to the :code:`EHLO` (or :code:`HELO`) or :code:`MAIL FROM` commands.
:code:`SMTPTemporaryError`
A temporary error occurred during the check for all available MX servers. This is considered an ambiguous check result. For example, greylisting is a frequent cause for this.
All of the above three exceptions provide further detail about the error response(s) in the exception's instance variable :code:`error_messages`.
Auto-updater Auto-updater
============================ ============================
The package contains an auto-updater for downloading and updating the built-in blacklist.txt. It will run on each module load (and installation), but will try to update the content only if the file is older than 5 days, and if the content is not the same that's already downloaded. The package contains an auto-updater for downloading and updating the built-in blacklist.txt. It will run on each module load (and installation), but will try to update the content only if the file is older than 5 days, and if the content is not the same that's already downloaded.
The update can be triggered manually:: The update can be triggered manually::
@ -68,4 +171,5 @@ The update can be triggered manually::
Read the FAQ_! Read the FAQ_!
============================ ============================
.. _FAQ: https://github.com/karolyi/py3-validate-email/blob/master/FAQ.md .. _FAQ: https://github.com/karolyi/py3-validate-email/blob/master/FAQ.md


@ -20,20 +20,20 @@ class BlacklistCheckTestCase(TestCase):
domainlist_check(EmailAddress('pm2@mailinator.com')) domainlist_check(EmailAddress('pm2@mailinator.com'))
with self.assertRaises(DomainBlacklistedError): with self.assertRaises(DomainBlacklistedError):
validate_email_or_fail( validate_email_or_fail(
email_address='pm2@mailinator.com', check_regex=False, email_address='pm2@mailinator.com', check_format=False,
use_blacklist=True) check_blacklist=True)
with self.assertRaises(DomainBlacklistedError): with self.assertRaises(DomainBlacklistedError):
validate_email_or_fail( validate_email_or_fail(
email_address='pm2@mailinator.com', check_regex=True, email_address='pm2@mailinator.com', check_format=True,
use_blacklist=True) check_blacklist=True)
with self.assertLogs(): with self.assertLogs():
self.assertFalse(expr=validate_email( self.assertFalse(expr=validate_email(
email_address='pm2@mailinator.com', check_regex=False, email_address='pm2@mailinator.com', check_format=False,
use_blacklist=True, debug=True)) check_blacklist=True))
with self.assertLogs(): with self.assertLogs():
self.assertFalse(expr=validate_email( self.assertFalse(expr=validate_email(
email_address='pm2@mailinator.com', check_regex=True, email_address='pm2@mailinator.com', check_format=True,
use_blacklist=True, debug=True)) check_blacklist=True))
def test_blacklist_negative(self): def test_blacklist_negative(self):
'Allows a domain not in the blacklist.' 'Allows a domain not in the blacklist.'


@ -49,7 +49,7 @@ def _get_cleaned_mx_records(domain: str, timeout: int) -> list:
return result return result
def dns_check(email_address: EmailAddress, dns_timeout: int = 10) -> list: def dns_check(email_address: EmailAddress, timeout: int = 10) -> list:
""" """
Check whether there are any responsible SMTP servers for the email Check whether there are any responsible SMTP servers for the email
address by looking up the DNS MX records. address by looking up the DNS MX records.
@ -62,4 +62,4 @@ def dns_check(email_address: EmailAddress, dns_timeout: int = 10) -> list:
return [email_address.domain_literal_ip] return [email_address.domain_literal_ip]
else: else:
return _get_cleaned_mx_records( return _get_cleaned_mx_records(
domain=email_address.domain, timeout=dns_timeout) domain=email_address.domain, timeout=timeout)


@ -56,11 +56,11 @@ class DomainListValidator(object):
self.domain_blacklist = set( self.domain_blacklist = set(
x.strip().lower() for x in lines if x.strip()) x.strip().lower() for x in lines if x.strip())
def __call__(self, address: EmailAddress) -> bool: def __call__(self, email_address: EmailAddress) -> bool:
'Do the checking here.' 'Do the checking here.'
if address.domain in self.domain_whitelist: if email_address.domain in self.domain_whitelist:
return True return True
if address.domain in self.domain_blacklist: if email_address.domain in self.domain_blacklist:
raise DomainBlacklistedError raise DomainBlacklistedError
return True return True


@ -44,41 +44,41 @@ class DomainBlacklistedError(EmailValidationError):
message = 'Domain blacklisted.' message = 'Domain blacklisted.'
class MXError(EmailValidationError): class DNSError(EmailValidationError):
""" """
Base class of all exceptions that indicate failure to determine a Base class of all exceptions that indicate failure to determine a
valid MX for the domain of email address. valid MX for the domain of email address.
""" """
class DomainNotFoundError(MXError): class DomainNotFoundError(DNSError):
'Raised when the domain is not found.' 'Raised when the domain is not found.'
message = 'Domain not found.' message = 'Domain not found.'
class NoNameserverError(MXError): class NoNameserverError(DNSError):
'Raised when the domain does not resolve by nameservers in time.' 'Raised when the domain does not resolve by nameservers in time.'
message = 'No nameserver found for domain.' message = 'No nameserver found for domain.'
class DNSTimeoutError(MXError): class DNSTimeoutError(DNSError):
'Raised when the domain lookup times out.' 'Raised when the domain lookup times out.'
message = 'Domain lookup timed out.' message = 'Domain lookup timed out.'
class DNSConfigurationError(MXError): class DNSConfigurationError(DNSError):
""" """
Raised when the DNS entries for this domain are falsely configured. Raised when the DNS entries for this domain are falsely configured.
""" """
message = 'Misconfigurated DNS entries for domain.' message = 'Misconfigurated DNS entries for domain.'
class NoMXError(MXError): class NoMXError(DNSError):
'Raised when the domain has no MX records configured.' 'Raised when the domain has no MX records configured.'
message = 'No MX record for domain found.' message = 'No MX record for domain found.'
class NoValidMXError(MXError): class NoValidMXError(DNSError):
""" """
Raised when the domain has MX records configured, but none of them Raised when the domain has MX records configured, but none of them
has a valid format. has a valid format.


@ -28,22 +28,22 @@ def _validate_ipv46_address(value: str) -> bool:
return _validate_ipv4_address(value) or _validate_ipv6_address(value) return _validate_ipv4_address(value) or _validate_ipv6_address(value)
def regex_check(address: EmailAddress) -> bool: def regex_check(email_address: EmailAddress) -> bool:
'Slightly adjusted email regex checker from the Django project.' 'Slightly adjusted email regex checker from the Django project.'
# Validate user part. # Validate user part.
if not USER_REGEX.match(address.user): if not USER_REGEX.match(email_address.user):
raise AddressFormatError raise AddressFormatError
# Validate domain part. # Validate domain part.
if address.domain_literal_ip: if email_address.domain_literal_ip:
literal_match = LITERAL_REGEX.match(address.ace_domain) literal_match = LITERAL_REGEX.match(email_address.ace_domain)
if literal_match is None: if literal_match is None:
raise AddressFormatError raise AddressFormatError
if not _validate_ipv46_address(literal_match[1]): if not _validate_ipv46_address(literal_match[1]):
raise AddressFormatError raise AddressFormatError
else: else:
if HOST_REGEX.match(address.ace_domain) is None: if HOST_REGEX.match(email_address.ace_domain) is None:
raise AddressFormatError raise AddressFormatError
# All validations successful. # All validations successful.


@ -175,9 +175,10 @@ class _SMTPChecker(SMTP):
def smtp_check( def smtp_check(
email_address: EmailAddress, mx_records: list, debug: bool, email_address: EmailAddress, mx_records: List[str],
from_address: Optional[EmailAddress] = None, timeout: float = 10, helo_host: Optional[str] = None,
helo_host: Optional[str] = None, smtp_timeout: int = 10) -> bool: from_address: Optional[EmailAddress] = None, debug: bool = False
) -> bool:
""" """
Returns `True` as soon as any of the given servers accepts the Returns `True` as soon as any of the given servers accepts the
recipient address. recipient address.
@ -196,6 +197,6 @@ def smtp_check(
determined either. determined either.
""" """
smtp_checker = _SMTPChecker( smtp_checker = _SMTPChecker(
local_hostname=helo_host, timeout=smtp_timeout, debug=debug, local_hostname=helo_host, timeout=timeout, debug=debug,
sender=from_address or email_address, recip=email_address) sender=from_address or email_address, recip=email_address)
return smtp_checker.check(hosts=mx_records) return smtp_checker.check(hosts=mx_records)


@ -12,6 +12,7 @@ from .smtp_check import smtp_check
LOGGER = getLogger(name=__name__) LOGGER = getLogger(name=__name__)
__all__ = ['validate_email', 'validate_email_or_fail']
__doc__ = """\ __doc__ = """\
Verify the given email address by determining the SMTP servers Verify the given email address by determining the SMTP servers
responsible for the domain and then asking them to deliver an email to responsible for the domain and then asking them to deliver an email to
@ -26,39 +27,39 @@ simply accept everything and send a bounce notification later. Hence, a
def validate_email_or_fail( def validate_email_or_fail(
email_address: str, check_regex: bool = True, check_mx: bool = True, email_address: str, *, check_format: bool = True,
from_address: Optional[str] = None, helo_host: Optional[str] = None, check_blacklist: bool = True, check_dns: bool = True,
smtp_timeout: int = 10, dns_timeout: int = 10, dns_timeout: float = 10, check_smtp: bool = True,
use_blacklist: bool = True, debug: bool = False, smtp_timeout: float = 10, smtp_helo_host: Optional[str] = None,
skip_smtp: bool = False) -> Optional[bool]: smtp_from_address: Optional[str] = None, smtp_debug: bool = False
) -> Optional[bool]:
""" """
Return `True` if the email address validation is successful, `None` Return `True` if the email address validation is successful, `None`
if the validation result is ambiguous, and raise an exception if the if the validation result is ambiguous, and raise an exception if the
validation fails. validation fails.
""" """
email_address = EmailAddress(address=email_address) email_address = EmailAddress(address=email_address)
if from_address is not None: if check_format:
regex_check(email_address=email_address)
if check_blacklist:
domainlist_check(email_address=email_address)
if not (check_dns or check_smtp): # check_smtp implies check_dns.
return True
mx_records = dns_check(email_address=email_address, timeout=dns_timeout)
if not check_smtp:
return True
if smtp_from_address is not None:
try: try:
from_address = EmailAddress(address=from_address) smtp_from_address = EmailAddress(address=smtp_from_address)
except AddressFormatError: except AddressFormatError:
raise FromAddressFormatError raise FromAddressFormatError
if check_regex:
regex_check(address=email_address)
if use_blacklist:
domainlist_check(address=email_address)
if not check_mx:
return True
mx_records = dns_check(
email_address=email_address, dns_timeout=dns_timeout)
if skip_smtp:
return True
return smtp_check( return smtp_check(
email_address=email_address, mx_records=mx_records, email_address=email_address, mx_records=mx_records,
from_address=from_address, helo_host=helo_host, timeout=smtp_timeout, helo_host=smtp_helo_host,
smtp_timeout=smtp_timeout, debug=debug) from_address=smtp_from_address, debug=smtp_debug)
def validate_email(email_address: str, *args, **kwargs): def validate_email(email_address: str, **kwargs):
""" """
Return `True` or `False` depending on whether the email address exists Return `True` or `False` depending on whether the email address exists
or/and can be delivered. or/and can be delivered.
@ -66,7 +67,7 @@ def validate_email(email_address: str, *args, **kwargs):
Return `None` if the result is ambiguous. Return `None` if the result is ambiguous.
""" """
try: try:
return validate_email_or_fail(email_address, *args, **kwargs) return validate_email_or_fail(email_address, **kwargs)
except SMTPTemporaryError as error: except SMTPTemporaryError as error:
LOGGER.info(msg=f'Validation for {email_address!r} ambiguous: {error}') LOGGER.info(msg=f'Validation for {email_address!r} ambiguous: {error}')
return return