Postfix Lost connection analysis: 421 4.4.2 host error: timeout exceeded

[Leer este post en español]

In this post we’ll show the analysis of another common error for smtp status code 421 4.4.2 host Error: timeout exceeded which refers to network or connection problems between SMTP servers. The symptoms you may observe is the your Postfix (IMSVA) server is not able to accept all incoming mails and sender server is starting to have queuing problems.

The following lines show the log from a Sendmail server that is not able to send mail to Postfix (IMSVA).

user@company.com... Connecting to postfix.localdomain via relay...
220 ESMTP
>>> EHLO sendmail.localdomain
250-postfix.localdomain
250-PIPELINING
250-SIZE 31457280
250-ETRN
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN
>>> MAIL From:<user@company.com> SIZE=4178
250 2.1.0 Ok
>>> RCPT To:<external@company2.com>
>>> DATA
250 2.1.5 Ok
354 End data with <CR><LF>.<CR><LF>
company3.com: Name server timeout
timeout writing message to postfix.com: Broken pipe
user@company.com... Deferred
Closing connection to postfix.localdomain.

The best way to understand this kind of errors is by making a network capture while replicating the problem, this way we can see all transferred packets in real time in order to analyze both server’s behavior. If it’s possible, you should make this capture on both ends to ensure that any data sent is being received as it was meant to, however, if you can only capture on your end, its ok too, because you’ll be able to at least, determine if the problem is on your end or not.

The tcpdump command to do the capture is:

tcpdump port 25 –s0 –w capture.cap

The following Wireshark image shows the capture from a Postfix Server that rejects mail by using the 421 smtp status code.

When you analyze any capture with Wireshark, notice that the first column (No.) shows the number of the packet being presented so we’ll describe all the process using those numbers as reference.

90: Sendmail sends simultaneously both commands: RCPT and DATA.

91: Postfix accepts the RCPT command. We are certain of this because the response is 250 2.1.5 OK.

93: Postfix accepts the DATA command. We are certain of this because the response is 354. From this point on, Postfix will wait until Sendmail sends the message body. Notice that this packet took place at 09:56:02 (second Wireshark column “Time”).

94: Sendmail sends a TCP packet with the ACK flag set indicating it has acknowledge the last TCP packet Postfix just sent. This means that both, networking and the TCP session itself are not presenting any problem so far.

865: About one minute later (09:57:01) Postfix timed out because it hasn’t received any information from Sendmail, so it cuts the communication with a status code 421 4.4.2 assuming there may be a network or connectivity problem that doesn’t allow Sendmail to transfer the message. Also notice that Postfix is configured to wait only for 60 seconds between its last response and the next block of data, which is precisely what we are observing between these two last packets.

866: Because Postfix has already closed the SMTP session because it received no answer, it sends a TCP packet with the FIN flag set to indicate Sendmail the TCP session should end right away.

867: Sendmail accepts the FIN instruction and responds with the corresponding aknowledge which means there’s no network problem between them.

1207: Half a minute later, Sendmail begins the message transfer, but at this point Postfix no longer has any SMTP opened session to receive information.

1208: Because Postfix had already closed the connection to Sendmail, it now sends a RST packet to indicate Sendmail there is no TCP session open on its side to receive data. Remember that a RST packet is used to indicate one of the parts that it is no longer in synchrony with the TCP session and should start a new one if it wants to transmit.

 

This information is more than enough to conclude that even when there is no network problem between these two servers, Sendmail has exceeded the timeout limit Postfix is configured to wait for before assuming something is wrong with the connection. At this point we can be sure that Postfix is not the problem but Sendmail which is taking at least one and a half minute to start the message transmission.

If we go back to Sendmail logs, you can see that “Broken pipe” error is being logged here because it is trying resolve an IP of another mail server that is not even the one in the RCPT command. This means there must be an active policy on Sendmail that is delaying the message transmission.

Company3.com: Name server timeout
timeout writing message to postfix.com: Broken pipe
user@company.com... Deferred
Closing connection to postfix.localdomain.

When you see this situation, Postfix will log the following line in /var/log/maillog:

timeout after DATA (0 bytes) from unknown[xx.xx.xx.xx]

The best way to solve this is of course identify the filtering / resolution policies on the Sendmail server and correct them, however if this is not possible or if the other domain is not willing to cooperate for any reason, we can always count with Postfix configuration options to mitigate the problem.

From the information we already have, we know Sendmail is waiting at least 90 seconds to start the message transmission so we can configure a higher value for the smtpd daemon time out so it will give Sendmail more time before closing the connection. Please notice that these changes are only for mitigation and they are not proposed because Postfix is the one with the problem. Also notice that the proposed changes will not affect the availability of the mail service since Postfix will not restart its main engine.

# postconf -e smtpd_timeout=120s
# postfix reload

Where 120s is our proposed value based on the information presented here, so please adjust this value to any delay you may have observed on your own capture.

 

For more information about the rules to create, transmit and process an email you can check out our publication on The SMTP Protocol Fundamentals.

We hope you’ve find this post usefull and remember you can send us your questions and comments to our Twitter account: @redinskala where you’ll find more information and security tips.

Thanks for your visit!