Maillog Analysis for Postfix and IMSVA (Part II)

PART II. Maillog indicators interpretation

In Part I. Postfix Maillog Performance Indicators, we reviewed how to find several values by filtering maillog that would help us understand how our Postfix or IMSVA device is actually working in terms of queuing, reception and delivery performance. Now we’ll see how to put a context to those numbers to transform them from just values to actions / configurations that will improve our device’s longtime performance. 

There is no unique way to interpret these numbers mostly because each server and infrastructure is different. Based on this, what we will show here is a methodology more than an exact guide to follow.

This methodology of Interpretation is based on the following highlights:

  • Evaluate the rate between received and delivered mail.
  • Evaluate the rate between delivered and rejected mail for a permanent error and if it has a significant impact.
  • Identify if the reasons for the permanent rejection may be mitigated by Postfix.
  • Evaluate the rate between delivered and temporarily rejected mail and if it has a significant impact.
  • Identify if the reasons for the temporal rejection may be mitigated by Postfix.
  • Evaluate the rate between delivered and deleted/quarantined mail by filtering policies.
  • Identify if there is any significant delay in mail delivery.
  • Summary

Now let’s see in detail each of these points to see how they may be interpreted :

 

1.       Evaluate the rate between received and delivered mail.

To obtain this rate we’ll use the previously obtained indicators from Part I of this post with the following formula:

 

DELIVERY EFFICIENCY = MAIL DELIVERED TO INTERNET / NUMBER OF INBOUND CONNECTIONS

 

Ideally one would think that the best result should be “1”, because all mails should be delivered. If your Postfix/IMSVA server doesn’t apply any filtering rules like content, SPAM, etc., then one would be right. However, if you have policies in place, which is the case for IMSVA, then this wouldn’t be necessarily true. This is because if you were to obtain a value of “1” then this would mean there’s something wrong in your configuration that doesn’t allow your server to enforce the rules and all mail is just passing by.

Let’s see the following example:

MAIL DELIVERED TO INTERNET = 100 mails

INBOUND CONNECTIONS = 100 connections

DELIVERY EFFICIENCY = 1

 

NOTE: One connection may be used to send more than one mail, so here we are assuming that at least one inbound connection delivers one single mail to make calculations easier. 

This example is very clear, everything that comes in gets delivered, which indicates that no filtering policies are being applied. In case there were any active policies, this indicates a potential problem where something is not allowing Postfix to filter these mails correctly.

Let’s see another example:

 

MAIL DELIVERED TO INTERNET = 80 mails

INBOUND CONNECTIONS = 100 connections

DELIVERY EFFICIENCY = 0.8 (80%)

 

An efficiency of 0.8 (80%) indicates a more real example. Here we can see that for every 100 received mails, 20 are not getting out of Postfix, however this number is still not enough  to indicate if all 20 mails are being queued or if they are being filtered by any policy.

The next step here is to investigate to which of the following 3 categories these mails belong to: permanent rejection, temporal rejection or policy enforcement. We’ll talk about how to determine this in the following sections.

When we obtain the value of this rate, no matter of which numerical value it has, the immediate question is ¿how can I be sure if this value is adequate to my Organization?

We’ll return to this question at the end, when we have all the remaining rates values, however, any mail system should be able to deliver a rate of 80% from a 100% that corresponds to mail that must be delivered. A value under 80% has a higher probability to produce delays that may impact in both business continuity / branding.

 

 

2.       Evaluate the rate between delivered and rejected mail for a permanent error and if it has a significant impact.

This rate indicates how many connections / storage / CPU Postfix is using in mails that WILL NOT BE DELIVERED. You can obtain this rate with the following formula:

RESOURCES FOR PERMANENT REJECTED MAIL = PERMANENT REJECTED MAIL / NUMBER OF OUTBOUND CONNECTIONS

Ideally this value should be “0”, which would indicate all outbound connections are not wasted on mails that can’t be delivered, however this not always the case.

Let’s retake the first example, the indicators are as follows:

MAILS DELIVERED TO INTERNET = 100 mails

INBOUND CONNECTIONS = 100 connections

PERMANENT REJECTED MAILS = 0 mails

OUTBOUND CONNECTIONS = 100 connections

RESOURCES FOR PERMANENT REJECTED MAIL = 0%

DELIVERY EFFICIENCY = 100%

In this ideal scenario, we see that all mail received by Postfix is delivered to its destination. Because there are 0 mails with permanent rejection, we can conclude all outbound connections are being used efficiently because each outbound connection ends up with a successful delivery.

Let’s see now the following example:

INBOUND CONNECTIONS = 100 connections

OUTBOUND CONNECTIONS = 90 connections

MAIL DELIVERED TO INTERNET = 80 mails

MAILS WITH PERMANENT REJECTION = 100 mails

RESOURCES FOR PERMANENT REJECTED MAIL = 11%

DELIVERY EFFICIENCY = 80%

In this, more realistic case, we observe that for every 100 received mails we are delivering only 80. From the remaining 20, 10 mails will never be delivered which represents the 11% of the total connections used. Even when this rate may seem small, notice that this 10 mails correspond to the 50% of the 20 mails that were not delivered. So if you were to do something to mitigate this number, the DELIVERY EFFICIENCY would automatically raise up to 90% which is a very significant improvement.

In the following section we’ll develop the methods used to determine if this value can or can’t be mitigated by changing something within Postfix.

 

 

2.1. Identify if the reasons for the permanent rejection may be mitigated by Postfix.

At this point, is where the numbers and interpretations begin to complicate a little more. To better explain this, we’ll use real indicators obtained using the filters discussed on Part I.

NUMBER OF INBOUND CONNECTIONS = 800,000

NUMBER OF OUTBOUND CONNECTIONS =1,000,000

NUMBER OF MAILS DELIVERED TO INTERNET = 180,000

MAILS WITH PERMANENT REJECTIONS = 48,000

Mails rejected because of non-existent mailbox = 42,000

Mails rejected because of non-existent domain = 6,000

RESOURCES FOR PERMANENT REJECTED MAIL = 4.8%

DELIVERY EFFICIENCY = 22.5%

Now let’s see some highlights to interpret this result:

a)  First of all, we can see that the DELIVERY EFFICIENCY rate is extremely low, only the 22.5% of all mail if being delivered. However, it is still a little soon to conclude that this value is significant until we know the values of the remaining rates, but it is surely something to keep in mind to interpret the remaining results.

b) You can see there are more outbound connections than inbound, ¿how is this possible? ¿How come Postfix delivers more mail than it already has? Even when this may sound illogical at first, these numbers clearly show an excessive queuing problem. When you have more outbound than inbound connections, it means that by the time when the new 800,000 mails arrived, there were already a certain number of queued mails using outbound connections in order to be delivered, and remember that even when there were no queued mails, any mail with temporal rejection will use at least 2 connections, so the outbound connections will be in general higher.

c) The rate for RESOURCES FOR PERMANENT REJECTED MAIL is low, only 4.8% of the connections are being used to deliver mail that will not be accepted by the recipient system/user, so this doesn’t represent a significant impact in the delivery process of Postfix.

d) There are at least two reasons for these mails not being delivered and these are that either the domain or the mailbox do not exist. Both problems may be mitigated by avoiding them to enter Postfix.

How do you know that the domains do not exist? If you obtain the Top Ten list of these domains using DSN 5.4.4 you’ll see several descriptions, all of them indicating these mails cannot be delivered because there’s no MX nor A record in any public DNS that can tell us the IP to which these mails should be delivered. For instance, let’s check the following maillog lines filtered using the commands seen in Part I:

dsn=5.4.4, name not found. Name service error for name=webhostdb.webtelmex.net.mx type=A: Host not found)

dsn=5.4.4, name not found. Name service error for name=mx1.starmedia.com type=A: Host not found)

dsn=5.4.4, name not found. Name service error for name=mx.terra.es type=A: Host found but no data record of requested

dsn=5.4.4, name not found. Name service error for name=prodigy.mx type=A: Host not found)

dsn=5.4.4, name not found. Name service error for name=multi-net.com.mxtype=A: Host not found)

 

If you perform an A or MX DNS query you’ll see neither of both exist, so you can assume that these mails will not be delivered.

How do you know if the mailboxes do not exist? If you use the filters shown on Part I to show only DSN=5.0.0 you’ll end up with maillog lines like these:

 

dsn=5.0.0, 550 Requested action not taken: mailbox unavailable (in reply  to RCPT TO command))

dsn=5.0.0, 550 #5.1.0 Address rejected. (in reply to RCPT TO command))

dsn=5.0.0, 550 User unknown (in reply to RCPT TO command))

dsn=5.0.0, 550 Unrouteable address (in reply to RCPT TO command))

dsn=5.0.0, 550 No Such User Here (in reply to RCPT TO command))

dsn=5.0.0, 550 Relay not permitted / No such user (in reply to RCPT TO command))

dsn=5.0.0, 554 Sorry, no mailbox here by that name. (#5.1.1) (in reply to RCPT TO command))

dsn=5.0.0, 550 no mailbox by that name is currently available (in reply to RCPT TO command))

 

Besides these reasons, you may find others like the ones shown in the following DSN permanent rejection list:

dsn=5.0.0, MAILBOX DOES NOT EXIST

dsn=5.4.4, DOMAIN DOES NOT EXIST

dsn=5.1.1, MAILBOX OR ALIAS DO NOT EXIST

dsn=5.2.1, MAILBOX EXIST BUT IT IS DISABLED

dsn=5.7.1, MAILBOX IS NOT ABLE TO RECEIVE MAIL EITHER FOR A NON-AUTHORIZED RELAY REASON OR BECAUSE THE IP OR MAIL HAS VIOLATED A SECURITY POLICY

dsn=5.3.0, MAILBOX DOES NOT EXIST OR THE MAIL SERVER CANNOT ACCEPT MAIL TO THAT RECIPIENT BECAUSE A POLICY VIOLATION

dsn=5.7.0, MAIL OR IP HAS VIOLATED A SECURITY POLICY

dsn=5.4.1, MAIL SERVER IS NOT ACCEPTING THE CONNECTION BECAUSE A POSSIBLE RELAY VIOLATION

In all these cases, mail will not be accepted, however, you can see that some of them refer to server or policy errors which indicates not all mails will be necessarily rejected. Take DSN 5.7.0 for instance, you can’t be certain if only this particular sender has violated a security policy or if all senders of the domain are blocked. When you mitigate this kind of events it is convenient to take only those responses that will always result in a permanent rejection no matter who the sender is, the mail content or even your Postfix IP. Let’s suppose that hotmail.com has blocked your IP and sends hundreds of 5XX responses, you can’t just block all mail destined to hotmail.com, instead, the right answer is to find the reason why hotmail.com has blocked you, otherwise you’ll be blocking legit mail.

 

 

3.       Evaluate the rate between delivered and temporarily rejected mail and if it has a significant impact.

This is one of the most important values because it helps you determine if there may be any queuing problem, its magnitude and helps you identify if there’s something you can do to mitigate it. When we talk about mail systems, the worst scenario for performance degradation is when the server accumulates an excessive amount of mails that wastes its connections and doesn’t allow it to deliver valid mail.

The following formula shows you how to obtain this rate:

RESOURCES FOR TEMPORARILY REJECTED MAIL = NUMBER OF TEMPORARILY REJECTED MAIL / NUMBER OF OUTBOUND CONNECTIONS

Now we’ll add more values to our previous example:

NUMBER OF INBOUND CONNECTIONS = 800,000

NUMBER OF OUTBOUND CONNECTIONS = 1,000,000

NUMBER OF MAIL DELIVERED TO INTERNET = 180,000

NUMBER OF PERMANENT REJECTED MAIL = 48,000

Mail rejected because of non-existent mailbox = 42,000

Mail rejected because of non-existent domain = 6,000

RESOURCES FOR PERMANENT REJECTED MAIL = 4.8%

NUMBER OF TEMPORARILY REJECTED MAIL = 850,000

Mail rejected because of timeout or connection rejection = 750,000

Mail rejected for other reasons = 100,000

RESOURCED FOR TEMPORARILY REJECTED MAIL = 85%

DELIVERY EFFICIENCY = 22.5%

With this new data, it is clear now that the poor efficiency of this mail server is because it is using up to 85% in outbound connections that haven’t been successful. This server will greatly improve its performance by mitigating the queuing problem.

At this point, it is mandatory to determine what is the real cause of the queuing problem and evaluate if any modification to Postfix may be made to improve the performance. We’ll use the indicators obtained in Part I to identify what are the reasons for our Postfix to queue mail.

The first in the list, as shown previously in the example’s numbers, is that almost 90% of all queued mail is there because either the connection timed out or it was refused. Both cases can be observed when you filter your maillog for DSN 4.1.1 as shown in the following lines:

 

dsn=4.4.1, status=deferred (delivery temporarily suspended: connect to hayoo.com[208.91.197.26]: Connection timed out)

dsn=4.4.1, status=deferred (delivery temporarily suspended: connect to hootmail.com[64.4.6.100]: Connection timed out)

dsn=4.4.1, status=deferred (delivery temporarily suspended: connect to msn.com.mx[65.55.206.229]: Connection timed out)

dsn=4.4.1, status=deferred (connect to yahoo.com.mx.com.mx[208.87.35.103]: Connection refused)

 

What DSN 4.1.1 means, that unlike permanent rejection because of non-existent domains, here the domain has at least one A public record and because of an SMTP rule, if there’s at least one IP has been obtained but it doesn’t respond to port 25, the mail server should queue mail and retry later until one of the following conditions is met: either the IP eventually responds on port 25 or the lifetime of the mail expires.

Notice carefully the domain names and try to find both MX and A records for each of them:

hayoo.com. No MX record, A record points to 208.91.197.26

hootmail.com. No  MX record, A record points to 65.55.39.10/ 64.4.6.100

msn.com.mx. No MX record, A record points to 65.55.206.229

yahoo.com.mx.com.mx. No MX record, A record points to 176.74.176.178

 

At this point you can assume the user made a mistake when typing the domain names without actually noticing that the resultant domain actually DOES exist in Internet! This condition is the one provoking the queuing problem in Postfix. So, by just mitigating this situation, the performance of  Postfix may be drastically improved.

Even when we’re showing here examples where the typing error is clear, there may be certain domains that actually does exist and receive mail but they are temporarily experiencing technical problems or they have temporarily blocked Postfix’s IP address. If you want to be sure, you have to analyze the Top Ten of the domains that present this condition and confirm that the MX record truly doesn’t exist. In these cases we’re talking about a clear legitimate queuing condition and no action should be required on our end.

For the first cases where the domains no longer or never existed, you can configure Postfix to reject mails destined to recipients of such domains. For the cases where our IP has been blocked you have to find first what is the cause of the problem and solve it.

 

3.1. Identify if the reasons for the temporal rejection may be mitigated by Postfix

The conditions just mentioned will not always be the case for a queuing condition. Now we’ll present other common smtp responses and the way to interpret them:

DSN=4.4.1. There is no MX record, but there is at least on A record that results in queuing

dsn=4.4.1, status=deferred (delivery temporarily suspended: connect to hootmail.com[64.4.6.100]: Connection timed out)

This is the case previously explained where a user has made a mistake when typing the domain name, but the resulting name has at least on A record in a public DNS server. This mail must be queued even when there’s no way to deliver it. As mentioned before, you can mitigate this condition by extracting the Top Ten of the domains that match this condition and reject them since the beginning so Postfix doesn’t process them.

 

DSN=4.4.3. DNS request has timed out, there is no certainty if the domain exists or not.

dsn=4.4.3, status=deferred (Host or domain name not found. Name service error for name=organik.com.mx type=MX: Host not found, try again)

Unlike the cases already explained, DSN 4.4.3 means there is a problem with the DNS request, so you can tell for sure if the domain exists or not. This error doesn’t end in a permanent rejection because the DNS response is different. Let’s see the following examples:

 

Domain: ksiflkasdf.com

DNS response: *** b.resolvers.Level3.net can’t find ksiflkasdf.com: Non-existent domain

Response Code: 3 (binary value 11)

Domain: organik.com.mx

DNS response: *** b.resolvers.Level3.net can’t find organik.com.mx: Server failed

Response Code: 2 (binary value 10)

When the domain actually doesn’t exist, the DNS server responds with a “Non-existent domain” which corresponds to a RCODE 3 (ResponseCode 3), in this case, the response is interpreted by a mail server with a permanent error.

For the second case, DNS responds with a “Server failed” which corresponds to an RCODE 2. This means the DNS server was not able to complete the domain search operation. Because this implies a network problem, the domain cannot be blamed and because of this a temporal problem is assumed provoking the queuing condition. This is true, because as it happens with all network problems, if you try again later there’s a chance the DNS request may be successful.

Postfix decides to assume whether a permanent or temporal error by checking the RCODE flags in the DNS response packet. If it gets a 3 then an 5XX error is assumed, if it gets 2 it queues. Simple as that.

Because this may represent a legitimate queue condition, it is advisable to obtain the Top Ten of this domains along with an historic for at least one week or more (this will depend on how long mails are being retained within Postfix). If during this time you can’t find a DSN different than 4.4.3, it means Postfix has never been able to deliver mail to such domains so you may block all mails to that domain since the beginning. If you actually find a different DSN for that domain then you can’t assure that those mails will never be delivered, you’ll have to check what the new DSN means and act accordingly. In such cases the main problem may be DNS latency problem, you’ll have to check this services prior to take any actions on Postfix.

In the field, more than 95% of these connections end up being for domains that doesn’t exist, but these values may change from one Organization to another so an act of “due diligence” must persist.

 

DSN=4.7.7. There is something in the mail integrity or the mail server IP that produces the rejection.

dsn=4.7.7, status=deferred (host ff-mx-vip1.prodigy.net[207.115.20.20] said: 451 4.7.7 Excessive userid unknowns from 65.82.31.32 flld103 (in reply to MAIL FROM command)))

DSN 4.7.7 is commonly used in situations like the shown above where the destination assumes our Postfix is performing a DoS attack or a DHA (Directory Harvest Attack) by sending mail to several mailboxes that don’t exist.

A common reason for this is mail generation by automatic systems with old mail lists, in such cases you have to update the lists used by these systems. If no such systems exist in your Organization the problem may be your users have records of people that no longer work in the destination domain, in such case you can send a campaign you all employees to encourage and update of their lists. Obviously, in both cases a Postfix Administrator has limited power to solve the problem as the responsibility relies in someone else. If the problem is getting worst and there’s no support from the Organization, you can obtain the Top Ten list of these mailboxes and block them in Postfix. In Part 3 Recommendations, we’ll show you additional filters and main.cf / master.cf configurations to achieve this.

 

DSN=4.7.1. Sender is not authorized to send mail to that mailbox.

dsn=4.7.1, status=deferred (host mx2.terra.com[208.70.188.176] said: 450 4.7.1 You’ve exceeded your sending limit to this domain. (in reply to end of DATA command))

 dsn=4.7.1, status=deferred (host mail.itc.mx[200.23.53.200] said: 450 4.7.1 <postfix.localdomain>: Helo command rejected: Host not found (in reply to RCPT TO command))

 dsn=4.7.1, status=deferred (host shio.sublimeip.com[85.255.209.100] said: 454 4.7.1 <user@dc-os.com>: Relay access denied (in reply to RCPT TO command))

 dsn=4.7.1, status=deferred (host in1-mtp.messagingengine.com[66.111.4.71] said: 451 4.7.1 <user@ml1.net>: Recipient address rejected: User is over quota, try again later (in reply to RCPT TO command))

 dsn=4.7.1, status=deferred (host msgin.t.facebook.com[66.220.155.13] said: 421 4.7.1 RCP-T4  http://postmaster.facebook.com/response_codes?ip=200.33.74.70#unavailable Recipient account is unavailable (in reply to RCPT TO command))

 

All these errors are examples of DSN 4.7.1 which can take different forms. Generally speaking, DSN 4.7.1 means a situation where the sender (this may be the user or Postfix’s IP address) is not authorized to send mail to that recipient or to the whole domain.

In the first example we see the rejection is made because our Postfix has exceeded the maximum number of mails it can send to that domain either in the same SMTP session or in a limited time window. These errors imply a situation of legitimate queuing where additional Postfix configurations are not necessary to correct this condition. Once the blocking time passes, Postfix will automatically retry the delivery of the remaining queued mails. Obviously, this situation may imply a natural delay in the delivery that may reach 30 minutes or more, however, you can control the time frame when mail delivery is retried, you may even control how many mails are delivered to a certain domain by placing a limit in the number of mails per session but this doesn’t guarantee a delay will occur anyway.

The second example shows a case where Postfix initiates the SMTP conversation by sending the HELO/EHLO command with its own FQDN as argument, this is, Postfix starts by sending a line like this:

EHLO postfix.localdomain

Before accepting the mail, there are some servers that perform a PTR request to validate if the said FQDN actually exist. If the request fails, then the server rejects the connection. This situation can be mitigated by adding the Postfix FQDN to the reverse zone of your domain, this will ensure the Postfix server “exists” in Internet. The logic behind this kind of validations is to increase the probability that the SMTP server is actually receiving a connection from an Internet registered server and not from someone pretending to facebook.com or any other domain because the request is sent directly to DNS associated with the sender’s domain.

The third example shows a scenario where the destination server doesn’t have its relay opened for the recipient’s domain. This may happen when a company hosts its mail domain with mail hosted service. Such services point all its customer’s MX records to its own mail server and configures the relay to accept mails from these domains. For example, suppose hosting service host.com has 5 customers. To provide them a mail service it points all 5 MX records to its own mail server and configures the relay to accept mail for their domains.

Host.com – Registro MX = 10.98.3.4

   Cliente1.com – Registro MX = 10.98.3.4

   Cliente2.com – Registro MX = 10.98.3.4

   Cliente3.com – Registro MX = 10.98.3.4

   Cliente4.com – Registro MX = 10.98.3.4

   Cliente5.com – Registro MX = 10.98.3.4

Now let’s suppose customer5.com doesn’t pay for the service anymore. In this case, host.com closes the relay for that domain but the MX records remains. At this point all mails for customer5.com will keep arriving to IP 10.98.3.4, when the server validates the recipient’s domains it finds out that its relay is closed for that destination so it rejects the mail. Another scenario is where both MX and the relay are removed, however, it can happen that MX records is still being made using DNS or mail server cache, so those servers can still find customer5.com at the same IP. Finally it can also happen that mail servers have a Smart Host configured so all mail for customer5.com always points to the same IP without requesting the records from a DNS server or using cache; such servers have no way to know that customer5.com is no longer host at that IP address. In all these cases you can configure Postfix to reject these mails, however you should take care to monitor them because even when the server doesn’t receive mail at the time, they may be brought online in the future in another IP. If you keep this blocking rule and you don’t verify at least once a month, your users will start to complain because their mails are not arriving.

The fourth examples shows a situation where the destination mailbox exists but it is not able to receive mail at the moment because it has exceeded its storage quota. Let’s suppose an Organization has a fixed mailbox storage limit of 100MB, once users reach this limit, the mailbox won’t be able to accept any additional mail. In such cases Postfix receives a temporal error because the mailbox may get any additional mailbox space at any time and so those mails may be accepted in the future. This is another legitimate queuing situation where no Postfix changes are required. If this is really degrading delivery performance you may reduce the lifetime of your mails but this won’t solve the root problem, though.

The fifth and last example shows a situation where the destination mailbox is not available at the time of delivery. There are many reasons for this: the server is under maintenance, the mailbox is disabled because the user no longer works there or has deleted his account, there is an audit on the mailbox, there’s an investigation in place on the mailbox owner, there are LDAP maintenance tasks in place, etc. In this case it is not advisable to apply any configuration changes on Postfix because you can’t be certain if the mailbox does or does not exist, it is better to queue this kind of mails and monitor them to see if they can actually get delivered.

If you need an interpretation of any other SMTP responses codes besides the ones already shown, we invite you to check our post on Extended Status Codes (SMTP).

 

4.       Evaluate the rate between delivered and deleted/quarantined mail by filtering policies.

This value provides security/filtering policies efficiency. Its value can be obtained with the following formula:

FILTERING EFFICIENCY = (INBOUND CONNECTIONS – PERMANENT REJECTED MAIL – TEMPORARILY REJECTED MAIL – MAIL DELIVERED TO INTERNET) / INBOUND CONNECTIONS

In the example we have been studying, the rate would be:

INBOUND CONNECTIONS = 800,000

PERMANENT REJECTED MAIL = 48,000

TEMPORARILY REJECTED MAIL = 450,000

MAIL DELIVERED TO INTERNET = 180,000

FILTERING EFFICIENCY = (800,000 – 48,000 – 850,000 – 180,000) / 800,000 = 15%

In this case, you should have noticed we are not using the number of connection for TEMPORARILY REJECTED MAIL, this is because as explained on Part I, when you have a temporarily rejection, the number of connections is always higher than the number of mails, if we were to evaluate connections then the results wouldn’t be correct and the rate may even a negative number.

The obtained value indicates Postfix is stopping up to 15% of all incoming mails by applying filtering/security policies. This is a good value but you should also notice this value may be altered, this is because in our case the number of temporarily rejected mails is not correct. Let’s explain this a little better:

Let’s suppose Postfix doesn’t have any queued mail and at a certain time the new 800,000 arrive. From these mails, 48,000 are deleted by a permanent rejection, 450,000 by a temporarily rejection and 180,000 got delivered to Internet. This means only 122,000 didn’t get through any of these processes and do not appear in the outbound connections. This is a little obvious because by being deleted they are getting out of normal mail flow of Postfix, as they had just disappeared.

Now let’s suppose that by the time the new 800,000 mails arrive, Postfix already has 300,000 queued mails trying to get out. At this point we can’t tell how many of the 122,000 mails correspond to those already inside Postfix and how many to the ones deleted by policies.

If you need to find the exact value for FILTERING EFFICIENCY, then you could get the list of 800,000 Internal ID’s that correspond to new mails, then get the list of ID’s of mails rejected and delivered to Internet. Now it’s just a matter of getting the difference between both sets and you’ll end up with the exact number.

If we go back to the beginning of this Post about the DELIVERY EFFICIENCY, remember there’s no adequate value, however we mentioned that an 80% may be used as a reference. This number has to be obtained as we explained here, this means, the number of temporarily rejected mails has to be exactly the number of mails and not the number of connections, if you use this last value the result won’t be accurate and the interpretation may be wrong.

For this value to be considered as good or not, it is advisable to obtain the rate at different points, at least once a month. This way you can truly evaluate what the real behavior of your Postfix is and you may evaluate accurately and objectively if the applied recommendations are making a positive impact and how they are helping to improve the performance.

 

 

5.       Identify if there is any significant delay in mail delivery.

The last value to obtain the complete context of your data is the time mails are getting delivered when a permanent or temporal rejection occurs. You may obtain the speed at which mails are getting delivered to Internet, but if the last values are adequate you may assume mails are getting delivered in an adequate speed as well.

In order to know if there is any significant delay we’ll take timing indicators for the example we’ve showed before:

PERMANENT REJECTED MAIL

Average lifetime before rejection = 8 hours

Maximum lifetime before rejection = 10 days

Minimum lifetime before rejection = 60 seconds

TEMPORARILY REJECTED MAIL

Average lifetime before rejection = 6 days

Maximum queuing time = 14 days

Minimum queuing time = 25 seconds

We clearly see that in both cases the time mail is waiting inside Postfix before being rejected is very high, This represents a serious problem and confirms the poor performance of Postfix. The minimum and maximum times help us locate an approximate time band in which most mails are.

For permanent rejection we see most of the mails are waiting between 2 and 16 hours before being rejected and generate the according NDR. For temporal rejection we see most of the mails are waiting between 5 and 7 days which is certainly causing the excessive waste of connections and storage. It is also creating a bottle neck that doesn’t allow legitimate mail to be delivered.

As a reference, an adequate average value for these indicators must be ideally below 60 seconds. Depending on each Organization you may have a tolerance up to 10 minutes when the devices are overwhelmed but even in those cases is necessary to propose adjustments because sooner or later you’ll face a critical queuing problem

With this information we may conclude the main problem for the poor performance of this Postfix is because of an excessive mail queue and a very slow rate of delivery. Just imagine a single mail can be delayed up to 14 days! At the actual rate, it would take months to free the queued mail and this if no more new mail is arriving.

 

 

6.       Summary

In this post we have evaluated each of the indicators obtained in Part I of this series. As a result, we now have a diagnosis that shows if your Postfix  / IMSVA has a good performance, if it is efficient applying policies and most of all if it is delivering mail at an adequate speed for your Organization.

We also reviewed how to interpret several SMTP response codes that will help us in the next post to recommend the appropriate changes to Postfix. This will allow you to get the most of your Postfix / IMSVA server.

As a final summary of the example we reviewed we have the following highlights.

  1. The DELIVERY EFFICIENCY rate is very low.
  2. The main problem is a massive queuing mostly because Postfix has many mails that go to non-existent domains. Fortunately these conditions may be mitigated by some configuration changes in Postfix.
  3. The queuing timing is within non acceptable ranges because one mail may present a delay of almost 14 days and this situation doesn’t allow Postfix to process new mails on time.

PART I. Postfix Maillog Performance Indicators

PART II. Maillog Indicators Interpretation

For more information we invite you to visit our Postfix section. If you want to learn more about the rules to create, transmit or process an email you can check out our publication on The SMTP Protocol Fundamentals.

Remember to send us your questions and comments to our Twitter account: @redinskala where you’ll find more information and security tips.

Thanks for your visit!

One thought on “Maillog Analysis for Postfix and IMSVA (Part II)

  1. Pingback: Maillog Analysis for Postfix and IMSVA (Part I) | RedinSkala

Comments are closed.