QuicksearchShow tagged entriesCategories |
Sunday, March 4. 2012Logging best practicesIntroductionThis document explains some best practices for application error/debug logging. Application logging is an important "tool" to help identifying problems. If done correctly, and if an infrastructure is in place that allows events to be put into context, it's easier to debug situations and thus makes it easier to solve problems sooner. While out of scope for this document, I'd like to point out the following tools that can make that possible:
Different logging meaningsLogging can mean different things and logging can have different purposes. The logging this document refers to is application and semantic logging. Application logging (also called "debug" or "error" logging) is intended to keep track of the "low level" application behaviour. Application logging is used to log events, that allows for debugging a problem within an application or to monitor application health. In contrast, a different form of logging could for example be Google Analytics. Tracking (or "logging" actions of) visitors of a website, this type of logging is very different and not something this document refers too. Why do we logIn order to understand how to log, it must first be clear what we try to accomplish with application logging. To create an audit-trailIn case of an error, it needs to be clear what is going on. This can be done by keeping an audit-trail during the request. It can give more context about what is going on. It's also clear what parts did or didn't happen prior to a (problematic) situation. To track application warnings and errorsTo identify errors, we want the application to spill out warnings and error event messages. These can be investigated and hopefully solved. In the case of PHP, log messages of the priority: deprecated, strict or notices should be considered errors as well, but more on this later. To track down bugsDuring development, logs can help developers and software testers to track down flaws within an application. To track down performance problem areasProduction can reveal (performance) problems, that simply don't occur during the development phase and that didn't get caught during the stress/performance tests. When we're logging detailed timing information, certain performance bottlenecks can be more easily found or even prevented before they got too serious. Some logging examplesWhile it's obviously not set in stone on how logs should be formatted, there are some recommendations about what information to include in a log. Having that said, it's better to pick a standard than to invent your own!
Something like this would be a good message: [2012-03-02T20:20:49.003+01:00][43gg84][info] Bootstrapping application (v 2.1b) To explain the 4 parts:
Depending on the level and the purpose of the event. It's very crucial to include useful information. There is nothing more frustrating then the following (bad) example: [Thu, 01 Jan 1970 00:00:00 +0100] Error: "" (This is an actual log message I encountered, when I was debugging an error in a production environment. The actual problem was that a Beanstalkd server wasn't running. As soon as it was started, everything worked just fine.) In case of an error, it's very helpful to have a stack-trace. This is however "expensive" in the sense that it's often a tremendous chunk of data. So use it, but use it wisely. Throwing stack-traces everywhere just for the hek of it, is a bad idea. To give you a tip on how to handle expensive messages, take a peek at the chapter "A smart logging implementation". An example of a successful bootstrapLet's continue with a more positive setting. The following is a log output and it's just here to get a feeling about the flow and the log format: [2012-03-02T20:20:49.003+02:00][43gg84][info] Bootstrapping application (v 2.1b) [2012-03-02T20:20:49.013+02:00][43gg84][info] Request cycle startup [2012-03-02T20:20:49.093+02:00][43gg84][info] User session found, user ID: 13414564 [2012-03-02T20:20:49.123+02:00][43gg84][info] Requested URI '/fu/bar/index' [2012-03-02T20:20:49.133+02:00][43gg84][info] Requested forwarded to URI '/fu/bar/overview' [2012-03-02T20:20:49.273+02:00][43gg84][info] Sending HTTP headers [2012-03-02T20:20:49.283+02:00][43gg84][info] Sending HTTP Body [2012-03-02T20:20:49.313+02:00][43gg84][info] Request cycle shutdown [2012-03-02T20:20:49.323+02:00][43gg84][info] Request cycle completed in 0.320 seconds. The log is fairly self explanatory. We know what version of the application is running, what user we're dealing with, the URL's we end up and how long the requests takes. An example with stack-traceThe following example has stack-trace information. As you can see, the level for the stack-trace and the error message is different in this example. This is how it should be, since the type of information is different. PHP does it differently and appends the stack-trace information on the same level as the event that triggered it, namely "error". [2012-03-02T20:20:49.003+02:00][43gg84][info] Bootstrapping application (v 2.1b)
[2012-03-02T20:20:49.013+02:00][43gg84][info] Request cycle startup
[2012-03-02T20:20:49.123+02:00][43gg84][info] Requested URI '/fu/bar/index'
[2012-03-02T20:20:49.273+02:00][43gg84][info] Sending HTTP headers
[2012-03-02T20:20:49.283+02:00][43gg84][erro] Cannot modify header information – headers already sent by X on line Y
[2012-03-02T20:20:49.293+02:00][43gg84][dbug] Stack trace:
[2012-03-02T20:20:49.293+02:00][43gg84][dbug] 1. {main}() /tmp/fu/bar/httpdocs/index.php:0
[2012-03-02T20:20:49.293+02:00][43gg84][dbug] 2. Zend_Application->run() /tmp/fu/bar/httpdocs/index.php:31
[2012-03-02T20:20:49.293+02:00][43gg84][dbug] 3. Zend_Application_Bootstrap_Bootstrap->run() /tmp/fu/bar/library/Zend/Application.php:366
[2012-03-02T20:20:49.293+02:00][43gg84][dbug] 4. Zend_Controller_Front->dispatch() /tmp/fu/bar/library/Zend/Application/Bootstrap/Bootstrap.php:97
[2012-03-02T20:20:49.293+02:00][43gg84][dbug] 5. Zend_Controller_Dispatcher_Standard->dispatch() /tmp/fu/bar/library/Zend/Controller/Front.php:954
[2012-03-02T20:20:49.293+02:00][43gg84][dbug] 6. Zend_Controller_Action->dispatch() /tmp/fu/bar/library/Zend/Controller/Dispatcher/Standard.php:295
[2012-03-02T20:20:49.293+02:00][43gg84][dbug] 7. IndexController->indexAction() /tmp/fu/bar/library/Zend/Controller/Action.php:513
[2012-03-02T20:20:49.313+02:00][43gg84][info] Sending HTTP Body
[2012-03-02T20:20:49.323+02:00][43gg84][info] Request cycle shutdown
[2012-03-02T20:20:49.333+02:00][43gg84][info] Request cycle completed in 0.330 seconds.
An error almost every PHP developer came in contact with, the "headers sent" error. The stack-trace shown here is very small, when using debug_backtrace() it's possible to specify not to dump object information (by passing FALSE as argument). While object information can provide useful information, it often generates more noise then it's worth. If objects are debugged properly, there is no need to dump the object information. This saves output, noise and improves debug productivity. Of course there are always cases where you might want this, when using with external objects for example, use it whenever you feel it's applicable. Important events to logNot everything is important to be logged, as the next chapter describes. In this chapter, however, I'll cover a couple of things that are worth being logged. The time it took to finish somethingThis gives a good indication of health, why was this request slow (or too fast..).
Exceptions and stack-tracesAn (informative) exception and stack-trace can be very helpful. But writing this is surprisingly difficult, especially when you control application flow or often re-throw Exceptions. Also exceptions of type "GeneralException" or worse just "Exception" is not helping either. If you, however, do this correctly, stack-traces can be very informative. At least you know how you got where it went wrong and where it finally went wrong. SessionsWhenever possible it's important to know who caused what. This is possible with a session identifier. It could be as simple as an IP or a more fancy UUID, but having a session can make all the difference. IP addresses are often bad to use, but it depends on the implementation. It's often better to use an IP than it is to use no form of session identifier at all. When you have hundreds or even thousands of messages being logged every second, it's nice to know which messages belong to that single request that made things go boom. Version numbersI've experienced it too often. A bug happens on server X that doesn't happen on server Y. The reason? Different software versions... It should never happen, but it simply does. If you include version numbers of libraries you include in a request, it can keep your search brief. What not to logPerhaps just as important as the question what to log, is the answer to the question, what not to log? Logging is a way to identify problems, but it should not be overused. Logging for the sake of logging is not the way to go. After all, too much information makes it much harder to find the problem. Only crucial aspects of an application should be logged, such as important decisions. It's hard to define a ratio or to exactly state what to log. I believe that, just as with refactoring, it should be done when needed. So when a bug is being traced, introduce log statements that help identify the problem. So that when it occurs again, it's easily identified. But obviously you'll also write a test for this bug, so it won't happen again anyway Also don't include irrelevant information. I've seen plenty of implementations that dump all the environment variables, entire request dumps, etc. This clutters the search and is often not helping. Be conscious about what you do. The rule of thumb I usually follow is this. Given the position within an application, would this message be meaningful? If it's not, don't log it. Part of the defensive coding methodology, is to test for negative outcomes also. Log when everything went OK and log when it failed. This way you know that you reached that point. If you follow this in your logging tactic, chances are you'll produce useful logs! What to do with logged eventsYou're still reading, I am impressed! When a certain erroneous event is logged, it should be investigated and fixed. Logs must contain useful and up-to-date information. If a certain event always occurs, but is never fixed, it causes a certain numbness to the people who read the logs. Rendering the logging potentially useless. This is something seen in many ways in our lives and not just something that occurs during software development. Dealing with exceptionsExceptions should only be logged in detail where they are finally caught, not in between. Creating a log message within an exception object is semantically incorrect. Such logic and functionality has no place inside an exception object. When an exception is re-thrown, it's fine to log with informational level that an exception is re-thrown, but the exception details should be caught where it makes sense. When an exception is caught and logged, the following details are important parts that should be logged: The stack-trace, error code and description. Since Exceptions are "exceptional", they can always be considered an error. This might seem obvious but quite often an exception is unnecessarily raised. Exceptions should not redirect application flow, but simply indicated that something exceptionally wrong occurred. Obviously only you, the author, knows what's exceptional and what isn't. In the following example, I chose to not make it exceptional whenever a user couldn't be found in the system. // ..
// Fetching the user by the received ID.
try {
$user = $service->findById( $id );
} catch (SpecificServiceException $e) {
// Only the specific exception "SpecificServiceException" is being caught here.
// Other possible exceptions are caught later, so they are not logged here.
$this->getLogger()->logError( $e->getMessage() );
}
// Sorry, no data found
if ( ! $user instanceof User) {
return false;
}
// ..
Dealing with PHP errorsPHP raises it's own errors when something happened, ranging from E_FATAL all the way down to E_STRICT and also their "user-land counterfeits, E_USER_*". These errors should either be caught, using a custom error-handler or should be configured to end up at the same syslog daemon. The latter can be done easily by setting "error_log = syslog" in php.ini. On all environments, with the exception of the development environments, the "display_errors" and similar settings should be set to off. For development this can be set to on, depending on the developer and the task at hand. The "error_reporting" directive, however, should be set to include everything. But more on this in the following chapter. As a tip: When configuring PHP error reporting from outside the PHP runtime, in e.g. Apache or shell-scripts, integer values need to be used. The undocumented, but nice, shorthand value "-1" can be used to set the "error_reporting" to E_ALL, including E_STRICT. Dealing with E_STRICT and E_DEPRECATEDIn PHP there are a couple of event levels. The two levels E_STRICT and E_DEPRECATED are two that, although being important during development!, have no reason to be shown on a production environment. If these errors are raised during development and not fixed, it has to be a specific choice. Often involving vendor / external libraries, still relying on something outdated. If displayed on production, a lot of noise will be generated that adds virtually no additional value to the logs. One last point on E_STRICT violations is that they often can, and should, be fixed during development. And don't forget, you can contribute patches to vendor libraries to fix their code. Dealing with E_NOTICENotices could be considered PHP's Achilles' heel. They are often not even logged and are often an indication of the health of an application. If many notices are being thrown, raise yourself, because you might be in for a bumpy ride. I won't go deep into the matter, since it deserves it's own article. To summarize however: Notices are ERRORS, they indicate fault in an application. They are not "debug" or "informational" type log messages, they are errors in your application and should be fixed. Dealing with the shut-up operatorThe "Error Control Operator" ("@") is exceptionally (pun intended) bad and frequently not needed. In most cases I've seen, they are used because of an extremely lazy developer. Using this operator will hurt you, or the person after you, eventually. I strongly recommend not using it, unless you really need to. Having that said, there are valid use-cases for it. When you use it, be sure to log the result, so that the event is being logged. This would typically be a debug type message. Different message levels and when to use themIf you look at, for example, Zend framework. You'll notice they have 8 error levels, ranging from integer value 0 to 7, 0 being the highest priority. Personally I find it a bit much and I think we can already do more then enough with 5 levels:
EmergencyIn case of total catastrophe the EMERG constant should be used. In short: The system is completely unusable. Immediate action is required. Examples of this could be:
ErrorIn case of an application error the ERR constant should be used. It indicates that something is going on that results in data-loss, incomplete requests or any other scenario where something is seriously wrong. People should be notified and actions should be taken. Examples of this could be:
WarningsIn case of a warning, the WARN constant should be used. It indicates that something strange is going on and should be investigated and fixed in the next release. The system still completed correctly and doesn't require immediate attention, because then it should have been an error. Examples of this could be:
InformationalIn case of an information message the INFO constant should be used. Nothing went wrong and the informational message simply informs you what happens in the application, approximately. Examples of this could be:
DebugIn case of a debug message, the DEBUG constant should be used. Nothing went wrong per se and the debug message is simply informing you exactly what is going on. Debug messages are basically informational messages with much more detail and are allowed to slow down an application. Debug messages are used to help debugging applications. This level is, for obvious reasons, only recommended on development type environments. Read the chapter "A smart logging implementation" about a suggested implementation that gives you debug information even in production environments. Examples of debug messages could be:
** - It is a slippery slope, but when dealing with "to be expected" exceptions from vendor libraries or network related requests that are handled correctly even after exceptions. One may decide to catch them and put only a warning log entry here. In other cases, exceptions should be "exceptional" and not a way to direct an application flow. A smart logging implementationAs stated before, debug level debugging should not be enabled on a production server since they are allowed to slow the system down. They do, however, offer great detail and could be of great service when dealing with a production environment. Your error-handler could be programmed in such a way, that all events are kept in memory. As soon as the life-cycle of the request is over, all messages are written to the error-handler. Some logic as the following example, could give you a tradeoff solution. Giving you full detail in case of an error, with the cost of more memory usage. To put it in "pseudo" code: // If we have errors, include debug events in our logs. Else don't include them.
if ($errorEventCount > 0) {
$allEvents = array_merge(
$criticalEvents, $errorEvents, $warningEvents, $infoEvents, $debugEvents
);
} else {
$allEvents = array_merge(
$criticalEvents, $errorEvents, $warningEvents, $infoEvents
);
}
ConclusionIn a perfect world, log messages shouldn't be needed. But we both know that we are far from that reality. Logging an event in an application is easy. But doing it properly, so that the next developer can still use it, is hard. Logging within an application should be as important as good design and it should be taken just as seriously. Thanks for reading! Comments
Display comments as
(Linear | Threaded)
Hi Dynom,
Great article, any thoughts on integrating or referencing other logs into the application? For easy debugging. Say the application dependent systems logs? (Dmesg log, MySQL slow-query log, webserver log)
All I can say is yes!
Most, if not all, can be logged to syslogd. If that, however, fails. Tools like Logstash can pick it up from any file from your system and pass it trough to your dashboard of choice. I think that adding relevant logs can be very helpful indeed. Be careful about cluttering your dahboard though, too much information won't necessarily be positive. There are, however, great tools that allow you to filter unneeded messages out on your dashboard. Such as the costly Splunk or the awesome Graylog2.
So where would you say is the best place to put the actual log statements. if you call the logger instance in the class method you have to pass your logger into all your classes causing tight coupling, or should you refrain from doing any logging in the class and place the log statements amongst all the wiring making the flow of logic more difficult to read?
Tight coupling is prevented by a matter of abstraction. How you implement it isn't really the subject of this article, but to give you another example:
If you don't (or can't) bubble up, you could make it readable by doing something like this: try { .. do stuff } catch (SpecificServiceException $e) { LogException($e); .. do other stuff } This requires one additional line and your logging information stays nicely encapsulated. I wouldn't necessarily put it inside your domain models, but in your application logic. |
Calendar
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Interesting Finds: March 4, 2012
Tracked: Mar 04, 12:42