As I get closer to operationalising my cluster, I’m beginning to think about monitoring statistics. After all, a large part of my interest is in how tweaking the architecture and admin settings can influence performance (primarily reads per second and writes per second).
There are some useful basics and more sophisticated SaaS offerings, including MMS, which I mentioned here, plus some commercial, paid-for offerings like Server Density.
The dbStats data is accessible by way of the dbStats command (db.stats() from the shell). This command returns a document that contains data that reflects the amount of storage used and data contained in the database, as well as object, collection, and index counters. Use this data to check and track the state and storage of a specific database. This output also allows you to compare utilization between databases and to determine average document size in a database.
The collStats data is accessible using the collStats command (db.printCollectionStats() from the shell). It provides statistics that resemble dbStats on the collection level: this includes a count of the objects in the collection, the size of the collection, the amount of disk space used by the collection, and information about the indexes.
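To make the “average document size” point concrete, here’s a small sketch of the calculation. The field names match what db.stats() returns, but every number below is invented for illustration:

```javascript
// Hypothetical dbStats output -- the field names are what db.stats()
// returns, but the values here are made up for illustration.
const dbStats = {
  db: "test",
  collections: 3,
  objects: 120000,        // total documents in the database
  dataSize: 61440000,     // bytes of data
  storageSize: 83886080,  // bytes allocated on disk
  indexes: 4,
  indexSize: 8176000
};

// Average document size: total data size divided by object count.
const avgObjSize = dbStats.dataSize / dbStats.objects;
console.log(avgObjSize + " bytes per document");
```

Running the same calculation against two databases (or tracking dataSize and storageSize over time) is the comparison-between-databases use the docs describe.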
Generally, problems belong to one of three classes:
Degraded performance is typically a function of the relationship between (1) the quantity of data stored in the database, (2) the amount of system RAM, (3) the number of connections to the database, and (4) the amount of time the database spends in a lock state.
Some users also experience performance limitations as a result of inadequate or inappropriate indexing strategies, or as a consequence of poor schema design patterns. In other situations, performance issues may indicate that the database is operating at capacity and that it’s time to add additional capacity.
MongoDB uses a locking system to ensure consistency; however, if certain operations are long-running, or a queue forms, performance slows as requests and operations wait for the lock.
Because lock-related slowdowns can be intermittent, look at the data in the globalLock section of the serverStatus response to assess whether the lock has been a challenge to your performance. If globalLock.currentQueue.total is consistently high, there is a chance that a large number of requests are waiting for a lock. This indicates a possible concurrency issue that might affect performance.
If globalLock.totalTime is high relative to uptime, the database has existed in a lock state for a significant amount of time. If globalLock.ratio is also high, MongoDB has likely been processing a large number of long-running queries. Long queries are often the result of a number of factors: ineffective use of indexes, non-optimal schema design, poor query structure, system architecture issues, or insufficient RAM resulting in page faults and disk reads.
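The two checks above can be sketched in a few lines. The field names (uptime, globalLock.totalTime, globalLock.currentQueue.total) are the real serverStatus fields, but the values and the 20%/10-request thresholds are invented for illustration, and note the unit mismatch: uptime is in seconds while globalLock.totalTime is in microseconds.

```javascript
// Hypothetical fragment of a serverStatus response -- field names are
// real, values are invented for illustration.
const serverStatus = {
  uptime: 3600,                         // seconds since mongod started
  globalLock: {
    totalTime: 1800000000,              // microseconds spent in a lock state
    currentQueue: { total: 12, readers: 4, writers: 8 }
  }
};

// Fraction of uptime spent in a lock state (convert uptime to microseconds).
const lockRatio =
  serverStatus.globalLock.totalTime / (serverStatus.uptime * 1e6);

// Requests currently queued waiting on the lock.
const queued = serverStatus.globalLock.currentQueue.total;

// Purely illustrative thresholds -- tune for your own workload.
if (lockRatio > 0.2 || queued > 10) {
  console.log("possible lock contention: ratio=" + lockRatio +
              ", queued=" + queued);
}
```

With the sample numbers above, half of the uptime was spent locked and a dozen requests are queued, so the warning fires.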
Because MongoDB uses memory-mapped files to store data, given a data set of sufficient size, the MongoDB process will allocate all memory available on the system for its use. Because of the way operating systems function, the amount of allocated RAM is not a useful reflection of MongoDB’s state.
The following figure tries to present how the various components of a MongoDB node (disks, file system, RAM) interact to provide access to the database.
While this is part of the design, and affords MongoDB superior performance, the memory-mapped files make it difficult to determine whether the amount of RAM is sufficient for the data set. Consider the memory usage statistics in serverStatus to better understand MongoDB’s memory utilisation. Check the resident memory use (i.e. mem.resident): if this exceeds the amount of system memory and there’s a significant amount of data on disk that isn’t in RAM, you may have exceeded the capacity of your system.
Also check the amount of mapped memory (i.e. mem.mapped). If this value is greater than the amount of system memory, some operations will require disk access (page faults) to read data from virtual memory, with deleterious effects on performance.
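As a sketch of those two checks: the field names (mem.resident, mem.mapped) are real serverStatus fields reported in megabytes, but the values and the 90% rule of thumb below are invented, with 512 MB chosen because that’s what a Raspberry Pi has.

```javascript
// Hypothetical mem section of serverStatus plus the node's physical
// RAM. Field names are real; all values (MB) are invented for
// illustration. 512 MB matches a Raspberry Pi.
const systemRamMB = 512;
const mem = { resident: 480, mapped: 2048, virtual: 2200 };

// mapped > RAM: some reads will have to page-fault out to disk.
const mappedExceedsRam = mem.mapped > systemRamMB;

// resident close to total RAM (illustrative 90% threshold) with more
// data on disk suggests the node is at the edge of its capacity.
const residentNearRam = mem.resident >= systemRamMB * 0.9;

console.log("mapped exceeds RAM: " + mappedExceedsRam +
            ", resident near RAM: " + residentNearRam);
```

On a 512 MB Pi with a 2 GB mapped data set, both flags come up true, which is exactly the situation the paragraph above warns about.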
Page faults represent the number of times that MongoDB requires data not located in physical memory, and must read from virtual memory. To check for page faults, see the extra_info.page_faults value in the serverStatus command. This data is only available on Linux systems.
Alone, page faults are minor and complete quickly; however, in aggregate, large numbers of page faults typically indicate that MongoDB is reading too much data from disk, and can point to a number of underlying causes. In many situations, MongoDB’s read locks will “yield” after a page fault to allow other processes to read, avoiding blocking while waiting for the next page to read into memory. This approach improves concurrency, and in high-volume systems it also improves overall throughput.
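Because extra_info.page_faults is a cumulative counter, what you actually want to watch is its rate of change between samples. A minimal sketch, using two hypothetical serverStatus samples taken a minute apart (field names real, values invented):

```javascript
// Two hypothetical serverStatus samples taken 60 seconds apart.
// extra_info.page_faults is the real (Linux-only) cumulative counter;
// the values are invented for illustration.
const sampleA = { uptime: 3600, extra_info: { page_faults: 1000 } };
const sampleB = { uptime: 3660, extra_info: { page_faults: 4600 } };

// The counter is cumulative, so the useful number is the delta
// divided by the sampling interval.
const faultsPerSec =
  (sampleB.extra_info.page_faults - sampleA.extra_info.page_faults) /
  (sampleB.uptime - sampleA.uptime);

console.log(faultsPerSec + " page faults/sec");
```

A sustained rate like the 60 faults/sec in this example would suggest the working set no longer fits in RAM.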
If possible, increasing the amount of RAM accessible to MongoDB may help reduce the number of page faults (impossible on a Raspberry Pi – but adding another* for thirty-odd quid is feasible!). If this is not possible, you may want to consider deploying a sharded cluster and/or adding one or more shards* to your deployment to distribute load among mongod instances.