MongoDB’s “Aggregation Framework”

In my earlier post I built up a very simple report using the MongoDB connector in Jaspersoft iReport.

Today I noticed you can natively query MongoDB from within Jaspersoft. See more here

Some examples;

In SQL:

SELECT *
  FROM users
  WHERE age > 25
   AND   age <= 50
  ORDER BY age

When using Mongo:

{
   collectionName : 'users',
   findQuery : { 
        age: { $gt: 25, $lte: 50 } 
   },
   sort : {
        'age': 1
   }
}

See: http://docs.mongodb.org/manual/reference/sql-comparison/

An examples of using MongoDB’s ‘Aggregation Framework‘ which I think as pre-computed views/aggregations/filtered results sets

In SQL:
SELECT cust_id, ord_date, SUM(price) AS total
FROM orders
GROUP BY cust_id, ord_date
HAVING total > 250
ORDER BY total DESC

When using Mongo:

{
   runCommand: {
        aggregate : 'orders',
        pipeline : [
                {
                        $group : {
                                _id : { cust_id: '$cust_id', ord_date: '$ord_date' }
                                total: { $sum : '$price' }
                        }
                },
                { 
                        $match: { 
                                total: { $gt: 250 } } 
                        }
                },
                {
                        $sort : {
                                total : -1
                        }
                }
        ]
}

This got me curious about the aggregation frameworkHere’s another couple of examples of how it might be applied.

Each document in this collection has the following form:

{
  "_id": "10280",
  "city": "NEW YORK",
  "state": "NY",
  "pop": 5574,
  "loc": [
    -74.016323,
    40.710537
  ]
}

In these documents:

  • The _id field holds the zipcode as a string.
  • The city field holds the city.
  • The state field holds the two letter state abbreviation.
  • The pop field holds the population.
  • The loc field holds the location as a latitude longitude pair.

To Calculate States with Populations Over 10 Million

db.zipcodes.aggregate( { $group :
                         { _id : "$state",
                           totalPop : { $sum : "$pop" } } },
                       { $match : {totalPop : { $gte : 10*1000*1000 } } } )

Map/Reduce Queries in MongoDB – for relational dummies (like me!)

Thanks to my ex-colleague and Genius Dina Mohammad for sharing this with me.

QueryMongo.com

QueryMongo.com

And an example of the different query constructs in SQL vs NoSQL worlds, although the example uses Map/Reduce rather than native querying. I know which I prefer!

SELECT person, SUM(score), AVG(score), MIN(score), MAX(score), COUNT(*)
FROM demo
WHERE score > 0 AND person IN (‘bob’,’jake’)
GROUP BY person;

db.demo.group({
“key”: {
“person”: true
},
“initial”: {
“sumscore”: 0,
“sumforaveragescore”: 0,
“countforaveragescore”: 0,
“minscore”: 0,
“maxscore”: 0,
“countstar”: 0
},
“reduce”: function(obj, prev) {
prev.sumscore += obj.score;
prev.sumforaveragescore += obj.score;
prev.countforaveragescore++;
prev.minscore = Math.min(prev.minscore, obj.score);
prev.maxscore = Math.max(prev.maxscore, obj.score);
prev.countstar++;
},
“finalize”: function(prev) {
prev.averagescore = prev.sumforaveragescore / prev.countforaveragescore;
delete prev.sumforaveragescore;
delete prev.countforaveragescore;
},
“cond”: {
“score”: {
“$gt”: 0
},
“person”: {
“$in”: [“bob”, “jake”]
}
}
});

Basic Shell Commands – MongoDB

 

Command Function
Show dbs Show database names
Show collections Show collections in current database
Show users Show users “ “
Show profile Show recent system.profile entries w. time > 1ms
Use <dbname> Set current DB to <db name>
db.foo.find() List objects in the collection
db.foo.find( { a : 1 } ) List objects in the foo DB where a ==1

MongoDB Semantics

Now, I need to begin losing my relational vocabulary and start talking Mongo !

Some useful additional background here:
The MongoDB query language is not SQL, but 10gen describes it as a simple, expressive language with a straightforward syntax for efficient querying. Examples of simple query statements include “sum,” “min,” “max,” and “average.” These sorts of operators would be familiar to any database veteran or analyst, and they’re applied in a real-time data-processing pipeline that delivers sub-second performance, according to 10gen.

Other available query statements include “project,” which is used to select desired attributes and ignore everything else. “Group” lets you combine results with desired attributes. “Match” is a filter than can be used to eliminate documents from a query. “Limit,” “skip” and “sort,” are statements used in much the same way they’re used in SQL: to limit a query to a desired number of results, to skip over a given number of results, and to sort results alphabetically, numerically or by some other value.

SQL Terms/Concepts MongoDB Terms/Concepts
database database
table collection
row document or BSON document
column field
index index
table joins embedded documents and linking
primary keySpecify any unique column or column combination as primary key. primary keyIn MongoDB, the primary key is automatically set to the _idfield.
aggregation (e.g. group by) aggregation frameworkSee the SQL to Aggregation Framework Mapping Chart.

And query operators

SQL MongoDB
WHERE $match
GROUP BY $group
HAVING $match
SELECT $project
ORDER BY $sort
LIMIT $limit
SUM() $sum
COUNT() $sum
join No direct corresponding operator; however, the $unwind operator allows for somewhat similar functionality, but with fields embedded within the document.

+ Some example query statements

SQL                                                           MongoDB

SELECT * FROM users

db.users.find()
SELECT *
FROM users
WHERE status = "A"
db.users.find(
    { status: "A" } )
SELECT *
FROM users
WHERE status = "A"
OR age = 50
db.users.find(
    { $or: [ { status: "A" } ,
             { age: 50 } ] } )
SELECT *
FROM users
WHERE age > 25
AND   age <= 50
db.users.find(
   { age: { $gt: 25, $lte: 50 } } )

And some really helpful analytical scenarios here

Patterns