Stuart Hayes: Attempting to build a BigData BI solution on extreme commodity hardware!
This blog was begun as a place to brain-dump thoughts and interesting things I’ve found on the web, in books and through my own efforts: a permanent, searchable whiteboard, if you like. Something I can search and refer back to when I begin my MSc project write-up.
Please don’t copy & use any notes on here without permission of the author. I try to link to where I sourced the content. I encourage you to support the vendors & authors I blog about (if they meet your requirements).
The content will be loosely-structured and fall into the categories below.
#Mongodb #Database cluster #Sharding
#R #Data #Analytics
#University of Dundee #MSc Business Intelligence
#Data models, #schema-free
As part of my MSc in Business Intelligence @ The University of Dundee, I am attempting to build a cheap DB cluster / decent BI solution on cheap/legacy/commodity hardware for around £300 (although I seem to be spending more and more on cables to connect it all up!)
The topology of the cluster is one really old Dell Dimension desktop running WinXP (though I plan to Linux it!), a new Samsung laptop running Win7 and 5 Raspberry Pis (Model B, 512MB, 4GB SD) running Raspbian Wheezy (Linux).
I may buy a couple more RPis (if I can figure out how to get them working!), as they’re about £25 each (plus ethernet cables, SD cards etc), and see how that improves cluster performance (speed, ACID/BASE possibilities).
My plan is to get some data into MongoDB using Pentaho Data Integration (‘Kettle’).
Then I’ll figure out how to shard it across the nodes (and the performance implications) and produce some nice reports/visualisations, drawing data out of MongoDB into a reporting tool (possibly Jaspersoft iReport). I would like to use some unstructured documents (bids/tenders/contracts/court proceedings etc) as my dataset and try to boil up some interesting data using map/reduce on top of MongoDB.
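To make the shard-then-map/reduce idea a bit more concrete for myself, here is a rough sketch in plain Python. It is not MongoDB's API and not how MongoDB actually places chunks; it only mimics the general shape: documents are spread across nodes by a hash of a key, each node counts words in its own documents, and the partial counts are merged. The node names, the `_id` values, the sample texts and all function names are made up for illustration.

```python
import hashlib
from collections import Counter, defaultdict

# Hypothetical node names standing in for the five Raspberry Pis.
NODES = ["pi1", "pi2", "pi3", "pi4", "pi5"]


def shard_for(key, nodes=NODES):
    """Pick a node from a stable hash of the shard key (illustrative only)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def map_doc(doc):
    """Map step: emit (word, 1) for every word in the document text."""
    for word in doc["text"].lower().split():
        yield word.strip(".,;:()"), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per word."""
    totals = Counter()
    for word, n in pairs:
        if word:  # skip tokens that were only punctuation
            totals[word] += n
    return totals


def run(docs):
    """Shard the documents, map/reduce each shard, then merge the partials."""
    shards = defaultdict(list)
    for doc in docs:
        shards[shard_for(doc["_id"])].append(doc)

    total = Counter()
    for shard_docs in shards.values():
        pairs = (pair for doc in shard_docs for pair in map_doc(doc))
        total.update(reduce_counts(pairs))  # Counter.update adds counts
    return shards, total


# Made-up sample documents, loosely in the spirit of tenders and court records.
DOCS = [
    {"_id": "tender-1", "text": "Tender for road repair contract."},
    {"_id": "tender-2", "text": "Contract awarded; tender closed."},
    {"_id": "court-1", "text": "Court proceedings on the contract."},
]

if __name__ == "__main__":
    shards, total = run(DOCS)
    for node, node_docs in sorted(shards.items()):
        print(node, [d["_id"] for d in node_docs])
    print(total.most_common(3))
```

Calling `run(DOCS)` returns the shard placement and the merged word counts; with the sample documents above, `total["contract"]` is 3. On the real cluster the placement and the map/reduce would be done by MongoDB itself, so treat this purely as a thinking aid.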
I’d really appreciate any help you may be able to give, as this is a steep learning curve.