Installing R on Debian – 64-bit?

I’m installing R and RStudio on the old Dell in preparation for a forthcoming university module.

In a previous post, I talked about converting the old WinXP Dell 5100 to Linux using Unetbootin.

However, I’ve never been sure if the chip in the Dell is so old as to not be able to handle 64-bit programs. I’d rather install the 64-bit version than the 32-bit if I’m drawing on large(ish) datasets from my MongoDB cluster.

  • Open a terminal window
  • $ lscpu
Architecture:          i686
CPU op-mode(s):        32-bit, 64-bit
CPU(s):                2
Thread(s) per core:    1
Core(s) per socket:    2
CPU socket(s):         1
Vendor ID:             GenuineIntel
CPU family:            15
Model:                 4
Stepping:              7
CPU MHz:               2792.942
L1d cache:             16K
L2 cache:              1024K

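This wasn’t entirely clear to me at first. The “CPU op-mode(s)” line says the chip itself can run both 32-bit and 64-bit code, while “Architecture: i686” describes the 32-bit kernel that is currently installed.

$ getconf LONG_BIT
Returns ‘32’, the word size of the installed system rather than of the CPU

$ arch
Returns i686, which upon Googling means a 32-bit x86 kernel is running. So the Debian install is 32-bit even though the chip could handle 64-bit, and to get 64-bit R I’d have to reinstall with the amd64 image. Shame! I guess it doesn’t really matter too much given the machine has a max of 4GB of RAM anyway.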
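One way to separate what the CPU can do from what the installed system is running is to look for the lm (“long mode”) flag in /proc/cpuinfo. Here is a minimal sketch, assuming an x86 machine running Linux; the function name cpu_supports_64bit is just something I made up for illustration.

```shell
# Succeed if the first "flags" line of a cpuinfo-style file lists "lm" (long mode).
# Defaults to /proc/cpuinfo; the optional argument makes it easy to try on a saved copy.
cpu_supports_64bit() {
    cpuinfo_file="${1:-/proc/cpuinfo}"
    grep -m1 '^flags' "$cpuinfo_file" | grep -qw 'lm'
}

if cpu_supports_64bit; then
    echo "CPU: 64-bit capable"
else
    echo "CPU: no lm flag found (32-bit only, or not an x86 Linux box)"
fi

# These two describe the installed system, not the chip.
echo "Kernel architecture: $(uname -m)"
echo "Userland word size:  $(getconf LONG_BIT)"
```

If the first line says 64-bit capable while the other two say i686 and 32, the hardware is fine and only the installed OS is 32-bit.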

$ top
$ free
$ cat /proc/meminfo
All show memory information
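To pull out just the total RAM figure instead of reading through the full output, something like the following should do. It is only a sketch, assuming the usual “MemTotal: N kB” line in /proc/meminfo; mem_total_mb is a made-up helper name.

```shell
# Print total RAM in whole megabytes from a meminfo-style file.
# Defaults to /proc/meminfo; pass another path to test against a saved copy.
mem_total_mb() {
    meminfo_file="${1:-/proc/meminfo}"
    awk '/^MemTotal:/ { print int($2 / 1024) }' "$meminfo_file"
}

echo "Total RAM: $(mem_total_mb) MB"
```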

The installation itself was simple
$ sudo apt-get install r-base r-base-dev
To start R
$ R
To quit
q()

Sucking up some Tweets, analysing/processing in R and storing them in MongoDB could  be a neat experiment.

Jeff Gentry’s twitteR package makes searching Twitter easy:

># load the package
>library(twitteR)
># get the n most recent tweets mentioning ‘Raspberry Pi’
>n = 100  # example value, pick whatever you need
>delta.tweets = searchTwitter("Raspberry Pi", n=n)

See more here

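When I open R and try install.packages("twitteR")
I get: argument ‘lib’ is missing: using ‘/home/stuart/R/i486-pc-linux-gnu-library/2.11’
As far as I can tell this is only a warning that no library directory was given, so R falls back to my personal one. I’m looking here for an answer….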

http://linuxishbell.wordpress.com/2010/12/10/install-r-package-without-root-accesson-linux/
http://cran.r-project.org/bin/linux/debian/README
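Passing the library directory explicitly looks like the way to avoid the “lib is missing” message when installing without root. Below is a rough sketch that only prints the command rather than running it. The helper name r_install_expr, the directory ~/R/library and the choice of the cloud.r-project.org mirror are my own assumptions, not something from the pages linked above.

```shell
# Hypothetical helper: build the R expression that installs PKG into LIB.
r_install_expr() {
    pkg="$1"
    lib="$2"
    printf 'install.packages("%s", lib = "%s", repos = "https://cloud.r-project.org")' "$pkg" "$lib"
}

lib_dir="$HOME/R/library"   # assumed location for a personal library
r_expr=$(r_install_expr twitteR "$lib_dir")

# Dry run: show the command instead of executing it (it needs R and network access).
echo "mkdir -p \"$lib_dir\" && Rscript -e '$r_expr'"
```

R would also need to be told about that directory when loading the package, for example by setting the R_LIBS_USER environment variable to the same path.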

I’m also going to install the RStudio IDE, following the instructions here


Distribution, Replica sets, Master-Slave

I’m just getting to grips with distributed database terminology, starting at p. 130 of “MongoDB: The Definitive Guide”.

Master-Slave Replication

Master-slave replication can be used for

  • Backup
  • Failover
  • Read scaling, and more

Read scaling could be really interesting from a BI consumption point of view, e.g. a peak of user traffic to the DW portal at the beginning of the day.

In MongoDB the most basic setup is to start a master node (in my case the Dell 5100) and add one or more slave nodes (in my case, the Raspberry Pis). Each of the slaves must know the address of the master.

To start the master run: mongod --master

To start a slave run: mongod --slave --source master_address
where master_address is the address of the master node just started. This is where my previous post on fixing a static IP comes in handy.

First, create a directory for the master to store data in and choose a port (10000)

$ mkdir -p ~/dbs/master
$ ./mongod --dbpath ~/dbs/master --port 10000 --master

Now, set up the slave(s), choosing a different data directory and port for each one (needed if they run on the same machine as the master). For any slave, you also need to specify who the master is.

$ mkdir -p ~/dbs/slave
$ ./mongod --dbpath ~/dbs/slave --port 10001 --slave --source localhost:10000

All slaves must be replicated from a master node. It is not possible to replicate from slave to slave.
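With several slaves on one box, each needs its own data directory and port. The sketch below just prints the mongod command for each slave so the pattern is visible. The numbering scheme (slaveN on port 10000+N), the DB_ROOT variable and the slave_command name are my own assumptions. Note also that the --master and --slave options belong to older MongoDB releases like the 2.2 series used here; they were removed in MongoDB 4.0.

```shell
# Print the mongod command for slave number N replicating from MASTER (host:port).
# Slave N gets data directory $DB_ROOT/slaveN (default ~/dbs) and port 10000+N.
slave_command() {
    n="$1"
    master="$2"
    printf 'mongod --dbpath %s/slave%d --port %d --slave --source %s\n' \
        "${DB_ROOT:-$HOME/dbs}" "$n" "$((10000 + n))" "$master"
}

# Dry run for four slaves pointing at a master on localhost:10000.
for n in 1 2 3 4; do
    slave_command "$n" localhost:10000
done
```

Every printed command points at the master directly, in line with the rule that slaves cannot replicate from other slaves.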

…As soon as my 32MB SD cards arrive in the post and I install MongoDB on the remaining 4 Raspberry Pis I will give this a go!

Replica Sets

A replica set is basically a master-slave cluster with automatic failover. The biggest difference between an M-S cluster and a replica set is that a replica set does not have a fixed master: one is ‘elected’ by the cluster, and the role may move to another node if the current master becomes unreachable.

 

There are 3 different types of nodes which can co-exist in a MongoDB cluster

  1. Standard – Stores a complete, full copy of the data being replicated, takes part in the voting when the primary node is being elected and is capable of being the primary node in the cluster
  2. Passive – As above, but will never become the primary node for the set
  3. Arbiter – Participates only in voting. Does not receive any of the data being replicated and cannot become the primary node

Standard and passive nodes are configured using the priority key. A node with priority 0 is passive and will never be selected as primary.

I suppose in my case, if I had SD cards of different sizes, or a mix of model A (256 MB) and model B (512 MB) Pis, I could set priorities in decreasing order so that the weakest node always has the lowest priority and is the last choice for master (or give it priority 0 so it is never chosen).
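To make the priority idea concrete, here is a small sketch that prints a replica set configuration document with a priority per member, in the shape that could be handed to rs.initiate() in the mongo shell. The set name piCluster, the host names and the rs_config helper are all made up for illustration; only the _id, members, host and priority keys come from MongoDB itself.

```shell
# Print a replica set config document from "host:port=priority" arguments.
# Usage: rs_config SET_NAME host:port=priority [host:port=priority ...]
rs_config() {
    set_name="$1"
    shift
    printf '{ _id: "%s", members: [' "$set_name"
    id=0
    for member in "$@"; do
        host="${member%%=*}"        # text before the "="
        priority="${member#*=}"     # text after the "="
        [ "$id" -gt 0 ] && printf ','
        printf ' { _id: %d, host: "%s", priority: %s }' "$id" "$host" "$priority"
        id=$((id + 1))
    done
    printf ' ] }\n'
}

# Strongest node first; the last node has priority 0, so it stays passive.
rs_config piCluster dell:10000=2 pi1:10001=1 pi2:10002=0
```

The member with priority 0 still holds a full copy of the data and votes, but is never elected primary.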

Initializing a set (pp132)

Day one!

Today I picked up a new laptop & installed various bits needed for my MSc project. The project aim (at this point in time, subject to change!) is to build a scalable BI solution, using a NoSQL DB, commodity hardware and open source tools for around £100.

The Raspberry Pis will act as 4 nodes (512MB RAM, 4GB SD storage each).

I haven’t yet thought about what data I may end up processing, but am drawn to unstructured documents.

I’m planning on using a document database, MongoDB, and the topology of the ‘cluster’ is as follows

  • 1x Samsung 3-series laptop, windows 7, 6GB RAM, Pentium 1-5
  • 1 x Dell 5150, WinXP (showing its age, probably around 7 years) 4GB RAM, some kinda processor!
  • 4 x Raspberry Pi 512MB, all with 4GB SD cards

So that’s Raspberry Pis, an old knackered Dell desktop and a new laptop: all commodity hardware, apart from the new laptop.

So far, I’ve installed

  • MongoDB mongodb-win32-x86_64-2008plus-2.2.1
  • Java
  • Tableau for data viz
  • Pentaho for ETL (& more possibly)
  • Jaspersoft iReport for report authoring (although may use Tableau if a suitable db connection can be found, or Pentaho)