Building Microservices

Reading this:

Ch1. Microservices
Ch2. The Evolutionary Architect
Ch3. How to Model Services
Ch4. Integration
Ch5. Splitting the Monolith
Ch6. Deployment
Ch7. Testing
Ch8. Monitoring
Ch9. Security
Ch10. Conway’s Law & System Design
Ch11. Microservices at Scale
Ch12. Bringing It All Together

I’ve learned something today, as I’d never heard of “Conway’s Law”. Here’s what a quick Google turned up:

Conway states: Conway’s law was not intended as a joke or a Zen koan, but as a valid sociological observation. It is a consequence of the fact that two software modules A and B cannot interface correctly with each other unless the designer and implementer of A communicates with the designer and implementer of B. Thus the interface structure of a software system necessarily will show a congruence with the social structure of the organization that produced it.

Wiki: Conway’s law is an adage named after computer programmer Melvin Conway, who introduced the idea in 1968; it was first dubbed Conway’s law by participants at the 1968 National Symposium on Modular Programming.[1] It states that

organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations

—M. Conway[2]

A nice link, which took me to another nice link, which provides a succinct definition (same author as the book):

In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies.

Which led me to another link:

To start explaining the microservice style it’s useful to compare it to the monolithic style: a monolithic application built as a single unit. Enterprise Applications are often built in three main parts: a client-side user interface (consisting of HTML pages and javascript running in a browser on the user’s machine), a database (consisting of many tables inserted into a common, and usually relational, database management system), and a server-side application. The server-side application will handle HTTP requests, execute domain logic, retrieve and update data from the database, and select and populate HTML views to be sent to the browser. This server-side application is a monolith – a single logical executable[2]. Any changes to the system involve building and deploying a new version of the server-side application.

{Here endeth today’s lesson. More tomorrow}

ES: Filter types

and filter
bool filter
exists filter
geo bounding box filter
geo distance filter
geo distance range filter
geo polygon filter
geoshape filter
geohash cell filter
has child filter
has parent filter
ids filter
indices filter
limit filter
match all filter
missing filter
nested filter
not filter
or filter
prefix filter
query filter
range filter
regexp filter
script filter
term filter
terms filter
type filter

Chapter 12: Structured search

pp181.
Combining Filters

SELECT product
FROM products
WHERE (price = 20 OR productID = 'XHDK-A-1293-#fJ3')
AND (price != 30)

Bool Filter

The bool filter comprises 3 sections:

{
   "bool" : {
      "must" :     [ ],
      "should" :   [ ],
      "must_not" : [ ]
   }
}

-MUST: All of the clauses must match. The equivalent of AND
-SHOULD: At least ONE of the clauses must match. The equivalent of OR
-MUST_NOT: All of the clauses must NOT match. The equivalent of NOT

_________________________________________________________________________

To replicate the preceding SQL example, we will take the two term filters that we used previously and place them inside the should clause of a bool filter, and add another clause to deal with the NOT condition:

GET /my_store/products/_search
{
   "query" : {
      "filtered" : { 
         "filter" : {
            "bool" : {
              "should" : [
                 { "term" : {"price" : 20}}, 
                 { "term" : {"productID" : "XHDK-A-1293-#fJ3"}} 
              ],
              "must_not" : {
                 "term" : {"price" : 30} 
              }
           }
         }
      }
   }
}
Note that we still need to use a filtered query to wrap everything.
These two term filters are children of the bool filter, and since they are placed inside the should clause, at least one of them needs to match.
If a product has a price of 30, it is automatically excluded because it matches the must_not clause.

Our search results return two hits, each document satisfying a different clause in the bool filter:

"hits" : [
    {
        "_id" :     "1",
        "_score" :  1.0,
        "_source" : {
          "price" :     10,
          "productID" : "XHDK-A-1293-#fJ3" 
        }
    },
    {
        "_id" :     "2",
        "_score" :  1.0,
        "_source" : {
          "price" :     20, 
          "productID" : "KDKE-B-9947-#kL5"
        }
    }
]
Doc 1 matches the term filter for productID = "XHDK-A-1293-#fJ3"; doc 2 matches the term filter for price = 20.
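
The book then nests bool filters inside one another for more complex logic. A sketch of an OR wrapping an AND, reusing the productIDs above (my reconstruction, not the book’s exact listing):

GET /my_store/products/_search
{
   "query" : {
      "filtered" : {
         "filter" : {
            "bool" : {
               "should" : [
                  { "term" : { "productID" : "KDKE-B-9947-#kL5" } },
                  { "bool" : {
                     "must" : [
                        { "term" : { "productID" : "XHDK-A-1293-#fJ3" } },
                        { "term" : { "price" : 30 } }
                     ]
                  }}
               ]
            }
         }
      }
   }
}

This reads as: productID = "KDKE-B-9947-#kL5" OR (productID = "XHDK-A-1293-#fJ3" AND price = 30).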

Chapter 8: (Sorting and) relevance

(pp117) What is Relevance?

The relevance score of each document is represented by a positive floating-point number called _score. The higher the _score, the more relevant the document.

A query clause generates a _score for each document. How that score is calculated depends on the type of query used. 

Different queries are used for different purposes.
– A fuzzy query might determine the _score by calculating how similar the spelling of the query term is to that within documents.
– A terms query would incorporate the percentage of terms that were found.

The standard similarity algorithm used in ES is known as term frequency/inverse document frequency, or TF/IDF, which takes the following factors into account (formulas sketched after the list):

-Term Frequency : How often does the term appear in the field? The more often, the more relevant.
-Inverse Document Frequency : How often does the term appear across all documents in the index? The more often, the less relevant.
-Field-length norm : How long is the field? The longer it is, the less likely it is that the words in the field will be relevant. A term appearing in a <short title> field carries more weight than the same term appearing in a long <content> field.
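
In the simple form used by the tfidf.com example below (a sketch; ES/Lucene’s actual formula adds smoothing and the field-length norm):

tf(t, d)     = (no. of times term t appears in doc d) / (total terms in d)
idf(t)       = log( total no. of docs / no. of docs containing t )
tf-idf(t, d) = tf(t, d) * idf(t)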

Example:

Consider a document containing 100 words wherein the word cat appears 3 times. The term frequency (i.e., tf) for cat is then (3 / 100) = 0.03. Now, assume we have 10 million documents and the word cat appears in one thousand of these. Then, the inverse document frequency (i.e., idf) is calculated as log(10,000,000 / 1,000) = 4. Thus, the Tf-idf weight is the product of these quantities: 0.03 * 4 = 0.12.
Source: http://www.tfidf.com/

Ch3: Data In, Data Out (inverted indexes)

In the real world, not all entities of the same type look the same. One person might have just a home telephone, another a cell #, another both, and another neither. In the RDBMS world, each of these demands its own column, or has to be modelled as a KV pair. This leads to waste and redundancy. A Brazilian may have 10+ family names; a Brit just one.

The problem comes when we need to store these entities. Traditionally, this has been accomplished through an RDBMS with columns and rows.

Of course, we don’t only need to store the data; we need to query and use it. While NoSQL solutions (e.g. MongoDB) exist that allow us to store objects as documents, they still require us to think about how we want to query our data, and which fields require an index to speed up retrieval.

In ES, all data in every field is indexed by default. Every field has a dedicated inverted index and, unlike most other DBs, ES can use all of those inverted indices in the same query.

What is an inverted index?
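
The book comes back to this in depth; in short (my summary, not the book’s wording): an inverted index maps every unique term to the list of documents that contain it, so finding the documents for a term is a single lookup rather than a scan. A toy example with two documents:

Doc 1: "brown fox"
Doc 2: "brown dog"

Term  | Docs
------+------
brown | 1, 2
dog   | 2
fox   | 1

A query for brown AND fox intersects the postings lists for the two terms and returns Doc 1.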

Document orientation

Objects are seldom a simple list of KV pairs!

More often than not they are complex data structures that contain dates, geo, objects, arrays of values etc.

Sooner or later you’re going to want to store these objects in a database. Trying to do this with the rows and columns of a relational DB means flattening the object to fit the schema (usually one field per column) and then reconstructing it every time it is retrieved.

Consider this JSON document:

{
   "email": "x@y.com",
   "first_name": "john",
   "last_name": "smith",
   "info": {
      "bio": "Some blurb……",
      "age": 41,
      "interests": [ "football", "cricket" ]
   },
   "join_date": "2015/2/15"
}
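
Storing one of these in ES is just a PUT to an index/type/id path. A sketch (the my_blog/user names are mine, not the book’s):

PUT /my_blog/user/1
{
   "email": "x@y.com",
   "first_name": "john",
   "last_name": "smith"
}

ES acknowledges with a small JSON response ("created": true on first insert) and the document is immediately indexed and searchable; no schema needs to be defined up front.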

Installing ES

install/update the latest version of Java from www.java.com

http://www.elasticsearch.org/overview/elkdownloads/

Install Marvel, which comes with an interactive console, Sense, which makes it easy to talk to ES directly from a browser.

Marvel is available as a plugin
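
If memory serves, the book installs it with the plugin tool (check the current docs for the exact version string):

./bin/plugin -i elasticsearch/marvel/latest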

Turn off monitoring of your local cluster:
echo 'marvel.agent.enabled: false' >> ./config/elasticsearch.yml

Running ES
./bin/elasticsearch
./bin/elasticsearch -d     (to run in the background as a daemon)

Test it out
curl 'http://localhost:9200/?pretty'
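
If it’s running, you get back a JSON blob describing the node; roughly this shape (node name and version numbers will differ, and I’ve trimmed fields from memory):

{
   "status" : 200,
   "name" : "<random node name>",
   "version" : {
      "number" : "1.4.2"
   },
   "tagline" : "You Know, for Search"
}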

Book: Elasticsearch: The Definitive Guide

http://www.amazon.com/Elasticsearch-Definitive-Guide-Clinton-Gormley/dp/1449358543/ref=sr_1_fkmr0_1?ie=UTF8&qid=1424028696&sr=8-1-fkmr0&keywords=elastic+search+book+the+definitive+guide

“Unfortunately, most DBs are astonishingly inept at extracting actionable knowledge from your data. Sure, they can filter by timestamp or exact values, but can they perform full-text search, handle synonyms, and score documents by relevance? Can they generate analytics and aggregations from the same data? Most important, can they do this in real time without big batch-processing jobs?

ES encourages you to explore and utilize your data, rather than letting it rot in a warehouse because it is too difficult to query.”

(pp3) ES is more than just Lucene; it can also be described as follows:

  • A distributed real-time document store where every field is indexed and searchable
  • A distributed search engine with real-time analytics
  • Capable of scaling to hundreds of servers and petabytes of structured and unstructured data