Use cases of Spark Streaming in the telecom industry.
Drools - rule-based engine.
Biggest graph user is Facebook - social networking data is in the form of a graph.
Collaborative filtering - recommendations in e-commerce.
Mahout - a project that implements many ML algorithms as MapReduce jobs.
File formats:
SequenceFile - key-value pairs.
Avro - schema based, used in Flume, well suited to RPC calls.
hdfs dfs -cat /loudacre/kb/KBDOC-00289.html | head -n 20
Refer to http://free.primarypad.com/p/devsh_Jan10_bt_ss for extra blogs and notes - accessible on the wifi.
IMP - Partitioning in Spark Jobs
The number of partitions depends on 1) the size of the input data and 2) an explicitly specified partition count. The stages remain the same in any language; the number of tasks may differ.
Serialization: Python lib - pickle. Kryo - Java + Scala.
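Rough sketch of how those two factors interact for a single file read through Hadoop-style input splitting. This is a simplified model, not Spark's actual code: the function name `estimate_partitions`, the 128 MiB default block size and the default minimum of 2 partitions are assumptions for illustration, and multi-file inputs and compression are ignored.

```python
def estimate_partitions(total_bytes, block_bytes=128 * 1024 * 1024, min_partitions=2):
    """Estimate the partition count for one input file (simplified model)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")

    # Target split size if the file were cut into exactly min_partitions pieces.
    goal = total_bytes // max(min_partitions, 1)
    # A split never exceeds one block and is never smaller than one byte.
    split = max(1, min(goal, block_bytes))

    # Cut full-size splits while clearly more than one split's worth remains
    # (10% slack, so a tiny tail is merged into the last split).
    remaining = total_bytes
    count = 0
    while remaining / split > 1.1:
        count += 1
        remaining -= split
    if remaining > 0:
        count += 1
    return count


one_gib = 1024 ** 3
print(estimate_partitions(one_gib))                     # driven by data size / block size
print(estimate_partitions(one_gib, min_partitions=20))  # driven by the explicit number
```

In PySpark the real count can be checked with `rdd.getNumPartitions()`; the explicit number is passed as `sc.textFile(path, minPartitions)` or changed later with `rdd.repartition(n)`.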
Flume - serializers.
Kafka channel is highly reliable - 0 loss of events.
Caused by: py4j.Py4JException: Cannot obtain a new communication channel