Friday, August 19, 2016

Top 5 books for a Data Scientist

1. Big Data: The Numbers Game Deciphered

This compact, informative guide to the world of Data Science will have you up-to-date in no time. What's in the book?
  1. Data Scientists - What do they do?
  2. Pre-requisites for becoming a Data Scientist
  3. Must-have skill-sets
  4. Study-Plan
  5. What the future holds
Click here to download Big Data ebook for free

2. Top Programming Languages for a Data Scientist

Experts foresee 10 million job openings in the world of Big Data! No wonder it's the coolest job of the 21st century. But to get ahead in the field, you need to keep your skill set up to date, which is why we have prepared this eBook listing the top programming languages for a data scientist.
  • An overview of the top 10 programming languages for data scientists
  • The features of these programming languages
  • Their application in the industry

3. 8 Essential Concepts of Big Data and Hadoop


The centerpiece of the Big Data revolution, Hadoop is the most important technology in the Big Data family. Download this handy guide to learn all you need to know about Hadoop & its ecosystem.
Here's a nifty compendium of the most important terms and definitions from the Big Data universe.
Find inside write-ups on:


  1. MapReduce
  2. HDFS
  3. Pig vs. SQL
  4. HBase components
  5. Cloudera
  6. Zookeeper and Sqoop
  7. Hadoop Ecosystem



4. Secret to Unlocking Tableau's Hidden Potential

If you're looking for tips, useful hacks, & secret techniques to get the most out of Tableau, this eBook will teach you what you need to know. You will find out about its hidden functionalities and explore unused features that will make you a Tableau superstar.
This eBook will help you:
  1. Unleash Tableau's potential
  2. Discover hidden functionality
  3. Explore unused features
  4. Learn how to make the most of the tool


5. Top 25 Interview Questions and Answers: Big Data Analysis

You could be the most knowledgeable data professional in the world, but unless you make an impression in your job interview, it's unlikely you'll land your dream role.
Get a peek into the mind of the Data Science interviewer with this compilation of the top 25 Big Data interview questions and answers.
  1. Tips and advice on how to craft your replies to each question, for maximum impact
  2. Points to avoid in all your answers
  3. Insights into the interviewer's mind: what is the purpose of the question?

Thursday, August 18, 2016

Top 10 companies in Data Science



Actian, Redwood City, CA, founded 2005.

Actian powers the action-driven enterprise, delivering rapid time to analytic value. The company reports that the Actian Analytics Platform outperforms all others by 2x and set a new record in the latest TPC-H benchmark.

Birst, San Francisco, CA, founded 2004.
Birst’s unique approach harmonizes the needs of both IT and business users.

BloomReach, Mountain View, CA, founded 2009.
BloomReach powers the best commerce experiences. Get people to your products faster. Personalize your customer experience. Increase your revenue.

CBIG Consulting, Rosemont, Illinois, founded 2002.
CBIG’s team of consultants, analysts, and data architects works with enterprises like yours to build a roadmap to success with Business Intelligence and Big Data Analytics.

Cirro, San Juan Capistrano, CA, founded 2010.
Cirro solves the root problem of analytics by making all data silos in an enterprise equivalently accessible. The Cirro Universal Data Network platform unifies the data ecosystem by providing access, intelligent integration, and management of all enterprise data regardless of type, engine or location.

Digital Reasoning, Franklin, TN, founded 2000.
Digital Reasoning is a team of passionate, creative and caring thinkers, building intelligent technology that makes a difference in the world.

Flutura Solutions, San Jose, CA, founded 2011.
Flutura Decision Sciences and Analytics is an IoT intelligence company that is powering new monetizable business models using machine signals in the engineering and energy industries. The new business models it powers affect operational, process and asset efficiency outcomes.

Fractal Analytics, San Mateo, CA, founded 2000.
Named "Best Company to Work" for 2016 by the Great Place to Work Institute.

Hadapt, Cambridge, MA, founded 2010.
Presto is an open source distributed SQL query engine designed for running interactive analytic queries against data sources of all sizes. Through a single query, Presto allows you to access data where it lives, including in Hadoop, Apache Cassandra™, relational databases such as MySQL and PostgreSQL or even proprietary data stores. Presto was created by Facebook for the analytics needs of data-driven organizations.

Link Analytics, Atlanta, Seattle, Knoxville, founded 2010.
Link Analytics enables a shift of focus from simply delivering data to establishing a business analytics approach for evolving into an intelligent enterprise.

Monday, August 15, 2016

9 Must Have Skills to be a Data Scientist

1. Programming


Python - Learning Python is fun. http://pythonprogramminglanguage.com/

IPython Notebook - http://ipython.org/notebook.html


2. Probability and Statistics 

The connection between Data Mining and Statistics is beautifully explained by Jerome H. Friedman. http://statweb.stanford.edu/~jhf/ftp/dm-stat.pdf

R - A language for statistical computing and plotting. https://www.r-project.org/

SymPy - http://www.sympygamma.com

Statsmodels - http://statsmodels.sourceforge.net/

Statistics Interactive Course - http://onlinestatbook.com/index.html


3. Machine Learning Algorithms & Libraries


Scikit Learn - http://scikit-learn.org/stable/

Scipy - http://scipy.org/scipylib/index.html

Scikit-Image - A collection of algorithms for image processing in Python

PyBrain - http://pybrain.org


4. Data Structures


Csvkit - https://csvkit.readthedocs.org

Pandas - http://pandas.pydata.org

NumPy - http://www.numpy.org

Theano - http://deeplearning.net/software/theano/library/tensor/basic.html


5. Information Extraction


Scrapy - http://scrapy.org/

BeautifulSoup - https://pypi.python.org/pypi/beautifulsoup4


6. Natural Language Processing


NLTK - http://www.nltk.org/

Pattern - https://pypi.python.org/pypi/Pattern


7. Databases



Wide Column Store/Column Families

HBase
Cassandra
Hypertable

Key-Value/Tuple Store

Couchbase Server
Voldemort
Cassandra
MemcacheDB
Amazon DynamoDB

Document Store

MongoDB
CouchDB

Graph Database

InfoGrid
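
The four families above differ mainly in how a record is shaped and looked up. As a rough sketch only, the snippet below models the same made-up record in each style using plain Python containers. The record, the variable names and the helper functions are all hypothetical and do not use any of the listed products' APIs.

```python
import json

# Hypothetical record used only for illustration: a user who lives in a city.

# Key-value store: the value is an opaque blob; the store only knows the key.
key_value_store = {"user:42": json.dumps({"name": "Ada", "city": "London"})}

# Document store: the store understands the nested structure of each document.
document_store = [{"_id": 42, "name": "Ada", "address": {"city": "London"}}]

# Wide column store: row key -> column family -> column -> value.
wide_column_store = {
    "42": {"profile": {"name": "Ada"}, "location": {"city": "London"}}
}

# Graph database: nodes carry properties, edges carry relationships.
graph_nodes = {42: {"name": "Ada"}, 7: {"name": "London"}}
graph_edges = [(42, "LIVES_IN", 7)]


def city_from_key_value(key):
    # The client must decode the blob itself.
    return json.loads(key_value_store[key])["city"]


def city_from_document(doc_id):
    # Filter on a field, then read a nested field.
    doc = next(d for d in document_store if d["_id"] == doc_id)
    return doc["address"]["city"]


def city_from_wide_column(row_key):
    # Address one column inside one column family of one row.
    return wide_column_store[row_key]["location"]["city"]


def city_from_graph(node_id):
    # Follow the LIVES_IN edge from the user node to the city node.
    target = next(dst for src, rel, dst in graph_edges
                  if src == node_id and rel == "LIVES_IN")
    return graph_nodes[target]["name"]


cities = [
    city_from_key_value("user:42"),
    city_from_document(42),
    city_from_wide_column("42"),
    city_from_graph(42),
]
```

Each helper answers the same question ("which city does user 42 live in?"), but the lookup path differs: by key only, by document field, by row and column, or by traversing an edge.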

8. Big Data & Distributed Computing


Amazon's EC2 - https://aws.amazon.com/ec2/

MapReduce - https://en.wikipedia.org/wiki/MapReduce

Apache Hadoop - Software for distributed computing. http://hadoop.apache.org/

Apache Mahout - http://mahout.apache.org/

Apache Spark - http://spark.apache.org/

Apache Whirr - https://whirr.apache.org/

Hive - https://hive.apache.org/

Zookeeper - https://zookeeper.apache.org/
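
To give a feel for the MapReduce model listed above, here is a minimal single-process sketch of a word count in plain Python. It is not Hadoop code: the function names and the sample documents are assumptions made for illustration, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input documents.
DOCUMENTS = ["big data is big", "data science uses data"]
counts = word_count(DOCUMENTS)
```

The useful property is that `map_phase` looks at one document at a time and `reduce_phase` looks at one key at a time, so both steps can be spread across workers without sharing state.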


9. Visualization


Matplotlib - http://matplotlib.org

Bokeh - http://bokeh.pydata.org/en/latest/

D3.js - https://d3js.org/