1. Programming
Python - Learning Python is fun. http://pythonprogramminglanguage.com/
IPython Notebook - http://ipython.org/notebook.html
2. Probability and Statistics
The connection between Data Mining and Statistics is beautifully explained by Jerome H. Friedman. http://statweb.stanford.edu/~jhf/ftp/dm-stat.pdf
R - A language for statistical computing and plotting. https://www.r-project.org/
SymPy - http://www.sympygamma.com
Statsmodels - http://statsmodels.sourceforge.net/
Statistics Interactive Course - http://onlinestatbook.com/index.html
3. Machine Learning Algorithms & Libraries
Scikit-learn - http://scikit-learn.org/stable/
SciPy - http://scipy.org/scipylib/index.html
Scikit-Image - A collection of algorithms for image processing in Python
PyBrain - http://pybrain.org
4. Data Structures
Csvkit - https://csvkit.readthedocs.org
Pandas - http://pandas.pydata.org
NumPy - http://www.numpy.org
Theano - http://deeplearning.net/software/theano/library/tensor/basic.html
5. Information Extraction
Scrapy - http://scrapy.org/
BeautifulSoup - https://pypi.python.org/pypi/beautifulsoup4
6. Natural Language Processing
NLTK - http://www.nltk.org/
Pattern - https://pypi.python.org/pypi/Pattern
7. Databases
Wide Column Store/Column Families
HBase
Cassandra
Hypertable
Key-Value/Tuple Store
Couchbase Server
Voldemort
Cassandra
MemcacheDB
Amazon DynamoDB
Document Store
MongoDB
CouchDB
Graph Database
InfoGrid
8. Big Data & Distributed Computing
Amazon's EC2 - https://aws.amazon.com/ec2/
MapReduce - https://en.wikipedia.org/wiki/MapReduce
Apache Hadoop - A framework for distributed computing. http://hadoop.apache.org/
Apache Mahout - http://mahout.apache.org/
Apache Spark - http://spark.apache.org/
Apache Whirr - https://whirr.apache.org/
Hive - https://hive.apache.org/
Zookeeper - https://zookeeper.apache.org/
9. Visualization
Matplotlib - http://matplotlib.org
Bokeh - http://bokeh.pydata.org/en/latest/
D3.js - https://d3js.org/

 