MicroStrategy has integrated data mining into its mainstream BI platform. Using this technology, users are not limited to historical analysis but also have access to predictive analysis of their data, empowering decision makers to make more insightful decisions for their businesses.
Data mining is the extraction of useful information from data by any means, such as statistical analysis, modeling techniques, machine learning, or database technologies.
The fields of statistics and machine learning are easily confused. Let's look at their history and how they relate to each other.
Statistics is a branch of mathematics whose main focus is the estimation of quantities from observations. To do so, it relies on probabilistic reasoning.
Probability theory puts randomness in the framework of mathematical functions. A mathematical function is fully characterized by certain fixed numbers, its parameters. Although there is no randomness in the function itself, it can be used to model random behavior. For example, if you have random data, you can construct a histogram from it. Probability theory says that as the data set grows large, the histogram eventually settles into a fixed, non-random shape, just like a function whose output at any input can be predicted precisely.
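A minimal sketch of the histogram idea above, using a fair die as the hypothetical source of random data: with a small sample the frequencies are noisy, but with a large sample every frequency settles close to the fixed value 1/6.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def histogram(n):
    """Normalized counts of n fair-die rolls, as a dict face -> frequency."""
    counts = {face: 0 for face in range(1, 7)}
    for _ in range(n):
        counts[random.randint(1, 6)] += 1
    return {face: c / n for face, c in counts.items()}

small = histogram(100)        # noisy: frequencies vary noticeably
large = histogram(1_000_000)  # stable: every frequency is close to 1/6

# The largest deviation from the "true" probability 1/6 shrinks as n grows.
print(max(abs(f - 1 / 6) for f in large.values()))
```

The die, the sample sizes, and the seed are all illustrative choices; any random source would show the same stabilizing behavior.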
Statistics assumes that observations come from random behavior generated by such a function, but that this function is unknown. The aim of statistics is to estimate it from the observations.
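A small sketch of that estimation step, under an assumed setup: the unknown function is a normal distribution whose true mean (5.0) and standard deviation (2.0) are hidden from the estimator, and statistics recovers them from observations alone using the standard maximum-likelihood formulas.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# The "unknown" random function: observations drawn from Normal(5.0, 2.0).
TRUE_MEAN, TRUE_STD = 5.0, 2.0
observations = [random.gauss(TRUE_MEAN, TRUE_STD) for _ in range(10_000)]

# Maximum-likelihood estimates for a normal distribution,
# computed only from the observations:
est_mean = sum(observations) / len(observations)
est_var = sum((x - est_mean) ** 2 for x in observations) / len(observations)
est_std = math.sqrt(est_var)

print(est_mean, est_std)  # both land close to the hidden 5.0 and 2.0
```

The normal distribution and the parameter values are assumptions for illustration; the point is that the estimates converge on the hidden parameters as the sample grows.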
Machine learning is an outgrowth of the artificial intelligence community. The original aim of that community was to make computers as intelligent as humans, or even surpass them. However, after decades of disappointment and ad-hoc computer programs that amounted to little more than cool demos, the artificial intelligence community lowered its goals and split into multiple communities with much less ambitious charters; machine learning is one of them.
At the heart of machine learning is the classification problem, a far less ambitious problem than matching the whole of human intelligence. Here the aim is, given any kind of data, to classify it into a small number of classes: for example, teaching a computer to label an image as green or yellow. Machine learning tackles this by first converting the data into values or numbers, and then discovering a function that takes these numbers as input and produces labels as output. To discover that function, it uses many examples of data paired with their labels.
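The green-vs-yellow example can be sketched end to end. Everything here is hypothetical: each "image" is reduced to its average (R, G, B) color, the training values are made up, and the learned function is a simple nearest-centroid classifier, one prototype per class computed from the labeled examples.

```python
# Labeled training examples: (average RGB of an image, label).
training = [
    ((40, 200, 60), "green"),
    ((30, 180, 50), "green"),
    ((60, 220, 80), "green"),
    ((230, 210, 40), "yellow"),
    ((250, 230, 60), "yellow"),
    ((240, 200, 30), "yellow"),
]

def centroid(points):
    """Component-wise mean of a list of RGB triples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

# "Learning": one prototype color per class, from the labeled examples.
centroids = {
    label: centroid([rgb for rgb, lbl in training if lbl == label])
    for label in {"green", "yellow"}
}

def classify(rgb):
    """The discovered function: numbers in, label out (closest prototype)."""
    return min(
        centroids,
        key=lambda lbl: sum((a - b) ** 2 for a, b in zip(rgb, centroids[lbl])),
    )

print(classify((45, 190, 55)))   # -> green
print(classify((245, 215, 45)))  # -> yellow
```

The design choice is deliberately minimal: once the data is numeric, even averaging the examples of each class yields a usable classifying function, which is the core pattern that more sophisticated learners refine.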
Since statistics is also trying to construct functions from observations, many of its tools come in handy for the machine learning community, and over time the two communities have grown very close to each other.
Please feel free to ask questions in the comment box if you need any clarifications.