
Given a real-world problem, algorithms and the mathematics behind them do not provide insight on their own. The only value that knowledge of the algorithms provides is a general understanding of the ways in which you can approach a problem. If you can do the mathematics, you can see how the way you structure your question influences the results you are seeing.

The process of exploratory data analysis is best done (imho) in the style of a Lakatos research programme:

- prepare and clean the data

- explore using many fast methods and charts, and come up with some working hypotheses about the data that are important to your client

- select method(s) to test those hypotheses

- perform the analyses

- determine what your results mean in terms of your research goal.

- alter your list of working hypotheses

- repeat [possibly collecting more data]

Obviously the only hard part about this is step five, and unfortunately it is the step that isn't really taught, in my experience. A simple case: say you ran a linear regression once with 2 variables and got parameter estimates (a, b), then ran it again with 3 variables and got estimates (a1, b1, c). If b != b1, what does this mean? If you are using a non-identity link function (e.g. cloglog or logit), how should you interpret it now? This is where a deep understanding of the mathematics behind the techniques starts to pay off. And this is only the simplest case, a basic regression.
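To make the b != b1 question concrete, here is a minimal sketch in Python/NumPy on simulated data (the data, the true coefficients 1.0/2.0/3.0, and the helper names `ols` and `logit` are all hypothetical, chosen for illustration). In the linear case, b differs from b1 because the dropped variable is correlated with x1 and also affects y (omitted-variable bias). In the logit case the added variable is independent of x1, yet the coefficient on x1 still changes, because the logit coefficient is non-collapsible.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares; X must already contain an intercept column."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def logit(X, y, iters=25):
    """Logistic regression fitted by Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))          # fitted probabilities
        grad = X.T @ (y - p)                             # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])      # information matrix X'WX
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)
n = 200_000
ones = np.ones(n)

# Linear case (hypothetical data): x2 has correlation 0.6 with x1 and also affects y.
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + 0.8 * rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

a, b = ols(np.column_stack([ones, x1]), y)           # 2-variable model
a1, b1, c = ols(np.column_stack([ones, x1, x2]), y)  # 3-variable model
# Omitted-variable bias: b absorbs 3.0 * 0.6 from x2, so b lands near 3.8 while b1 is near 2.0.
print(f"OLS:   b = {b:.3f}, b1 = {b1:.3f}, c = {c:.3f}")

# Logit case (hypothetical data): z is independent of x1, yet dropping it still moves x1's coefficient.
z = rng.normal(size=n)
eta = -0.5 + 1.0 * x1 + 2.0 * z
yb = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

logit_short = logit(np.column_stack([ones, x1]), yb)
logit_long = logit(np.column_stack([ones, x1, z]), yb)
# Non-collapsibility: the short-model coefficient on x1 is pulled toward zero.
print(f"Logit: short b = {logit_short[1]:.3f}, long b1 = {logit_long[1]:.3f}")
```

In the linear model, b is a perfectly good estimate of a different quantity (the effect of x1 with x2 left to vary alongside it), so "which one is right" depends on the question being asked. In the logit model that reasoning is not enough on its own, since the coefficients shift even without any confounding.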


