“First things first”. Tautological, always true. However, some data scientists seem to ignore this: you can reach for the most sophisticated and trendy algorithm, come up with brilliant ideas, and imagine the most creative visualizations but, if you do not know how to get the data and shape it exactly the way you need, all of this is worthless.

#### Function of the Week: `count`

I came across `count` last week while doing some research about logistic regressions at work. It was a joyous day to find this function because it is equivalent to something I’ve done numerous times in the past: …

#### plyr : Handy tools for split-apply-combine

Back to my football (ok – soccer – if you are from *that* part of the world) theme. Let’s suppose you have data on the performance of 5 players in 4 matches.

#### Shading between two lines - ggplot

First one to say `geom_ribbon` loses. I was plotting some data for a colleague: two lines (a repeated experiment) per person, with time on the x axis, faceted by id. I thought it’d be nice to shade the area between the two lines, so that a large shaded area appears where the repeats deviate and little shading where they are close, just to aid the visual of the separation between repeats.

#### The split-apply-combine strategy for R

Say you need to **split** up a big data structure into homogeneous pieces, **apply** a function to each piece and then **combine** all the results back together.
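
The three steps can be sketched in base R alone. This is a minimal illustration, not the method from the post itself: the `scores` data frame, its `player` and `goals` columns, and the summary columns `matches` and `total` are all hypothetical names made up for the example.

```r
# Hypothetical toy data: goals scored by three players over a few matches.
scores <- data.frame(
  player = c("A", "B", "A", "C", "B", "A"),
  goals  = c(1, 0, 2, 1, 3, 0)
)

# Split: one data frame per player.
pieces <- split(scores, scores$player)

# Apply: summarise each piece independently of the others.
summaries <- lapply(pieces, function(piece) {
  data.frame(
    player  = piece$player[1],
    matches = nrow(piece),
    total   = sum(piece$goals)
  )
})

# Combine: stack the per-player summaries back into one data frame.
result <- do.call(rbind, summaries)
print(result)
```

With plyr loaded, the same pattern can usually be written in one call, for example `ddply(scores, "player", summarise, total = sum(goals))`, which handles the split and combine steps for you.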

#### Mixed-effects modeling — four hour workshop — part III: Regression modeling

We completed a study on the factors that influence visual word recognition performance in lexical decision. We then put together a dataset for analysis here…