Some of my earliest childhood memories are of going to junkyards with my father. He worked his entire career in car and truck (the 18-wheeler kind) repair and would help out family and friends whenever they had a mechanical or electrical problem. As money was always tight, getting used parts from junkyards was par for the course.

http://www.wrenchheads.net/tag/self-service-junkyard/

Over the years, I probably went with him to every junkyard in a 50-mile radius, and I remember looking at the seemingly endless rows of wrecked cars and marveling at their diversity, which covered most of what I would see on the streets. This being Portugal in the late 80s, there would be a fair share of Renault R5s, Opel Kadetts and Toyota Celicas, but sometimes I would get a chance to see a fancy Mercedes or BMW that by some twist of fate had ended up in our particular neck of the woods.

I also found it endlessly fascinating how every car, regardless of model or origin, was damaged in a different way. Some were done in by head-on collisions, others by rolling over, while still others looked intact from the outside, indicating that they had probably been done in by some engine issue. As I was always a fan of puzzles, I remember thinking how one might use parts from various wrecks to build a whole “new” car: take wheels and axles from one, the frame from another, engine parts from a few more, add a generous amount of elbow grease and technical prowess and... voilà! Your chariot awaits! I even daydreamed that I might build a car myself one day.

https://www.yahoo.com/news/bp/rare-ferrari-f50-turns-salvage-yard-bidding-battle-183329477.html

If you stayed with me this far, you’re probably wondering what, if anything, my reminiscing about cherished family memories has to do with “Big Data”. Hang in there, hopefully it will all make sense soon. While working on a project using a large amount of Twitter data to characterize human behavior, it occurred to me that the massive piles of data that are constantly accumulating, along with the biases one encounters in them, are a bit like the piles of cars in those junkyards. Every user shows only a piece of themselves by using these services, but by carefully analyzing large numbers of users one might get a fuller picture of human behavior (or as full as possible within the constraints imposed by each platform). In much the same way, each wreck has only a part of it that is salvageable, but one can build a fully working car with pieces from many wrecks.

The view we get is also biased. In the area where I grew up, I would see mostly the lower range of the Portuguese car fleet because that’s what was most common there. This was clearly not an accurate representation of the cars on the road at that point in time. In richer parts of the country you would likely see many more Porsches, Audis or even Ferraris (for a while Portugal had the county with the highest number of Ferraris per capita outside of Italy). In the same way, different platforms will attract some specific kinds of users more than others. A good illustration is this recent graph by the Pew Research Center about the users of various online social networks.

http://www.pewinternet.org/2015/08/19/the-demographics-of-social-media-users/

For example, 44% of online adult women in the US use Pinterest while only 21% use Twitter. On the other hand, 46% of college degree holders use LinkedIn versus only 27% for Twitter.
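To make the size of these gaps concrete, here is a minimal sketch that only re-expresses the four percentages quoted above as ratios relative to Twitter. The dictionary layout and the `usage_ratio` helper are my own illustrative choices, not anything published by Pew.

```python
# Usage rates quoted above (percent of each group of online US adults).
# The grouping into a nested dict is an assumption made for illustration.
usage = {
    "online adult women": {"Pinterest": 44, "Twitter": 21},
    "college degree holders": {"LinkedIn": 46, "Twitter": 27},
}


def usage_ratio(rates, platform, baseline="Twitter"):
    """Return how many times larger the group's usage rate is on
    `platform` than on `baseline` (hypothetical helper)."""
    return rates[platform] / rates[baseline]


# Ratio of each non-Twitter platform to Twitter, per group.
ratios = {
    group: {
        platform: round(usage_ratio(rates, platform), 2)
        for platform in rates
        if platform != "Twitter"
    }
    for group, rates in usage.items()
}

for group, by_platform in ratios.items():
    for platform, ratio in by_platform.items():
        print(f"{group}: {platform} usage is {ratio}x Twitter usage")
```

By these numbers, online adult women are roughly twice as likely to use Pinterest as Twitter, and college degree holders are about 1.7 times as likely to use LinkedIn as Twitter, so the same question asked on two platforms is effectively asked of two quite different crowds.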

One direct implication of this is that we must be careful when comparing results across different platforms, and even more so when trying to extrapolate from our system of choice to the “real world”. Of course, this shouldn’t be taken to imply that studies based on Big Data in general, and Social Media in particular, are invalid or useless. Indeed, I and many others have built entire careers around these kinds of studies, but it’s important to keep in mind the biases and limitations I mentioned above. Remembering that I’m working in a junkyard helps me be careful; hopefully you’ll find it useful as well.

Oh, and my dream of building a car, you ask? Alas, my mechanical prowess never went that far. Thanks again to my father, my interests quickly moved to electronics before finally settling on Physics and Computer Science, but that’s a story for another time.

