When LinkedIn launched in 2003, it took 500 days to reach its first million customers; the most recent million took just six days. Today the site sees two new registrations every second and 4.2 billion user searches a year, and its data analysis team examines 200 TB of data each day to understand users better.
Five years ago, in What Is Web 2.0, Tim O’Reilly said that “data is the next Intel Inside.” Why do we suddenly care about statistics and about data? Why is data science the sexiest job of the 21st century? This lesson focuses on getting some of the answers from Manu Sharma.
LinkedIn is your professional digital real estate. When people look for you and don’t find you, it is a lost opportunity, so it’s important to keep your profile up to date. LinkedIn uses data to build products and generate insights that drive the business. To achieve this, LinkedIn has developed proprietary algorithms such as Metropolis. It processes over 10 billion rows of data every day in real time using solutions it built in-house, such as Voldemort, Kafka, and Zoie, which have since been made open source.
A data scientist has the right combination of curiosity and intuition: What can I do with this data? What questions can I ask? What can this data tell me? It’s also about having the intuition to know the limitations of your approaches. The work involves gathering data, standardizing it, modeling it properly, running statistics on it, and being able to code it all. A data scientist needs all these skills, and that’s what startups should look for when setting up their data science teams.
Key Applications of Data Science @ LinkedIn
- Build Innovative Data Products
- Draw Insights
- Drive the Business
Inference algorithms help predict information from users’ network data, which can be critical for future product development. In particular, they helped build the “People You May Know” feature, which is key to LinkedIn’s user engagement and viral growth. The feature was invented at LinkedIn and is now used in every social product.
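To make the idea of inferring likely connections from network data concrete, here is a minimal sketch that ranks people by how many connections they share with a user. This is a generic "common neighbors" heuristic chosen for illustration, not LinkedIn's actual algorithm; the function name `suggest_connections` and the sample `network` are hypothetical.

```python
from collections import Counter


def suggest_connections(graph, user, top_n=3):
    """Rank people who are not yet connected to `user` by shared connections.

    `graph` maps each person to the set of people they are connected to.
    This is an illustrative heuristic, not LinkedIn's production method.
    """
    direct = graph.get(user, set())
    counts = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user and anyone they already know.
            if candidate != user and candidate not in direct:
                counts[candidate] += 1
    # Sort by shared-connection count (descending), then by name for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical toy network.
network = {
    "ana": {"bo", "cy"},
    "bo": {"ana", "dee", "eli"},
    "cy": {"ana", "dee"},
    "dee": {"bo", "cy"},
    "eli": {"bo"},
}
suggestions = suggest_connections(network, "ana")
print(suggestions)
```

In this toy network, "dee" ranks first for "ana" because they share two connections. A real system would likely combine many more signals than a simple count.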
Similarly, LinkedIn built “Skills” by extracting and analyzing the free-form text that users wrote in the specialties section and creating a standardized dictionary of skill keywords. Applying clustering algorithms to these standardized skills can then lead to many interesting insights.
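The standardization step can be sketched roughly as follows, assuming a small hand-made alias dictionary. The names `SKILL_ALIASES` and `standardize_skills`, and the sample profiles, are hypothetical; the source does not describe how the real dictionary was built, and the clustering step is not shown here.

```python
import re
from collections import Counter

# Hypothetical alias dictionary: free-form variants mapped to one canonical skill.
SKILL_ALIASES = {
    "ml": "machine learning",
    "machine learning": "machine learning",
    "js": "javascript",
    "javascript": "javascript",
    "java script": "javascript",
    "python": "python",
}


def standardize_skills(free_text):
    """Split a free-form specialties string and map each piece to a canonical skill."""
    pieces = re.split(r"[,;/\n]+", free_text.lower())
    skills = set()
    for piece in pieces:
        key = " ".join(piece.split())  # trim and collapse internal whitespace
        if key in SKILL_ALIASES:
            skills.add(SKILL_ALIASES[key])  # unknown phrases are simply ignored
    return skills


# Hypothetical free-form "specialties" entries.
profiles = [
    "Python, ML; Java Script",
    "machine learning / python",
    "JS, gardening",
]
skill_counts = Counter(s for p in profiles for s in standardize_skills(p))
print(skill_counts)
```

Once every profile is reduced to a set of canonical skills, those sets can be fed into whatever clustering method is preferred, for example grouping profiles by skill overlap.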
The data also leads to meaningful insights, such as anticipating the future by identifying trends across sectors and the economy. The advantage is that this is not survey data; it is real data that users provide through their activities. It’s not surprising that this data became part of the US presidential economic report as policy input. The same data is equally vital in driving business growth.
Best Practices
- More data is better than less data
- Raw data is better than processed data
- Data Standards and Data Quality are vital
- Simple Models are better than Complex Models
- Fail Fast, Iterate, Test, Test and Test