Thursday, August 4, 2016

Data Lake vs Data Warehouse

Martin Fowler has published a great article on this: http://martinfowler.com/bliki/DataLake.html

But to quickly summarize: think of a data lake as a dump of raw, unprocessed data that people can go back to and dig out whatever information they need from the raw contents.

In a data warehouse, someone defines a schema up front and does the data cleaning and processing before the data is stored. This way, people can just grab a data set from the warehouse, expect it to be in a known shape, and not worry about processing it by hand.
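To make the difference concrete, here is a minimal Python sketch of the two approaches. Everything in it is made up for illustration: the sample events, the two-column schema (user, amount), and the function names are my own assumptions, not something from Fowler's article or from any real lake or warehouse product.

```python
import json

# Hypothetical raw events, as they might arrive from different sources.
RAW_EVENTS = [
    '{"user": "alice", "amount": "12.50", "note": "first order"}',
    '{"user": "bob", "amount": "n/a"}',
    '{"user": "carol", "amount": "7", "coupon": "SPRING"}',
]

# Lake: keep every record exactly as received. Nothing is validated or dropped.
lake = list(RAW_EVENTS)


def load_into_warehouse(raw_records):
    """Warehouse: enforce an assumed schema (user: str, amount: float) at load time."""
    rows = []
    for raw in raw_records:
        record = json.loads(raw)
        try:
            row = {"user": str(record["user"]), "amount": float(record["amount"])}
        except (KeyError, ValueError):
            # Records that do not fit the schema never reach the warehouse.
            continue
        rows.append(row)
    return rows


def coupons_from_lake(raw_records):
    """Lake consumers parse the raw data themselves, whenever a question comes up."""
    parsed = (json.loads(raw) for raw in raw_records)
    return [record["coupon"] for record in parsed if "coupon" in record]


warehouse = load_into_warehouse(lake)

# Warehouse query: the data is already clean and typed, so this is a one-liner.
total = sum(row["amount"] for row in warehouse)

# Lake query: more work per question, but the coupon field is still there.
coupons = coupons_from_lake(lake)

print(total)
print(coupons)
```

In this sketch the warehouse drops bob's record (the amount doesn't parse) and throws away the note and coupon fields, since they aren't part of the schema. The lake still has all of it, but you have to do the parsing yourself every time.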

So... if you don't know yet what you need and you don't mind digging for it, go with the lake.
If you'd rather have data sets ready to use in a predefined format, and you don't mind that some data may be missing, go with the warehouse.
