Data Lake solution for a real estate company.
The company does outbound marketing to property owners to find out whether they want to rent, sell, or buy a property. The Data Lake is meant to give a clear view of every interaction with each lead up to the point where the lead agrees to talk with the firm. After that, the lead is pushed to the CRM for the sales team and joined with all the data on their property.
The project had three data sources: an RDS instance, property data published on the web as zipped TSV files, and JSON data from a web-based CRM system.
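The property source arrives as zip archives of TSV files, so ingestion has to unpack each archive before the rows can be used. The sketch below shows one way to do that with the Python standard library only. The helper name `read_zipped_tsv`, the assumption that every TSV has a header row and is UTF-8, and the choice to skip non-`.tsv` members are all illustrative, not taken from the project.

```python
import csv
import io
import zipfile


def read_zipped_tsv(payload: bytes) -> list[dict[str, str]]:
    """Parse every .tsv member of an in-memory zip archive into row dicts.

    Hypothetical helper: assumes each TSV has a header row and is UTF-8 encoded.
    """
    rows: list[dict[str, str]] = []
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".tsv"):
                continue  # skip readmes or other non-TSV members
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                # QUOTE_NONE: treat quote characters in TSV fields as literal text
                reader = csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
                rows.extend(reader)
    return rows


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("properties.tsv", "property_id\tcity\n1\tAustin\n2\tDenver\n")
    print(read_zipped_tsv(buf.getvalue()))
```

Working on bytes rather than a file path keeps the helper usable whether the archive was downloaded to disk or read straight from an object store.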
Two data layers were built on S3: an AS_IS layer, where data lands unchanged from the sources, and a CURATED layer, where data is converted to an analytical format (Parquet with Snappy compression) and partitioned by date. The Glue Data Catalog was chosen as the centralized metastore for all data layers, and AWS Lake Formation was used to provide secure access to the data.
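To make the two-layer layout concrete, the sketch below builds S3 object keys for each layer. It is only an illustration under assumed names: the `as_is/` and `curated/` prefixes, the `dt` partition column, and the helper functions are hypothetical. The CURATED keys use Hive-style `dt=YYYY-MM-DD` directories, a layout that Glue crawlers and Athena can register as partitions; the actual Parquet/Snappy conversion would happen in a separate job and is not shown.

```python
from datetime import date


def as_is_key(source: str, table: str, ingest_date: date, filename: str) -> str:
    """Hypothetical landing path: files copied unchanged, grouped by source and ingest date."""
    return f"as_is/{source}/{table}/{ingest_date:%Y/%m/%d}/{filename}"


def curated_key(source: str, table: str, event_date: date, part: int) -> str:
    """Hypothetical analytical path: Hive-style dt= partition holding a Snappy Parquet file."""
    return (
        f"curated/{source}/{table}/dt={event_date.isoformat()}/"
        f"part-{part:05d}.snappy.parquet"
    )


if __name__ == "__main__":
    print(as_is_key("crm", "leads", date(2023, 5, 1), "export.json"))
    print(curated_key("rds", "interactions", date(2023, 5, 1), 0))
```

Keeping the raw layer keyed by ingest date and the curated layer keyed by a `dt=` partition lets queries on the curated tables prune by date without scanning the whole prefix.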