Community Comment: Part 36 - Data modelers need to understand patterns * Data modelers need to understand patterns * Like programming, don't reinvent the wheel * Universal patterns & metapatterns exist * Remember "The Data Model Resource Book"
Community Comment: Part 34 - Popular data engines use Java / the JVM because it ruled the enterprise for many years * Reasoning behind popular data engines & Java * Java / the JVM long ruled the enterprise * Spark, Flink, Presto & Trino are all Java-based * Remember: Presto & Trino share some history
Community Post: Part 5 - Databricks is the 2023 runner-up winner following acceptance of my proposal to include it in DB-Engines ranking * DB-Engines ranking of DBMS products * Databricks climbs after my proposal to include it * Databricks is runner-up winner for 2023 * Earlier rankings left out Databricks
Community Comment: Part 33 - With respect to Databricks vs Snowflake, actual usage is more interesting than account creation * Snowflake vs Databricks account penetration * Account growth rates not divulged * Data can be easily misinterpreted * Databricks: AI advantage, closing in elsewhere
Community Comment: Part 32 - Citizen users of SQL need multiple skills * Citizen users of SQL need multiple skills * Understand what tables & views are needed * Generate correct output * Performantly generate correct output
Community Comment: Part 31 - Wholesale self-service analytics is a dumb goal * Dumb goal: wholesale self-service analytics * Don't attempt a shotgun approach * Self-service users need to be defined * Identify & cater to power users
Product Reviews: Part 12 - Update with a note about data quality (be careful when using floating data types!) Almost exactly a year has passed since publishing my last post in this series, and a lot has changed. So many changes have taken place, in fact, that I've decided to significantly decrease my contributions as an Amazon product reviewer, beginning at the conclusion of my second "
Community Post: Part 4 - A book on deepfake detection quoted me A well-received community post I recently made: https://www.linkedin.com/posts/erikgfesser_spark-bigdata-analytics-activity-7130307566965776384-ZYnv I'm apparently being quoted in books now. The quote is in chapter 3 ("Deepfakes Spark Implementation for BigData Analytics") of a book entitled "Handbook of Research on Advanced Practical Approaches to
Community Post: Part 3 - Delta Lake 3.0 is like an electric car charging standard * The power of open source * Delta Lake 3.0 is just one example * Parquet-based data lakehouse file format * Similar to electric car charging standard
Community Comment: Part 30 - With respect to Azure, Databricks Unity Catalog & Microsoft Purview is not an either-or proposition * Databricks Unity Catalog & Microsoft Purview * Not an either-or proposition for Azure * Likely need to integrate these products * Integration dependent on several factors