Community Comment: Part 23 - Architecture diagrams can often mislead * Architecture diagrams can often mislead * Databricks isn't a storage solution * Databricks is a Hadoop successor * Both ETL and ELT involve input & output
Community Comment: Part 21 - Premature optimization is root of all evil, with caveats * Premature optimization is root of all evil, with caveats * This maxim often taken out of context * Functional correctness is first priority * Expected workloads impact architecture
Media Query Source: Part 40 - CIO (US digital magazine); 10 key roles for AI success * CIO (US digital magazine) * 10 key roles every AI team needs * Data engineer role is foundational * Suitable data leads to trustworthiness My responses ended up being included in an article at CIO (June 7, 2022). Extent of verbatim quote highlighted in orange, paraphrased quote highlighted in gray. Above image from
Community Comment: Part 19 - Firms have more choices than just buy vs build * Firms have more choices than vendors admit * The "buy vs build" choice is nuanced * Commercial & open source fall on spectrum * Blanket $ statements cannot be made
Community Post: Part 1 - My proposal to DB-Engines for inclusion of Databricks was successful! * My open letter to DB-Engines * Gartner's "DBMS market share ranks" post * Databricks left out of DB-Engines ranking * My proposal to DB-Engines was successful
Community Comment: Part 17 - Implementing solutions is more important than estimating build effort * Implementing > Estimating * Fine-grained software effort confidence level estimates meaningless * As with story point estimates, categories better * Breaking down problems decreases novelty, increases predictability
Community Comment: Part 14 - Vendor data products like Snowflake, Redshift & BigQuery are infrastructure, not data architecture The comments I provided in reaction to a community discussion thread. Data Warehouse Architect & Developer at Healthcare Data Consultancy: I keep seeing this again and again: databases like Snowflake, Redshift, or BigQuery that are called 'Data Warehouses' when discussed in the context of acquiring them. They are
Media Query Source: Part 37 - The Data Chief (ThoughtSpot podcast); My biggest data priority this year (2022) Note: unfortunately, for the first time, I withdrew my response from consideration after discussing it further with the Deloitte communications team. As I explained to folks at The Data Chief, while my response was generic, some uncertainty had subsequently been introduced as to whether the details I shared were permissible, and
Media Query Source: Part 35 - CMSWire (US digital magazine); Synthetic data & its usage in the workplace The responses I provided to a media outlet on February 9, 2022: Media: What is Synthetic data and does it have a digital workplace use? Gfesser: Synthetic data is a type of test data that is intended to reflect the statistical properties of real production data. Because of this, synthetic
Community Comment: Part 12 - Now that all companies are software companies, people who like data and use code are more common The comments I provided in reaction to a community discussion thread. Chief Consulting Officer at Investment Management Firm: I've come to the conclusion that the working world is roughly split into two groups… 1. People who like pictures and use PowerPoint 2. People who like numbers and use