Fixing Dirty Data - How to clean Data by classification and Normalization | Susan Walsh
In the first-ever English episode of UNF#CK YOUR DATA, host Christian Krug interviews Susan Walsh, the classification guru, on how to clean your dirty data.
But first, what is dirty data, and why is it a problem?
Data in your company systems, like CRM or ERP, can have all sorts of issues: duplicates, near duplicates, inconsistent formats and so on.
So records that should match don't, or your numbers are off.
Basically, you can't rely on the data in the system to make decisions, such as sending a mail or a leaflet, or potentially even an invoice. You may not even know who your real number one customer is.
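To make the "number one customer" problem concrete, here is a minimal Python sketch. The invoice rows, the names and the matching rule in `simple_key` are all made up for illustration; they are not taken from the episode.

```python
from collections import defaultdict

# Hypothetical invoice rows: (customer name as typed, amount)
invoices = [
    ("Acme Ltd", 400),
    ("ACME Ltd.", 350),
    ("acme ltd", 300),
    ("Globex", 900),
]

def totals_by(rows, key):
    """Sum amounts per customer, grouping on key(name)."""
    totals = defaultdict(int)
    for name, amount in rows:
        totals[key(name)] += amount
    return dict(totals)

def simple_key(name):
    # Assumed matching rule: ignore case, surrounding spaces and a trailing dot
    return name.strip().rstrip(".").casefold()

raw = totals_by(invoices, key=lambda name: name)   # names exactly as typed
cleaned = totals_by(invoices, key=simple_key)      # near duplicates merged

top_raw = max(raw, key=raw.get)              # "Globex" (900)
top_cleaned = max(cleaned, key=cleaned.get)  # "acme ltd" (400 + 350 + 300 = 1050)
```

With the names as typed, Globex looks like the biggest customer. Once the three spellings of Acme are treated as one record, Acme comes out on top.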
To help you deal with this mess, Susan has created a framework that helps you clean up your data. You have to normalize and classify your data: first agree on a common format and fit the data to it, then give the data meaning by classifying it.
So you can further process the data and base your decisions on it.
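The two steps, normalizing and then classifying, could look roughly like this in Python. This is only an illustrative sketch, not Susan's framework: the suffix list, the category lookup and the function names are assumptions, and in practice both lists would be built and maintained by people who know the business.

```python
import re

# Assumed legal-form suffixes to drop; a real list would be agreed with the business
LEGAL_SUFFIXES = {"ltd", "limited", "gmbh", "inc", "plc"}

# Hypothetical category lookup, maintained by people who know the suppliers
CATEGORIES = {
    "DHL": "Logistics",
    "MICROSOFT": "IT Software",
}

def normalize(name):
    """Fit a supplier name to one agreed format:
    upper case, no punctuation, no trailing legal suffix."""
    words = re.sub(r"[^\w\s]", " ", name).upper().split()
    while words and words[-1].lower() in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)

def classify(name):
    """Give a normalized name a meaning; unknown names are left for a human."""
    return CATEGORIES.get(normalize(name), "NEEDS REVIEW")

for supplier in ["DHL GmbH.", "dhl gmbh", "Microsoft, Inc.", "Unknown Supplier AG"]:
    print(f"{supplier!r:25} -> {normalize(supplier)!r:25} -> {classify(supplier)}")
```

Anything the lookup does not recognise is flagged for review instead of being guessed, which is where the human knowledge comes in.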
Sad news for all the AI enthusiasts out there: this still requires an awful lot of human knowledge, and there is no shortcut to speed up the process.
On the other hand, this step is crucial for your AI success, as only good-quality training data will lead to great AI results, regardless of which use case you tackle first.
But cleaning data once is not a lasting solution. It's a continuous effort, and it has to start at the very source, where people enter the data into the systems.
So data quality is a process and a mantra.
In this episode you will find:
- Why data is sometimes so dirty
- How the COAT method can help you clean data
- Why data quality is not an AI topic
- Susan's plans for a new framework
▬▬▬▬▬▬ Profiles: ▬▬▬▬
Susan at LinkedIn: https://www.linkedin.com/in/susanewalsh/
Christian at LinkedIn: https://www.linkedin.com/in/christian-krug/
Unf*ck Your Data at LinkedIn: https://www.linkedin.com/company/unfck-your-data
▬▬▬▬▬▬ Book recommendation: ▬▬▬▬
Susan's book recommendation: Buy Back Your Time - Dan Martell
The “UYD” bookshelf at Melena’s store: https://gunzenhausen.buchhandlung.de/unfuckyourdata
▬▬▬▬▬▬ Where to find UNF#CK YOUR DATA: ▬▬▬▬
Podcast at Spotify: https://open.spotify.com/show/6Ow7ySMbgnir27etMYkpxT?si=dc0fd2b3c6454bfa
Podcast at iTunes: https://podcasts.apple.com/de/podcast/unf-ck-your-data/id1673832019
Podcast at Deezer: https://deezer.page.link/FnT5kRSjf2k54iib6
▬▬▬▬▬▬ Contact: ▬▬▬▬
E-Mail: christian@uyd-podcast.com
▬▬▬▬▬▬ Timestamps: ▬▬▬▬▬▬▬▬▬▬▬▬▬
00:00 Introduction and Welcome
01:13 Susan's Background and Expertise
03:03 Types of Dirty Data
04:01 The Impact of Dirty Data
06:12 Cleaning Data and the Role of Excel
07:34 The Limitations of AI in Data Cleaning
09:26 Automating Supplier Name Normalization
11:03 Data Classification and Context
13:52 The Importance of Business Understanding
16:26 The Role of Human Expertise in Data Work
19:32 Data Normalization and Classification
22:33 The Importance of Clean and Organized Data
27:19 The 'Data Coat' Methodology
31:26 The Value of Humor in Business
33:53 Book Recommendation: 'Buy Back Your Time'
105 episodes
All episodes
Die digitale Schiene - Wie Daten und KI die Eisenbahn transformieren | Melanie Kleinpötzl 40:55
Alles über „UNF#CK YOUR DATA“ - Daten | KI | Podcasts | Euer Host | Geniale Gäste | Christian Krug 1:16:55
Die Kunst des Data-Pitch: Kurz und prägnant - wie ein TikTok Video | Asmaa Hechenberger 57:27
Der EU AI ACT - Aus Sicht des Engineering: Es ist eigentlich nur Doku | Larysa Visengeriyeva 1:01:27
Das Marketing, die Daten und der ganze Rest | Stephanie Verch & Viviane Wilde-Skibicki 1:07:35
Datenkultur oder Datenstrategie - Wer frisst wen? Oder wo anfangen? | Marco Geuer 54:23
AI ohne Bias? Warum ausgewogenen und saubere Daten so wichtig sind | Elisabeth l‘Orange 59:09
Data-driven zur Essstörung - Wenn der Umgang mit Daten ungesund wird | Medea Lorenzen 51:01
Data Consulting – Stiftet Beratung Mehrwert? Freelancer oder Agentur? | Tim Wiegels 52:03
Xing ist tot? LinkedIn die Zukunft? Was sagen die Daten einer Headhunterin? | Ika C. Amonath 56:20
ROAS gets roasted - Wie Performance Marketing schlechte Werbung macht | Philipp Loringhoven 44:40
Mit Datenqualität gewinnen - Von Data Contracts bis Data Democracy | Steffi Kostorz 56:22
Digitalisierung und Daten in der Sicherheitstechnik - Data-driven und Innovativ | Merle Sandersfeld-Kelm 35:16
Shaping Your Data Career - Which skills to learn and how to develop | Eva Murray 53:45
Conversational AI, Virtuelle Assistenz - Gute Chatbots - Keine Warteschlangen | Sarah Rojewski 1:00:23