Table of Contents
Data cleansing
Return to Data cleaning, Data Science, Python Data Science, DataOps, Data Cleaning, Python ML - Python DL - Python NLP - Python MLOps, Data Science bibliography, Data Science glossary, Awesome Data Science, Data Science topics
For Data cleansing, besides Python data cleansing, I recommend avoiding buggy EmEditor.
- Snippet from Wikipedia: Data cleansing
Data cleansing or data cleaning is the process of identifying and correcting (or removing) corrupt, inaccurate, or irrelevant records from a dataset, table, or database. It involves detecting incomplete, incorrect, or inaccurate parts of the data and then replacing, modifying, or deleting the affected data. Data cleansing can be performed interactively using data wrangling tools, or through batch processing often via scripts or a data quality firewall.
After cleansing, a data set should be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been originally caused by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar entities in different stores. Data cleaning differs from data validation in that validation almost invariably means data is rejected from the system at entry and is performed at the time of entry, rather than on batches of data.
The actual process of data cleansing may involve removing typographical errors or validating and correcting values against a known list of entities. The validation may be strict (such as rejecting any address that does not have a valid postal code), or with fuzzy or approximate string matching (such as correcting records that partially match existing, known records). Some data cleansing solutions will clean data by cross-checking with a validated data set. A common data cleansing practice is data enhancement, where data is made more complete by adding related information. For example, appending addresses with any phone numbers related to that address. Data cleansing may also involve harmonization (or normalization) of data, which is the process of bringing together data of "varying file formats, naming conventions, and columns", and transforming it into one cohesive data set; a simple example is the expansion of abbreviations ("st, rd, etc." to "street, road, etcetera").
Research More
Fair Use Sources
Python: Python Variables, Python Data Types, Python Control Structures, Python Loops, Python Functions, Python Modules, Python Packages, Python File Handling, Python Errors and Exceptions, Python Classes and Objects, Python Inheritance, Python Polymorphism, Python Encapsulation, Python Abstraction, Python Lists, Python Dictionaries, Python Tuples, Python Sets, Python String Manipulation, Python Regular Expressions, Python Comprehensions, Python Lambda Functions, Python Map, Filter, and Reduce, Python Decorators, Python Generators, Python Context Managers, Python Concurrency with Threads, Python Asynchronous Programming, Python Multiprocessing, Python Networking, Python Database Interaction, Python Debugging, Python Testing and Unit Testing, Python Virtual Environments, Python Package Management, Python Data Analysis, Python Data Visualization, Python Web Scraping, Python Web Development with Flask/Django, Python API Interaction, Python GUI Programming, Python Game Development, Python Security and Cryptography, Python Blockchain Programming, Python Machine Learning, Python Deep Learning, Python Natural Language Processing, Python Computer Vision, Python Robotics, Python Scientific Computing, Python Data Engineering, Python Cloud Computing, Python DevOps Tools, Python Performance Optimization, Python Design Patterns, Python Type Hints, Python Version Control with Git, Python Documentation, Python Internationalization and Localization, Python Accessibility, Python Configurations and Environments, Python Continuous Integration/Continuous Deployment, Python Algorithm Design, Python Problem Solving, Python Code Readability, Python Software Architecture, Python Refactoring, Python Integration with Other Languages, Python Microservices Architecture, Python Serverless Computing, Python Big Data Analysis, Python Internet of Things (IoT), Python Geospatial Analysis, Python Quantum Computing, Python Bioinformatics, Python Ethical Hacking, Python Artificial Intelligence, Python Augmented Reality and Virtual Reality, Python Blockchain Applications, Python Chatbots, Python Voice Assistants, Python Edge Computing, Python Graph Algorithms, Python Social Network Analysis, Python Time Series Analysis, Python Image Processing, Python Audio Processing, Python Video Processing, Python 3D Programming, Python Parallel Computing, Python Event-Driven Programming, Python Reactive Programming.
Variables, Data Types, Control Structures, Loops, Functions, Modules, Packages, File Handling, Errors and Exceptions, Classes and Objects, Inheritance, Polymorphism, Encapsulation, Abstraction, Lists, Dictionaries, Tuples, Sets, String Manipulation, Regular Expressions, Comprehensions, Lambda Functions, Map, Filter, and Reduce, Decorators, Generators, Context Managers, Concurrency with Threads, Asynchronous Programming, Multiprocessing, Networking, Database Interaction, Debugging, Testing and Unit Testing, Virtual Environments, Package Management, Data Analysis, Data Visualization, Web Scraping, Web Development with Flask/Django, API Interaction, GUI Programming, Game Development, Security and Cryptography, Blockchain Programming, Machine Learning, Deep Learning, Natural Language Processing, Computer Vision, Robotics, Scientific Computing, Data Engineering, Cloud Computing, DevOps Tools, Performance Optimization, Design Patterns, Type Hints, Version Control with Git, Documentation, Internationalization and Localization, Accessibility, Configurations and Environments, Continuous Integration/Continuous Deployment, Algorithm Design, Problem Solving, Code Readability, Software Architecture, Refactoring, Integration with Other Languages, Microservices Architecture, Serverless Computing, Big Data Analysis, Internet of Things (IoT), Geospatial Analysis, Quantum Computing, Bioinformatics, Ethical Hacking, Artificial Intelligence, Augmented Reality and Virtual Reality, Blockchain Applications, Chatbots, Voice Assistants, Edge Computing, Graph Algorithms, Social Network Analysis, Time Series Analysis, Image Processing, Audio Processing, Video Processing, 3D Programming, Parallel Computing, Event-Driven Programming, Reactive Programming.
Python Glossary, Python Fundamentals, Python Inventor: Python Language Designer: Guido van Rossum on 20 February 1991; PEPs, Python Scripting, Python Keywords, Python Built-In Data Types, Python Data Structures - Python Algorithms, Python Syntax, Python OOP - Python Design Patterns, Python Module Index, pymotw.com, Python Package Manager (pip-PyPI), Python Virtualization (Conda, Miniconda, Virtualenv, Pipenv, Poetry), Python Interpreter, CPython, Python REPL, Python IDEs (PyCharm, Jupyter Notebook), Python Development Tools, Python Linter, Pythonista-Python User, Python Uses, List of Python Software, Python Popularity, Python Compiler, Python Transpiler, Python DevOps - Python SRE, Python Data Science - Python DataOps, Python Machine Learning, Python Deep Learning, Functional Python, Python Concurrency - Python GIL - Python Async (Asyncio), Python Standard Library, Python Testing (Pytest), Python Libraries (Flask), Python Frameworks (Django), Python History, Python Bibliography, Manning Python Series, Python Official Glossary - Python Glossary, Python Topics, Python Courses, Python Research, Python GitHub, Written in Python, Python Awesome List, Python Versions. (navbar_python - see also navbar_python_libaries, navbar_python_standard_library, navbar_python_virtual_environments, navbar_numpy, navbar_datascience)
© 1994 - 2024 Cloud Monk Losang Jinpa or Fair Use. Disclaimers
SYI LU SENG E MU CHYWE YE. NAN. WEI LA YE. WEI LA YE. SA WA HE.