For Big data editing, besides Python data cleaning, I recommend:
Cloud Monk's Review of EmEditor, the Buggy and Way Overly Complicated Text Editor to AVOID.
SO BUGGY!!! I no longer recommend this product due to numerous keyboard shortcut bugs that the author refuses to fix, even after I spent 5 hours documenting them in several emails. His English is very poor, so he doesn’t understand what I say, and then he asks me to re-explain it differently. Ugh!
It is fine for mouse-only use, but if you use only the keyboard and the standard Windows editing keyboard shortcuts, you will be very frustrated.
Yutaka Emura is the creator of this very overly complicated text editor. I highly recommend that you AVOID it if you use keyboard shortcuts instead of constantly mousing.
Notepad Plus Plus is FAR superior.
https://stackoverflow.com/questions/159521/text-editor-to-open-big-giant-huge-large-text-files
The developer, Yutaka Emura, has a horrible habit of creating bugs, fixing them, and then reintroducing the same bugs again over several years.
OLD REVIEW:
Data cleansing, or data cleaning, is the process of identifying and correcting (or removing) corrupt, inaccurate, or irrelevant records from a dataset, table, or database. It involves detecting incomplete, incorrect, or inaccurate parts of the data and then replacing, modifying, or deleting the affected data. Data cleansing can be performed interactively using data wrangling tools, or through batch processing, often via scripts or a data quality firewall.
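The detect-then-replace-or-delete loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`name`, `age`), the 0 to 120 plausibility range, and the function name `clean_records` are all hypothetical and not taken from any particular tool.

```python
def clean_records(records):
    """Split records into (cleaned, rejected) lists.

    Assumption: each record is a dict with hypothetical 'name' and 'age' keys.
    """
    cleaned, rejected = [], []
    for rec in records:
        name = (rec.get("name") or "").strip()
        # Delete: a record without a name is treated as incomplete.
        if not name:
            rejected.append(rec)
            continue
        # Modify: coerce age to int where possible.
        try:
            age = int(str(rec.get("age")).strip())
        except ValueError:
            age = None  # Replace an unparseable value with a missing marker.
        # Replace: implausible ages are also marked as missing.
        if age is not None and not (0 <= age <= 120):
            age = None
        cleaned.append({"name": name, "age": age})
    return cleaned, rejected


rows = [
    {"name": "  Ada ", "age": "36"},   # stray whitespace, age stored as text
    {"name": "", "age": "41"},         # incomplete: no name
    {"name": "Grace", "age": "n/a"},   # incorrect: age is not a number
    {"name": "Alan", "age": 999},      # inaccurate: implausible age
]
cleaned, rejected = clean_records(rows)
```

Keeping the rejected records in a separate list, rather than silently dropping them, makes it possible to review what the batch run removed.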
After cleansing, a data set should be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been originally caused by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar entities in different stores. Data cleaning differs from data validation in that validation almost invariably means data is rejected from the system at entry and is performed at the time of entry, rather than on batches of data.
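To make the distinction between validation and cleaning concrete, here is a minimal sketch. It assumes a hypothetical `postal_code` field with a 5-digit format; both function names are made up for this example.

```python
import re

# Assumption: postal codes are exactly five digits (hypothetical rule).
POSTAL_CODE = re.compile(r"\d{5}")


def validate_on_entry(record):
    """Validation: reject a bad record at the moment it is entered."""
    if not POSTAL_CODE.fullmatch(record.get("postal_code", "")):
        raise ValueError(f"rejected at entry: {record!r}")
    return record


def clean_batch(records):
    """Cleaning: repair stored records in bulk, marking unfixable values."""
    fixed = []
    for rec in records:
        digits = re.sub(r"\D", "", rec.get("postal_code", ""))
        # Keep the value if stripping non-digits leaves five digits,
        # otherwise mark it as missing instead of rejecting the record.
        fixed.append(dict(rec, postal_code=digits if len(digits) == 5 else None))
    return fixed


stored = [{"postal_code": " 12345 "}, {"postal_code": "12-34"}]
repaired = clean_batch(stored)
```

The validator raises and keeps bad data out of the system, while the batch cleaner works on data that is already stored and tries to salvage it.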
The actual process of data cleansing may involve removing typographical errors or validating and correcting values against a known list of entities. The validation may be strict (such as rejecting any address that does not have a valid postal code), or it may use fuzzy or approximate string matching (such as correcting records that partially match existing, known records). Some data cleansing solutions will clean data by cross-checking with a validated data set. A common data cleansing practice is data enhancement, where data is made more complete by adding related information, for example by appending to each address any phone numbers related to that address. Data cleansing may also involve harmonization (or normalization) of data, which is the process of bringing together data of "varying file formats, naming conventions, and columns", and transforming it into one cohesive data set; a simple example is the expansion of abbreviations ("st, rd, etc." to "street, road, etcetera").
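Two of the techniques mentioned above, fuzzy matching against a known list and abbreviation expansion, can be sketched with Python's standard library. The city list, the 0.8 similarity cutoff, and the helper names are assumptions chosen for illustration; real cleansing tools use their own reference data and matching rules.

```python
import difflib
import re

# Hypothetical reference data for the example.
KNOWN_CITIES = ["Springfield", "Shelbyville", "Capital City"]
ABBREVIATIONS = {"st": "street", "rd": "road", "etc": "etcetera"}


def correct_city(value, known=KNOWN_CITIES, cutoff=0.8):
    """Return the closest known city, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(value.strip(), known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def expand_abbreviations(text, table=ABBREVIATIONS):
    """Expand whole-word abbreviations; a trailing period is dropped with them.

    Note: the expansion is lowercase, so original capitalization is not kept.
    """
    def repl(match):
        full = table.get(match.group(1).lower())
        # Leave words that are not known abbreviations untouched.
        return full if full is not None else match.group(0)

    return re.sub(r"\b([A-Za-z]+)\b\.?", repl, text)


print(correct_city("Springfeild"))          # Springfield
print(correct_city("Gotham"))               # None
print(expand_abbreviations("12 Main St."))  # 12 Main street
```

A higher cutoff makes the fuzzy correction stricter. Returning `None` for a weak match leaves the decision to a human reviewer instead of guessing.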
© 1994 - 2024 Cloud Monk Losang Jinpa or Fair Use. Disclaimers
SYI LU SENG E MU CHYWE YE. NAN. WEI LA YE. WEI LA YE. SA WA HE.