Getting good text data for language-model training isn’t as easy as it sounds. First, you have to find a large corpus. Second, you must clean it up!
Tags » ASCII
Long read about Unicode: You, Me And The Emoji: Character Sets, Encoding And Emoji – Smashing Magazine
A long read that is well worth it:
We all recognize emoji. They’ve become the global pop stars of digital communication. But what are they, technically speaking? And what might we learn by taking a closer look at these images, characters, pictographs… whatever they are 🤔 (Thinking Face).