Misc news about the gedit text editor, mid-August edition! (Some sections are a bit technical).
Code Comment plugin rewritten
I forgot to talk about it in the mid-July news, but the Code Comment plugin has been rewritten in C (it was previously implemented in Python) and the bulk of it is implemented as re-usable code in libgedit-tepl. The implementation is now shared between Enter TeX and gedit.
File loading and saving: a new GtkSourceEncoding class
I've modified the GtkSourceEncoding class that is part of libgedit-gtksourceview, and adapted gedit accordingly. The new version of GtkSourceEncoding comes from an experiment that I did in libgedit-tepl several years ago.
GtkSourceEncoding represents a character set (or "charset" for short). It is used in combination with iconv to convert text files from one encoding to another (for example from ISO-8859-15 to UTF-8).
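To make the iconv part concrete, here is a minimal sketch of such a conversion in plain C. It is not the code used by gedit or libgedit-gtksourceview: the convert_to_utf8() helper is a hypothetical name, and the sketch assumes that the whole input fits in memory and that the iconv implementation knows the given charset.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, for illustration only (not gedit code).
 * Converts `len` bytes of `input` from `from_charset` to UTF-8.
 * Returns a newly allocated, NUL-terminated string to release with free(),
 * or NULL if the charset is unknown or the input cannot be converted. */
char *
convert_to_utf8 (const char *input, size_t len, const char *from_charset)
{
  assert (input != NULL && from_charset != NULL);

  iconv_t cd = iconv_open ("UTF-8", from_charset);
  if (cd == (iconv_t) -1)
    return NULL; /* charset not supported by this iconv implementation */

  /* Assumption: 4 output bytes per input byte is always enough for UTF-8.
   * A real implementation would convert by chunks instead. */
  size_t out_size = len * 4 + 4;
  char *output = malloc (out_size + 1);
  if (output == NULL)
    {
      iconv_close (cd);
      return NULL;
    }

  char *in_ptr = (char *) input; /* iconv() takes a non-const pointer */
  size_t in_left = len;
  char *out_ptr = output;
  size_t out_left = out_size;

  size_t ret = iconv (cd, &in_ptr, &in_left, &out_ptr, &out_left);
  if (ret != (size_t) -1)
    ret = iconv (cd, NULL, NULL, &out_ptr, &out_left); /* flush any pending state */
  iconv_close (cd);

  if (ret == (size_t) -1)
    {
      free (output); /* invalid or incomplete input sequence */
      return NULL;
    }

  *out_ptr = '\0';
  return output;
}
```

For example, the single byte 0xA4 is the euro sign in ISO-8859-15, and is three bytes long once converted to UTF-8. Note that this sketch simply gives up on invalid input, which is not what a text editor wants to do.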
The purpose of the experiment in libgedit-tepl (the TeplEncoding class) was to accommodate the needs of uchardet (note that uchardet is not yet used by gedit, but it would be useful). uchardet is a library that automatically detects the encoding of some input text. It returns an iconv-compatible charset, as a string.
It is this string, returned by uchardet, that we want to store and pass to iconv unmodified, so as not to lose information.
The problem with the old version of GtkSourceEncoding was that there was a fixed set of GtkSourceEncoding instances, all const (so they never needed to be freed). When trying to get an instance for an unknown charset string, NULL was returned. So this was not appropriate for use with uchardet, or at least it was not a clean solution: there was no guarantee that a GtkSourceEncoding instance was available for the charset string returned by uchardet.
Since GtkSourceEncoding is used in a lot of places, we don't want to change the code to represent a charset as just a string. A plain string is too basic anyway: GtkSourceEncoding provides useful features.
So, long story short: the new GtkSourceEncoding class returns new instances that must be freed, and has a constructor that just makes a copy of the charset string (there is the get_charset() method to get back the string, unmodified).
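The idea can be illustrated with a small stand-alone sketch. The Encoding type and the encoding_*() functions below are hypothetical names chosen for this example, they are not the actual GtkSourceEncoding API. The point is only that any charset string is accepted, copied as-is, and given back unmodified.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type, for illustration only (not the GtkSourceEncoding API). */
typedef struct
{
  char *charset; /* private copy of the string given to the constructor */
} Encoding;

/* Returns a new instance, to release with encoding_free().
 * Any charset string is accepted: there is no fixed table of known values. */
Encoding *
encoding_new (const char *charset)
{
  assert (charset != NULL);

  Encoding *enc = malloc (sizeof *enc);
  if (enc == NULL)
    return NULL;

  size_t size = strlen (charset) + 1;
  enc->charset = malloc (size);
  if (enc->charset == NULL)
    {
      free (enc);
      return NULL;
    }

  memcpy (enc->charset, charset, size); /* byte-for-byte copy, no normalization */
  return enc;
}

/* Gives back the charset string exactly as it was passed to encoding_new(). */
const char *
encoding_get_charset (const Encoding *enc)
{
  assert (enc != NULL);
  return enc->charset;
}

void
encoding_free (Encoding *enc)
{
  if (enc != NULL)
    {
      free (enc->charset);
      free (enc);
    }
}
```

With such a design, a string coming from a detection library can be wrapped with encoding_new() and later passed to iconv through encoding_get_charset(), whatever its value. The price to pay is that callers now own the instances and must free them.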
So gedit can keep using the GtkSourceEncoding abstraction, and we are one step closer to being able to use uchardet or something similar!
Learn more about gedit's maintainer
I now have a personal web site, or more accurately a single web page: wilmet-software.be (Sébastien Wilmet)
gedit is a 27-year-old project: the first lines were written in 1998 (and normally it won't be part of the 27 Club!). I've been a contributor to the project for 14 years, so more than half of the project's existence. Time flies!
Robust file loading - some progress
After the rework of GtkSourceEncoding (which is part of the File Loading and Saving subsystem in libgedit-gtksourceview), I've made good progress in making file loading more robust, although there is still more work to do.
Checking all program input is a basic principle of programming. gedit makes this a bit harder to accomplish. To open a document:
- There is first the problem of the character encoding. It is not sufficient for a general-purpose text editor to accept only UTF-8. So text files can be almost anything in binary form.
- Then gedit allows opening documents that contain characters that are invalid in the specified or auto-detected encoding. With this, documents can really be anything in binary form.
- Finally, the GtkTextView widget used at the heart of gedit has several limitations: (1) very big files (like log files or database dumps) are not supported, so a limit on the content size must be set (and if it is reached, the file can still be loaded with truncated content); (2) very long lines cause performance problems and can freeze the application.
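The two GtkTextView limitations lend themselves to simple checks on the raw content before it reaches the widget. The following is only a sketch of that kind of check, not what gedit actually does: both function names are hypothetical, the limits are left to the caller, and only '\n' is treated as a line terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, for illustration only (not gedit code). */

/* Returns the number of bytes to load: the whole content if it fits
 * within `max_size`, otherwise a truncated amount. */
size_t
clamp_content_size (size_t content_size, size_t max_size)
{
  return content_size > max_size ? max_size : content_size;
}

/* Returns the length in bytes of the longest line in `buf`, not counting
 * the '\n' terminators. Assumption: other line endings are ignored. */
size_t
longest_line_length (const char *buf, size_t len)
{
  assert (buf != NULL || len == 0);

  size_t longest = 0;
  size_t current = 0;

  for (size_t i = 0; i < len; i++)
    {
      if (buf[i] == '\n')
        {
          if (current > longest)
            longest = current;
          current = 0;
        }
      else
        {
          current++;
        }
    }

  /* The last line may not end with a newline. */
  return current > longest ? current : longest;
}
```

A loader could compare the result of longest_line_length() with a threshold and then decide what to do (warn the user, split the line, and so on). Truncating at an arbitrary byte offset can also cut a multi-byte character in two, so a real implementation has to handle that case too.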
So, the good news is that progress has been made on this. (There is only good news, let's stay positive!)
If you appreciate the work that I do in gedit, I would like to hear your feedback: what I should improve, or which important feature is missing. You can contact me by email, for example, or on a discussion channel. Thank you :-) !