The power of text in SW engineering
It took me a while to notice how powerful text can be in software engineering. By using text, I mean applying textual solutions to all kinds of problems, challenges and domains involving software, not just one specific domain. When introduced to a software problem, seeking a textual solution may prove elegant, easy to maintain and even beautiful. Textual solutions can also be time and memory consuming, so they won't fit everywhere, but they are always worth seeking, because where they fit, they shine.
It wasn't until the third time I used a textual solution that I actually noticed what I was doing: replacing more verbose and complicated solutions with powerful textual ones. Realizing this, I thought it would be a good idea to share the knowledge. So here are the three software problems I faced and their textual solutions. I hope you'll like them as much as I did.
One of the teams in the group I worked in was developing an embedded application, while my team was working on a big PC application. The two applications shared a very complicated algorithm that required loading a few configuration files with parameters and data, and operated according to them. The data originated in readable text files created by a different group. As you probably know, loading persistent data is quite a different task on a PC than on an embedded system. While PC applications can exploit the easy-to-use file system API together with standard technologies for storing and parsing the data (like SQL databases, XML or Windows INI files), embedded applications, which often lack a file system or any other mechanism that relies on dynamic memory allocation, typically load the data from FLASH memory as raw chunks of bytes and incorporate some primitive mechanism to parse them.
The embedded team used a tool, built especially for the purpose (as always happens when dealing with binary data), to convert the nice text files to binary files and to generate C structs matching the binary data. Loading the data amounted to reading it from FLASH and casting it to the C structs. I believe it's a common method; we didn't reinvent the wheel by doing so.
The embedded application in my group predated the PC application of my team. Since the two applications used the same algorithm, technology ended up streaming from the embedded application to the PC application. This wasn't good: a PC application can usually exploit a much wider range of technologies, and limiting it to the embedded application's span of technologies will probably hold it back.
So we read chunks of binary data in our PC application and cast them to C structs. After a while we became bitter. Occasionally, when the software failed to operate, we had to go back to the data files to check whether we had used a file set containing problematic data, but we couldn't just inspect the files' contents, since they were binary. We couldn't convert them back to text files because no one had the time to write the reversing tool. The only way to inspect the data was via the debugger. That was tiring.
The embedded team had an even worse time at the beginning of the project. Since the entire algorithm was still in its development phase, the data changed its form many times, mainly as new fields were added. Each time a programmer added a new field to the data and committed the new structs to SVN, trouble hit all the other programmers. When they updated from SVN they got the new structs, but their system's FLASH still contained the old data. It was a mess. Recovering from data updates could take a few days!
Programmers in my team, including myself, could not tolerate using embedded programming techniques in a PC world. We felt we had to change the situation, so we decided to work directly with the original text files. Since the text files weren't in any standard format, we took a few days to develop a dedicated parser. We wrote the parser so it would fit inside an embedded application as well: it required a text buffer to work on and used no dynamic memory allocation. Supplying the text buffer was the application's concern. In our PC application we read it from a file and allocated it dynamically using new. An embedded application would use a static memory buffer and fill it with textual data read from FLASH.
We got rid of all the binaries and started using the text files. It felt like a refreshing breeze. We could inspect the files using a simple text editor. Since the files were parsed on a field-by-field basis, missing fields were reported by the parsing functions, while new fields were simply ignored.
We started pressing to pass the technology on to the embedded team. Using text files in FLASH memory sounded too innovative and risky an idea, and wasn't accepted at first. We didn't give up and kept convincing everybody what a great thing it was and how it had helped us in our PC application. Eventually the instruction was given and the embedded team set to work. They used our parser, added checksum and size headers to the textual data chunks, and measured time and space. The new data used 5 times more FLASH memory and took longer to load, but FLASH was available, and an extra half second of startup loading time wasn't of any importance. Everybody was glad, data update problems vanished, and we lived happily ever after.
Well, almost. We had one problem that eventually proved to be negligible, but made us a little worried in the beginning. When dealing with text, you need conversion functions, like atof and atoi, to extract the numbers. These functions are implemented differently on different platforms, or even by different compilers for the same platform.
This resulted in slight differences in floating point numbers between the embedded algorithm and the PC algorithm.
When I try to pinpoint the exact factors making textual data superior to binary data, I come up with these two:
- You can easily inspect the data used by the system. Just download it and read it!
- Text files are parsed on a field-by-field basis, making the software robust to changes in the data structure. If someone adds a field and supports it with a new software version, you can still use the new data file with the previous software version; the previous version simply doesn't look for that field. Sometimes you want to roll back to the previous software version as part of a debugging process, and you can do so without rolling back the data files (which are usually less strictly managed and harder to roll back).
By the way, since then I always say that new projects should consider large FLASH memories. They will need them to store text files.
Our application needed to read many configuration files from the hard drive. The files resided in a deep and branched directory tree. The directory names, as well as the file names, were constructed of fixed strings accompanied by dynamic strings that became known only after application startup.
For example, this file (not a real one; it just makes things clear):
root/Platform_A/MODE B/MODE B SN 002 BasicData.xml
had only two known-ahead strings: root and BasicData.xml. All the other parts were based on entities the user entered after application startup. If the user selected platform A, mode B and serial number 002, the software tried to read the above file.
Our application had to deal with more than 600 files like this one. Here is an exercise: write a mechanism that constructs paths dynamically, based on some static entities and some dynamic entities.
Well, the solution may be obvious to you, but it wasn't to us at the time. We came up with an algorithmic solution, tackling the problem as a data structure exercise. Since the paths weren't known ahead of time, we thought they could only be referred to indirectly. So we built a repository that mapped keys to paths. To construct a path dynamically, we built a tree. Each node in the tree stood for a path component, a subfolder or a file name. Initially each node contained names composed only of the static entities.
At the point in the software where the dynamic entities were already known, the application went and updated the folder tree. It searched the tree top to bottom and updated each node with the dynamic entities. Constructing a complete path was done by joining all the nodes along a specific path in the tree. This process wasn't automatic; we had to code each node by itself.
Asking for a specific file name meant requesting from the repository the path corresponding to a specific key. The path, in turn, was composed of joined nodes in the folder tree.
This scheme presented two annoying problems.
- It was difficult to know which file corresponded to which key. If I met key1 in the code, I needed to find the basic file name matching key1, then scan the tree code to see which node that basic file name belonged to. After that I needed to find that node's parent, and so on, until I could figure out the entire path.
- Adding files and folders to the repository was exhausting. You had to define a key and a basic file name, and relate it to a subnode. Even more work was required when a new folder had to be introduced. The process was error prone, and we were reluctant to ever deal with it.
Textual solution to the rescue. We came up with a whole different approach. We didn't need any repository or tree. We simply defined a function for substituting placeholders in a string. When we needed the file presented above, we simply requested (not exactly like this, but something very similar):

substfile("root/<platform>/<mode>/<mode> <sn> BasicData.xml");

In the case where the user selected platform A, mode B and serial number 002, the substfile function would replace <platform> with Platform A, <mode> with Mode B and <sn> with SN 002.

This solved the two problems:

- The paths were explicitly written in the code, composed of both static strings and replacement placeholders. It was very easy to relate each path in the code to a file on the file system.
- Zero maintenance for a central repository. There was simply no repository.
Again, a textual solution did well, and returned the joy to dealing with that mechanism. The solution may also be appreciated by the amount of text I needed to describe it in this article: it took me many more lines to describe the first solution than the second. I believe this holds for any solution. If fewer words are required to describe it, it's probably simpler and more elegant.
This one is more cosmetic, but important nevertheless. I found it irritating to construct flag values in the argument list of some function. For example,
dosomething(param1, FLAG_USE_THIS | FLAG_USE_THAT | FLAG_ME_TOO);
The code is too verbose, and even less readable when the above call is inlined as an argument to another function. I wanted a shorter flagging style. Eventually I came up with the shortest style I could figure out:

dosomething(param1, "xyz");

where dosomething would analyze the passed string, relating "x" to FLAG_USE_THIS, "y" to FLAG_USE_THAT and "z" to FLAG_ME_TOO.
Under tight real-time constraints, string operations, like checking for the existence of a character, could be a problem. But where that is not the case, I believe this style is good, concise and robust.