BitPim currently stores information as a Python dictionary. This information is saved in multiple files (one per information type) as sourceable Python code. The data can be easily inspected with a plain text editor.
Users never need to explicitly load and save data (ie there is no need for them to manage the transitioning of data between temporary storage - RAM - and persistent storage - disk.)
BitPim currently has no undo functionality. Any edits take effect immediately and there is no ability to reverse mistakes.
It is currently not possible to do a sync. Syncing requires being able to examine two snapshots of data and generate a list of changes that were made (eg the name "John Smith" was changed to "John Smythe")
BitPim doesn't work correctly when run concurrently as the same user. The user is not prevented from starting a second instance, and multiple instances just continue oblivious to each other. The old solution of preventing multiple instances at startup is no longer appropriate since users can and do access their machines via different means (eg logging in on the console and logging in remotely). Some programs such as Mozilla/Firefox force you to have multiple independent profiles, which is very annoying.
BitPim currently doesn't support multiple information stores. This matters when there are multiple people who login as the same user at the operating system level. There is some advice in the online doc, which amounts to editing the preferences behind BitPim's back before starting, in order to switch the main data directory.
Care also needs to be taken over version issues. This means BitPim starting up with an older version of the saved data, or the saved data being in a newer format than the current version understands.
BitPim currently holds all data in memory. This makes memory consumption proportional to the amount of data, which can get very large.
BitPim will be migrating to use the SQLite database. SQLite is accessed using SQL syntax and is an embedded database - you have it as part of your program and do not contact it over the network. The Python wrapper is pysqlite; everything is available under appropriate licenses and on all platforms.
It has many other nice properties such as using a single file, being safe for usage in multi-threaded and multi-process environments, being ACID compliant (Atomic, Consistent, Isolated, and Durable), surviving power failures and unexpected program termination, etc. There is no access control or other security issues to deal with. The only requirement is access to the single file (via normal filesystem and process permissions).
Version 3 of SQLite uses unicode natively for strings, supports BLOBS (binary large objects), and allows unlimited field size.
SQLite is also different than other databases in that the type of a field is attached to each value in each record, rather than to the column as a whole. This is very similar to how Python works where the type of a value is attached to the value itself, not to the name it is given. (Contrast with C/Java where the type is associated with the variable name, not the value).
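This per-value typing is easy to demonstrate. pysqlite ships in the Python standard library as the sqlite3 module in recent Python versions; a quick sketch:

```python
import sqlite3

# SQLite attaches the type to each stored value, not to the column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (v)")  # no type declared for column v
con.execute("INSERT INTO t VALUES (?)", (42,))
con.execute("INSERT INTO t VALUES (?)", ("hello",))
con.execute("INSERT INTO t VALUES (?)", (3.14,))
# each value comes back as the Python type it went in as
rows = [type(r[0]).__name__ for r in con.execute("SELECT v FROM t")]
print(rows)  # ['int', 'str', 'float']
```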
One table will contain meta information. Primarily this will be the version of BitPim to which the database corresponds.
On startup, BitPim will inspect the version information. If it is older than the current version of BitPim then a copy of the file will be made.
For example, if the current version of BitPim is 1.2 and the database says it is for 1.1, then a copy will be made as foo-1.1-`date`.
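A minimal sketch of this startup check, assuming a meta table with a version column (the table, column, and file names are illustrative; BitPim's actual ones may differ):

```python
import shutil
import sqlite3
import time

def backup_if_older(dbfile, current_version):
    # read the version the database says it was written by
    con = sqlite3.connect(dbfile)
    row = con.execute("SELECT version FROM meta").fetchone()
    con.close()
    if row and row[0] < current_version:
        # e.g. foo-1.1-20050301 for a 1.1 database opened by 1.2
        stamp = time.strftime("%Y%m%d")
        shutil.copyfile(dbfile, "%s-%s-%s" % (dbfile, row[0], stamp))
```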
The main data type used in BitPim will be the dict, as is currently the case. dicts will be saved to tables with each dict key being a column in the table. None values in Python are mapped to null in SQL. When reading from the table, a dict is produced based on the columns. Note that columns with a null value will not have any key in the returned dict. When saving to a table, the support code will automatically create columns as needed, which default to null.
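The mapping just described could be sketched like this (savedict and loaddicts are illustrative names, not BitPim's actual API):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE phonebook (name)")

def savedict(con, table, d):
    # create any columns the dict has that the table doesn't yet
    existing = set(r[1] for r in con.execute("PRAGMA table_info(%s)" % table))
    for key in d:
        if key not in existing:
            # new columns default to null for all existing rows
            con.execute("ALTER TABLE %s ADD COLUMN %s" % (table, key))
    cols = list(d)
    con.execute("INSERT INTO %s (%s) VALUES (%s)" %
                (table, ",".join(cols), ",".join("?" * len(cols))),
                [d[c] for c in cols])

def loaddicts(con, table):
    cur = con.execute("SELECT * FROM %s" % table)
    names = [c[0] for c in cur.description]
    # null columns produce no key in the returned dict
    return [dict((n, v) for n, v in zip(names, row) if v is not None)
            for row in cur]

savedict(con, "phonebook", {"name": "John Smith", "phone": "123456"})
savedict(con, "phonebook", {"name": "Fred Bloggs"})  # no phone -> null
print(loaddicts(con, "phonebook"))
```

Note that the second row comes back without a "phone" key at all, matching the null-to-missing-key rule above.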
Each table will be a journal. Existing rows will never be modified; only new rows are added at the end. A distinct entry is identified by a unique identifier, stored in a column named __uid__. Consequently a table will typically look like this:
primary key (integer)   Name          Phone number   __uid__
0                       John Smith    123456         0x4523
1                       Fred Bloggs   7676987897     0x8769
2                       John Smythe   123456         0x4523
3                       Spiderman     435435345      0x7888
4                       John Smythe   123888         0x4523
You can see how the "John Smith" record was edited at row 2, and again at row 4. To produce the list of "current" records, the last entry in the table for a particular uid is used.
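One way to produce that current view is a grouped query that picks, per uid, the row with the highest primary key. A sketch using the example table above (table and column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE phonebook "
            "(pk INTEGER PRIMARY KEY, name, phone, __uid__)")
rows = [("John Smith", "123456", "0x4523"),
        ("Fred Bloggs", "7676987897", "0x8769"),
        ("John Smythe", "123456", "0x4523"),
        ("Spiderman", "435435345", "0x7888"),
        ("John Smythe", "123888", "0x4523")]
con.executemany("INSERT INTO phonebook (name, phone, __uid__) "
                "VALUES (?,?,?)", rows)
# for each uid, only the last (highest pk) row counts
current = con.execute(
    "SELECT name, phone FROM phonebook "
    "WHERE pk IN (SELECT MAX(pk) FROM phonebook GROUP BY __uid__) "
    "ORDER BY pk").fetchall()
print(current)
```

Both edits of the "John Smith" entry are superseded, so only "John Smythe" with the latest number appears in the result.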
There will be an additional __timestamp__ column. That will allow for retrieving old values for any record, as well as archiving off very old values (eg if the user doesn't care about anything older than 6 months).
There will also be a __deleted__ column which is set to true when a particular uid is deleted.
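Extending the journal sketch with these two columns (the schema is illustrative): the current view simply excludes uids whose latest row has __deleted__ set, and old rows could be pruned by timestamp.

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE phonebook (pk INTEGER PRIMARY KEY, "
            "name, __uid__, __timestamp__, __deleted__)")
now = time.time()
con.executemany(
    "INSERT INTO phonebook (name, __uid__, __timestamp__, __deleted__) "
    "VALUES (?,?,?,?)",
    [("John Smith", "0x4523", now, 0),
     ("Fred Bloggs", "0x8769", now, 0),
     ("Fred Bloggs", "0x8769", now + 1, 1)])  # Fred deleted later
# latest row per uid, with deleted uids filtered out
current = [r[0] for r in con.execute(
    "SELECT name FROM phonebook WHERE __deleted__ = 0 AND pk IN "
    "(SELECT MAX(pk) FROM phonebook GROUP BY __uid__)")]
print(current)
# archiving could then be a simple timestamp prune, e.g.:
# con.execute("DELETE FROM phonebook WHERE __timestamp__ < ?", (cutoff,))
```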
The uid will actually be a long unique string. For phonebook records, it will be the bitpim serial.
This scheme will allow easy undos since you can always find out what any particular record used to look like. You can also track a mass action (eg 10 records being selected and then all deleted at the same time) since they will have the same timestamp.
Undo will also be possible between runs of the program, and even amongst multiple running concurrent instances!
The initial implementation will create a new database.py file. The existing phonebook code will point to this new module. Code in database.py will continue to use the existing routines that read and write index.idx, as well as talking to the sqlite database. The data will be compared between the two to ensure it is working correctly. Once we are certain the code talking to sqlite is correct, then the code using index.idx will be switched off.
The process will then be repeated for the other data types (calendar, wallpaper and ringtones).
New data sources (SMS, call history, voice and text memos) will just use the sqlite database exclusively.
Almost every field in a phonebook entry is a list of dicts. This can be stored as a single value (the string representation), or a new table can be created with a redirect to it. Both approaches are shown below for the phone numbers column, with most other columns omitted for clarity.
primary key (integer)   Name         Phone numbers
0                       John Smith   [{'number': '1234567890', 'type': 'home'}, {'number': '233423423', 'type': 'work'}]
primary key (integer)   Name         Phone numbers
0                       John Smith   __numbertable:0,2

__numbertable is:
primary key (integer)   number       type
0                       1234567890   home
1                       76547657     cell
2                       233423423    work
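A sketch of how the redirect in the second scheme might be resolved (the "__numbertable:0,2" encoding and the resolve helper are my illustration, not a settled format):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE numbertable "
            "(pk INTEGER PRIMARY KEY, number, type)")
con.executemany("INSERT INTO numbertable (pk, number, type) VALUES (?,?,?)",
                [(0, "1234567890", "home"),
                 (1, "76547657", "cell"),
                 (2, "233423423", "work")])

def resolve(con, value):
    # value looks like "__numbertable:0,2" - side table name, then
    # the primary keys of the rows belonging to this record
    table, keys = value.lstrip("_").split(":")
    result = []
    for pk in keys.split(","):
        row = con.execute("SELECT number, type FROM %s WHERE pk=?" % table,
                          (int(pk),)).fetchone()
        result.append({"number": row[0], "type": row[1]})
    return result

print(resolve(con, "__numbertable:0,2"))
```

This reconstructs the same list of dicts that the first scheme stores as a single string, while letting several records share or search the side table.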
I am leaning towards implementing both schemes in database.py since they are not mutually exclusive, and then seeing how it goes. My instinct is inclined towards the indirect table since it will save space and allows faster searches down the road.
sqlite does allow storing blobs (binary large objects) in the database itself, so we could store the actual files directly in the database. The other alternative is to store the files on disk with non-descript names (eg 0000001.jpg) and then point to the file from the relevant records.
My instinct is for the latter approach for wallpaper and ringtones since it will keep the database smaller. For other file like items such as text memos and SMS messages, I would be inclined to keep them directly in the database.
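Both options side by side in one sketch (the table and column names are illustrative, not BitPim's actual schema): a small item such as an SMS body kept directly in the database as a blob, and a large item such as a wallpaper left on disk under a non-descript name, with only that name recorded.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# small file-like items stored directly as blobs
con.execute("CREATE TABLE sms (body BLOB)")
con.execute("INSERT INTO sms VALUES (?)", (b"hello",))
# large items stay on disk; the database only holds the pointer
con.execute("CREATE TABLE wallpaper (name, filename)")
con.execute("INSERT INTO wallpaper VALUES (?, ?)",
            ("beach", "0000001.jpg"))  # actual bytes live on disk
body = con.execute("SELECT body FROM sms").fetchone()[0]
print(body)  # b'hello'
```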