Originally, I used to call it “Natural Language to SQL Translator”, but I guess what I intend to do at this point of time is not Natural Language, it is rather a specialized syntax, which I guess can be further extended to natural language. Let me make a thing clear, about the system :
- It is something which the user sees, and types a query in, it has nothing to do with the data-related operations in a database.
- The only time it deals with data, is when the result-set is generated and given back to the system, so that it can be post-processed into a better manner.
Hence, its sole purpose is, to help user (end-user/client/power user/reporters) query the database and to give the user a better presentation of the result-set.
I have been developing a Paper Conference Management System for an International Paper Conference which is conducted in our college, and there are various queries and reports which are to be generated, but considering the modularity of the software architecture and the normalized databases, the SQL queries tend to become bigger and bigger and making it difficult for the user who doesn’t know the database schema to generate reports.
For example, a SQL query to get the details of authors who have paper in track "Database Engineering":
SELECT authors.id, users.name FROM user,authors,papers,author_paper,tracks WHERE users.id = authors.id AND authors.id = author_paper.author_id AND papers.id = author_paper.paper_id AND papers.track_id = tracks.id AND tracks.name = "Database Engineering";
but with the system in-place, it would look something like :
get author details having track(name = "Database Engineering");
So, here was the need for a querying system, which would certainly carry out the joins automatically, depending upon the participating entities. This information is currently given with the help of 2 json files which contain the relationships about the entities.
Part 1: Getting the interactions and relationships right
At first, it was important to consider the interactions among various entities in the system and how they are related and affect the database query. Interactions between various relations is important in designing queries. For example, if we want to access first_name of a author, we have to look for it in users table, rather than authors table. So, the interactions are very important in this system.
Part 2: Designing the syntax
This part is only concerned with design of the query and its syntax, once the syntax is finalized, then we can move onto the next part. The syntax is totally dependent upon the user. The syntax can be changed as per the user needs and requirements.
Part 3: Getting to link Part 1 and Part 2
Once, the syntax and interactions are finalized and planned, then the linking of syntax and relationships must be performed. For example, when the attribute “first_name” occurs in the post, then the system, must automatically JOIN users table with authors table.
Part 4: Leave the rest to the Query Helper tool
The Query Helper tool, takes the syntax of the system as well as the user input as the input, and it generates necessary a valid inputs at that point of time. It is similar to the Autocomplete features present in various IDEs.
Rather than just being a way to generate SQL queries, I think it can be coupled with modern web technologies to create a better tool, which helps the user to create queries, without referring to any documentation or looking at the confusing syntax state diagrams. Once, the metadata files have been generated, they can also be given to a AJAX – based search system and assist the user to generate valid queries, without having to remember the details of the syntax.
To develop a robust system, which makes generating SQL queries simpler for the end-user and which doesnot require him/her to remember the exact relationships between the various tables. The system can be condensed to form a tool, which on providing the syntax and entities automatically generates the metadata files and once these files are provided to the algorithm, the query system is good to go. Also, a tool can be made to which can assist the user to generate queries, with the help of which the user can skip the loads of documentation or syntax provided.
It can extend the capabilities of the SQL queries to perform various operations which are not possible in SQL, but are highly required in report generation. With the help of some post-processing.
I envision the file formats and the method to become a standard for such systems and hence, become, generic in nature.
- Uses Finite State Model, so it is all about transitions and hence fast
- If the system changes, only the new metadata files have to be served to the algorithm
- Supports augmented attributes, i.e, “name” can be replaced by “first_name” and “last_name”
- Supports Null Transitions
- The syntax can be customized depending upon the user
- Supports multiple linked tables
If the system, goes to a state multiple times, it would have same conditions added multiple times.
Solution: This has been taken care of by developing a query builder class, which ignores repetition of same attribute or condition in either of the clauses
Ambiguity between same named attributes of various relations.
Solution: Just like in SQL, once a transition is made from a main state, all the attributes and conditions given by the user are replaced by their complete names, i.e., relation_name.attribute_name
Usage of aggregate functions
Usage with tables which are designed as key/value pairs
apart from these, there maybe various other pitfalls, which may have not been thought upon. Yes, but I think all of these can be tackled.
Thank you folks, that was all. Any more suggestions are welcome to help make it a better system (can also someone suggest me a better title or a codename, it is still anonymous)