Feats of MySQL and Impresario

I am currently working with three other teammates to develop a collaborative template-driven document creation tool, awesomely named ‘Impresario’. I will post more on what it is and what it looks like later, when the next iteration of the user interface is completed. In the meantime, I wanted to share some of the back-end work that we have done. We are using a MySQL database to store every user, document, comment, etc. If you don’t know what MySQL is, or have never heard of a relational database, feel free to just take in the beauty of a picture of it:

Impresario Database Map

Impresario Database Map

This is a reverse-engineered map of our current database in Crow’s Foot notation, courtesy¬†of MySQL Workbench. We didn’t develop the database with Workbench, but for more complicated databases, it would be a good idea to use a tool like this. For the most part we are using the InnoDB engine, because it allows us to enforce referential integrity. However, we are using MyISAM for the content table, as it allows us to perform fulltext searches on the text.

Speaking of fulltext searches, let me show you the query I designed to accomplish this:

SELECT   d.did     AS did    ,
         d.name    AS name   ,
         d.status  AS status ,
         s.sid     AS sid    ,
         s.title   AS title  ,
         x.content AS content,
         x.ranking AS ranking
FROM     document  AS d
         JOIN ( section AS s
                  JOIN
                           ( SELECT c.sid                                 AS sid    ,
                                   c.content                              AS content,
                                   MATCH (c.content) AGAINST ('internet') AS ranking
                           FROM    content                                AS c
                                   JOIN ( role AS r
                                           JOIN ( auth       AS a
                                                   JOIN USER AS u
                                                   ON      u.uid=a.uid )
                                           ON      r.rid        =a.rid )
                                   ON      c.sid                =a.tid
                           WHERE   c.sid IN
                                   ( SELECT s.sid
                                   FROM    section AS s
                                           JOIN ( document   AS d
                                                   JOIN auth AS a
                                                   ON      d.did=a.tid )
                                           ON      s.did        =d.did
                                   WHERE   a.type               ='did'

                                   UNION

                                   SELECT s.sid
                                   FROM   section   AS s
                                          JOIN auth AS a
                                          ON     s.sid=a.tid
                                   WHERE  a.type      ='sid'
                                   )
                           AND     a.uid         =6
                           AND     r.permission >= 1
                           AND     MATCH (c.content) AGAINST ('internet')
                           )
                           AS x
                  ON       s.sid=x.sid )
         ON       s.did         =d.did
ORDER BY x.ranking DESC;

I’m sorry if the query doesn’t look the prettiest. The only SQL beautifier I found to work on this contorted query was the one from sqlinform.com. This query’s purpose in life is to join half the tables in the database in order to find all the sections a user has the privilege to read, and then perform a fulltext search on the text content of that section, and return a nice list of data to me in ranked order. Currently there is only a few paragraphs of data in the content table, yet the query takes ~.3 seconds to execute (whereas a much simpler query could only take ~.02 seconds to complete). Such are the woes of a mostly-normalized database, queries can be slow because of all the table joins. Soon we will finish importing a large amount of test data to see if the query functions adequately under a heavy load. If the ~.3 second delay turns out to be a result of the table joins and becomes a much larger (longer?) problem, we may have to pay special attention to how we index these affected tables.

Believe it or not, I actually enjoy doing this stuff. It should also be fun to see the difference between the search results we obtain using the default boolean search mode, and MyISAM’s natural language or query expansion modes.

This entry was posted in Impresario, SQL. Bookmark the permalink. Post a comment or leave a trackback: Trackback URL.

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>