Wednesday, December 31, 2014

phpMyAdmin work during the twelfth week

A merry Christmas to everyone!!!

Christmas happened to fall on my Twelfth week of work. The work week was a short one since I took leave on Thursday and Friday (25th and 26th) around Christmas. Just like the last week, I focused on bug fixing during this week as bugs were reported frequently. Following are the bugs that were fixed during the week

Bugs fixed
#4651 don't open console with esc
#4653 Always connection error was shown, on /setup at tab "configuration storage"
#4661 Drag and drop file import always fails
#4663 Export missing back ticks for order table name
#4664 select min() displays 1 row, but reports the table amount of rows returned
#4666 Undefined indexes in table print view of a view
#4668 Remove from central columns error
#4670 CSV import reads both commas and values into first column after first row
#4671 Unable to move all columns

Additionally following feature request was addressed.

Feature requests implemented
#1563 Change keyboard shortcut for console display to other then 'escape'

The last time I updated the TCPDF library we got an error and I reported it to upstream. Since the developer has fixed it, TCPDF library used in phpMyAdmin was updated to its latest version.

Other
Updated TCPDF to its latest version 6.2.3

Sunday, December 21, 2014

phpMyAdmin work during tenth and eleventh weeks

Here I am combining work done on two weeks since I was on leave for four days during the latter week and there is no point of writing a separate blog post for just one day of work.

During the tenth week I concentrated solely on the bug fixes as a load of bugs were reported during the week. Most of them were on the 4.3 series and this indicated that more and more people are starting to use the new version, which is great.

So here is the list of bugs I fixed during the six days

Bugs fixed
#3794 failure to handle repeating empty columns when importing ODS
#4617 UI issues with sortable tables
#4625 "Insufficient space to save the file" on export SQL to file on server
#4626 Ctrl + click on a column not is sort triggers a server call to erroneous url
#4627 "file_get_contents(examples/create_tables.sql): failed to open stream" after update
#4628 PHP error while exporting schema as PDF
#4629 Problem with custom SQL queries using cookie authentication
#4630 AJAX request infinite loop
#4631 Server selector submits two server parameters
#4632 Notice in ./libraries/Util.class.php#1916 Undefined index: query
#4633 Wrong parameter in fetchValue
#4634 Error reporting creates an infinite loop
#4638 Default Export Method setting broken
#4639 Export SQL missing indentation first field
#4644 Error when browsing tables
#4646 Version Check Broken
#4649 Attributes field size smaller than others

In addition to that I attended to the following list of bugs

Bugs attended
#4438 “Token Mismatch” Error In Safari via MAMP
#4589 Can not add new procedures
#4641 Zeroconf PMA tables support
#4642 phpmyadmin often fails to load due to specific load order

Public Transportation Usage Analysis for City of Chicago

As part of my BI course I'm taking for my part time MSc, I started working on an analysis on the public transit data of the city of Chicago.

Motivation

Public transportation system can benefit greatly from insights into the traffic volumes. These insights can be helpful in identifying the peak times and peak routes and will be instrumental in effective allocation of buses and other transportation modes to cater those transportation needs. This can further help the authorities in making other corporate decisions such as purchase decisions and decisions to issue route permits (in SL context) etc.

Further, such insights would be used to do demand management. In demand management the user is offered incentives to direct users to use the transportation system when the congestion is less. Airline industry benefits greatly from demand management and this can be applied to other transportation to various extends. To do demand management, insights into the traffic details would be crucial and it can in turn help the authorities of companies identify the effectiveness of such demand management strategies.

Data Set

The data was available from the Open Data Portal of City of Chicago and it included data for both bus and metro lines. The data was in the granularity of days and for buses it had data for each of the bus routes. In the case of metro, the data was in the form of metro station entries. More data was available on bus stops and metro stop. However, these data could not be used for the analysis since the fact tables were in the granularity of bus route and station entries.

The data was in CSV format and my first task was to get them into a relational databases. I did a quick analysis on data in the relational database. Bus rides data had about 700,000 tuples while the station entry data had about 600,000 tuples. I had to derive separate tables for metro stations and bus routes and define required primary keys as well as foreign keys among tables. Since the data was in granularity of days I created a new table for days and added extra columns such as year, month, day of month, day of week which is helpful in further analysis. So my final schema looked like this.

Analysis

For the analysis the preferred BI suite chosen was Pentaho. However, I soon realized the the Community Edition of it lacks much of the ease of use and many things has to be done manually.

BI schema

Since time is the major dimension of the analysis I required time to form a hierarchy and designing time in a hierarchical manner was not possible with the schema generation in Pentaho BI server. For that I used the Pentaho Schema Workbench which allowed me to properly mark the dimension and time for the required hierarchy.

OLAP cube

OLAP cubes were generated using the jPivot plugin for Pentaho. Since time was defined in a hierarchy, now I can drilldown the cube in a more meaningful manner (in a year, month, day hierarchy). 

Dashboard

The dashboard available in the community edition lacks ease of use and dashboard needs to be defined and the tweaks needs to be done with CSS and JavaScript. So this dashboard creation is not for non-geeks. I included the following charts in the dashboard.
  • Bus rides by bus routes
  • Station entries by station
  • Bus rides and station entries by year (to identify overall trend in public transportation usage) 
  • Bus rides and station entries by month (to identify seasonal variations)
  • Bus rides and station entries by day of month
  • Bus rides and station entries by day of week (to identify weekly patterns)
  • Bus rides and station entries over time

Reports

Pentaho comes with a report design tool which allows creating report templates and binding data with them easily . I created reports which included total bus rides and station entries by month, year and by station and routes. Report Designer allows publishing those reports to the BI server. However, I got an error while doing so. Though I did not find any documentation to suggest that, I suspect this is not allowed in the community edition. Report designer allows publishing reports in multiple formats and the following is a sample report in HTML format.

Observations and Interesting Findings

A number of interesting observations were made using the Pentaho dashboard. 

There is an year by year increase of metro usage while the usage of buses remained constant over the course of 13 years. 


As expected, Sunday recorded the least usage of public transport followed by Saturday. However, interestingly Monday had a noticeable drop compared to other week days. May be people feel lazy to wake up on Monday mornings and rush to work with their own vehicles :)


Moreover, interesting seasonal variations were observed which is suspected to be related to four seasons. There was a noticeable drop in the public transit usage in the months of December, January, and February and this is possibly due to the winter conditions.


Further I noticed that some days in the month has considerable less usage. In the below graph 31st day of the month record lowest since not every month has a 31st. The drop in 29th and 30 the can probably be asserted to the month of February. Drops on 1st, 4th and 25th can possible be due to US holidays that falls on fixed days (1st Jan - New year's day, 4th Jul - Independence day, 25th Dec - Christmas day). However, I did not find such a drop for 11th Nov which is the Veteran's day.

Tuesday, December 9, 2014

phpMyAdmin work during the ninth week

This was a four day week since I was on leave on 3rd Wednesday.

During the week I was more engaged in code improvement work opposed to putting effort on bug fixing compared to previous weeks. The jQuery and jQuery-ui versions phpMyAdmin was using were too outdated and it needed an upgrade for sometime. jQuery was stuck at version 1.8 since there were major changes in upgrading from 1.8 to 1.9. This included a set of methods removed such as .live() and .die() methods which were deprecated for sometime. However, these methods were heavily used in phpMyAdmin making phpMyAdmin stuck at version 1.8.

I started with going through the upgrade guide for jQuery 1.9 [1] and inspecting the phpMyAdmin for possible breakages in case of an jQuery upgrade. There were heaps of .live() and .die() methods to be replaced and more instances of other methods removed. 

Then I went on to check each of jQuery plugins for their updates as most of them were pre 1.9. Some of them had updates - the ones that are actively being developed -, while for the others, I had to patch them to be compatible with jQuery 1.9+. Now after much effort phpMyAdmin uses jQuery version 1.11.1 and jQuery-ui version 1.11.2.

Code improvements
Making code compatible with jQuery 1.9+ and upgrading phpMyAdmin to use latest versions of jQuery and jQuery-ui
Upgrading phpMyAdmin to use latest versions of jQuery plugins where newer versions are available and upgrading other jQuery plugins to be compatible with jQuery 1.9+

Bug Fixes
#4615 SQL highlighting in process list breaks on auto refresh
#4616 Warning on db structure print view page
#4618 Page scrolls while GIS visualization is zoomed in/out with mousewheel

Security fixese
#4611 DOS attack with long passwords (coordination)
#4612 XSS vulnerability in redirection mechanism (coordination)

[1] http://jquery.com/upgrade-guide/1.9/

Monday, December 1, 2014

phpMyAdmin work during the eighth week

As mentioned in my last blog post I started working on the token mismatch issue that we received a lot of complaints about for so long. The main issue was that both http and config authentications were not able to refresh the page when when PHP session expires. I implemented a solution similar to what we used for cookie authentication on session timeout which would refresh the page with new token. 

I also attended to regular bug fixes reported within the week. Altogether here is the list of bugs fixed and attended.

Bugs fixed
#4605 Unseen messages in tracking
#4606 Tracking report export as SQL dump does not work
#4607 Syntax Error during db_copy operation
#4608 SELECT permission issues with relations and restricted access
#4227 Token mismatch when using HTTP AUTH and the SESSION expires
#4280 Error: Token mismatch
#4387 Token mismatch (one server http auth, one cookie)
#3893 Token mismatch error on every action

Bugs attended
#4569 Issues with Relation View dropdowns
#4409 Not displaying results of archive table

During the latter part of the week I concentrated on refactoring and code improvements. I improved the database and table level tracking pages with facility to enable/disable tracking for multiple tables, ability to delete multiple versions of tracking etc. I also worked on displaying results of multiple queries. This involved quite a lot of changes since most of the work had been done assuming that only one result set would be displayed. This work continued to the next week. So here is the list of feature requests and refactoring done during the week.

Feature requests
#1562 Allow tracking multiple table at once from database level tracking page
#1564 Improve action message on Tracking page

Refactoring and code improvements
Refactoring database level and table level tracking pages
Refactoring query results display page to facilitate display results of multiple queries at once

PS: I took the day off on Monday, the 24th.