This past month, I submitted a first post in a series that discusses some of my pro bono contributions as Statistician for the IEEE Chicago Section. As stated in that post, my goals are to work with the Membership Development Chair to meet his immediate needs, as well as explore some ways that we can use the data to help understand our members. In other words, look at our membership at both an aggregate and individual level. I also mentioned that in my opinion, some of the best insights will likely come from an investigation of the data at a more granular level, and then determine how findings might apply to the aggregate.
Since it was recently requested that I take ownership of some ongoing monthly reports that have been manually constructed by exporting data from canned IEEE queries in an Oracle Business Intelligence Enterprise Edition (OBIEE) 11g instance called SAMIEEE to Microsoft Excel, I am expecting the need to review at some level what has been done in the past with respect to these reports. In taking a first look at these reports, I especially wanted to gain familiarity with the graph and associated spreadsheet for year over year (YOY) monthly membership totals, because it provides a snapshot of the chapter over time, and has been used to demonstrate the annual membership lifecycle.
The first graph presented above was exported directly from this latest such Microsoft Excel workbook. My first glance at this graph immediately noticed the following: (1) every year exhibits a sharp drop from January to February, (2) data is plotted for a minimal number of years, (3) the first year is only partially plotted, and (4) not every data point is labeled. After reviewing this graph a little more closely, I also noticed two additional aspects of the graph: (5) the x-axis is labeled for year-2012, and (6) although 12 months are listed, only 11 are actually plotted.
While fixing (4), (5), and (6) for the Microsoft Excel version of this graph presented below, I also came across several other essentially trivial issues that I will not mention here. I am also not going to get into all of the ways to tweak graphs in Microsoft Excel, because the purpose of these posts are not to increase usage of the product. Regarding (2) and (3), the data used to plot the original inherited version of this graph came from an OBIEEE dashboard called "Geographic Membership Statistics", the data of which starts in August 2007.
An IEEE director recently explained to me, in agreement with my assumption, that the data used to populate this dashboard is different than what is available via the adhoc query area of SAMIEEE, which to my knowledge contains all IEEEE membership data for the last several decades. While a separate discussion, I have asked for additional data availability to help bridge the gap between the data available via this dashboard, and the data available via adhoc queries, in order to perform the value added analyses and visualizations for which I was initially interested as the IEEE Chicago Section Statistician.
Regarding (1), I came to realize quickly by looking at the data that a significant portion of members go into arrears during the annual membership enrollment period, dropping the number of active members in January every year. In other words, members enter a non-active status when they do not renew membership by the deadline. The difference between this status and non-active statuses is that associated members enter a grace period during which time active status can be regained without penalty. As soon as I get access to the additional requested data, I will be able to to determine which membership gains following February each year are due to new memberships, and which are due to renewed memberships, and then investigate member characteristics between these two groups.
In addition, I will be able to more fully automate the R scripts just created to graph other items in the canned reporting dashboard called "Geographic Membership Statistics". Two of the first graphs converted to the R language are presented below, both in their raw and presentation formats. The most important aspect of these R scripts is that they provide partial automation. The raw CSV data sets exported from OBIEEE are read into R as data frames, and a few (some quite lengthy) script lines later the graphs are complete with a standard look-and-feel that are void of the need to tweak like Microsoft Excel because all are created using scripts that can be re-run with updated data sets over time.
Pro Bono: Taproot Foundation - Project Manager (Draft Direction)
Pro Bono: Taproot Foundation - Project Manager (Delivery & Assessment and Closing)
Pro Bono: IEEE Chicago Section Statistician - Part 1, Part 3, Part 4, Part 5