Friday, 30 October 2020

PyBloom coding project part 7: Plotting graphs from the data

 

Previously, I set up the coding environments and wrote the code to generate the weather observation data. Here in part 7, I create the graphs of the temperature data.

Generate graphs function

Whilst the point of this program is to see the current temperature outside just by glancing at my Hue lamps, I also want to be able to look at the history of temperature measurements. I could’ve logged the measurements into a csv file and examined them as a spreadsheet, but I wanted to present them in a more accessible way: data visualisation.


Python has many powerful visualisation libraries. I’ve spent some time with Matplotlib (and its Seaborn wrapper), and find it difficult precisely because it’s so comprehensive. PyGal looks much simpler to work with, so I decided to use it for my program.


The technologies

  • PyGal

  • Custom CSS

The code

Observation sets


def generate_graphs(timestamp):
    # observation sets


First things first, let’s set the points that mark the bounds of the observation data. I’m interested in data over the last day, over the last week and over the last month.


now = datetime.strptime(timestamp, DATETIME_STRING)


If you remember from earlier, maths with dates and times is surprisingly difficult, but using the datetime module makes it all easier. However, Python and SQLite each have their own implementation of datetime, so we need to make sure both agree on the same string format (the data dictionary).


  • DATETIME_STRING defines the format (data dictionary)

  • The .strptime method parses a string into a datetime object according to the format


This statement creates a datetime object now from a timestamp string. This might seem puzzling as there’s a perfectly good datetime.now() function, but this is a different now object. It’s not the up-to-the-microsecond now() from the operating system, but the now of the most recent weather observation.
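As a minimal sketch of that parsing step, here is strptime turning a timestamp string into a datetime object and strftime turning it back again. The format string and the sample timestamp below are assumptions for illustration; the real DATETIME_STRING is defined elsewhere in the project and may differ.

```python
from datetime import datetime

# Assumed format; the project's real DATETIME_STRING may be different.
DATETIME_STRING = '%Y-%m-%d %H:%M:%S'

# Hypothetical timestamp of the most recent weather observation.
timestamp = '2020-10-30 14:30:00'

# Parse the string into a datetime object according to the format.
now = datetime.strptime(timestamp, DATETIME_STRING)

# Going the other way: format the datetime object back into a string.
round_trip = now.strftime(DATETIME_STRING)
```

Because the same format is used in both directions, the string survives the round trip unchanged.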


last_day = now - timedelta(days=1)
last_week = now - timedelta(weeks=1)
last_month = now - timedelta(weeks=4)


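The timedelta class (from the same datetime module) fulfils the promise of simpler maths, with simple-to-understand syntax. Note that the month here is approximated as four weeks, i.e. 28 days.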
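To see what those subtractions produce, here is a small sketch using a made-up observation time (the date is an assumption, chosen only for illustration).

```python
from datetime import datetime, timedelta

# Hypothetical time of the most recent observation.
now = datetime(2020, 10, 30, 14, 30, 0)

last_day = now - timedelta(days=1)      # 24 hours earlier
last_week = now - timedelta(weeks=1)    # 7 days earlier
last_month = now - timedelta(weeks=4)   # 28 days earlier, not a calendar month

for label, bound in (('day', last_day), ('week', last_week), ('month', last_month)):
    print(label, bound)
```

timedelta has no months argument, which is why the month is expressed in weeks.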


observation_sets = {
    'last_day': last_day,
    'last_week': last_week,
    'last_month': last_month
}


Having calculated the observation points, I put them in an easy-to-access Python dictionary.


Fetching the data


rows = get_rows('colours')


There are two sets of data that we’re interested in: the colour values (which will map onto the temperatures) and the temperatures themselves. This first get_rows fetches all the columns from the colours table. It’s important to note that the results come back as a list of rows, one per colour, rather than as a flat list of values. That is going to be hard to manipulate later, as we want a list of hex values, and specifically hex values that PyGal understands (i.e. prefixed by #), which isn’t the same as the string that Python’s built-in hex() function produces (i.e. prefixed by 0x). We therefore need to parse the results.


hex_list = [f'#{hex}' for hex in [row['hex_value'] for row in rows]]


We use a combination of nested list comprehension and f-strings to do this conversion:

  • [row['hex_value'] for row in rows] : this list comprehension cycles through the rows, and creates a list of the hex values

  • for hex in [row['hex_value'] : an outer list comprehension cycles through the newly-created list 

  • [f'#{hex}' for hex : each item of this outer list is inserted into the output string (in effect inserting the hash sign prefix for each element), to create the final list
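For comparison, the same result can be had from a single comprehension. This is a sketch using made-up rows (the hex values and temperatures are assumptions) rather than the real colours table, and it avoids reusing the name of Python's built-in hex function as a loop variable.

```python
# Hypothetical rows, shaped like the results from the colours table.
rows = [
    {'temperature': 0, 'hex_value': '0000ff'},
    {'temperature': 10, 'hex_value': '00ff00'},
    {'temperature': 20, 'hex_value': 'ff0000'},
]

# Nested version: build the inner list first, then prefix each item.
nested = [f'#{value}' for value in [row['hex_value'] for row in rows]]

# Single-pass version: prefix each value as it is read from its row.
flat = [f"#{row['hex_value']}" for row in rows]

print(flat)
```

Both versions give the same list; the nested one just makes the two steps easier to see.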


temps_count = {row['temperature']: 0 for row in rows}


This second command is to set up histogram bins for each of the temperature thresholds, using a bit of dictionary comprehension. Now we have the temperature data in more accessible formats, let’s move onto the observation data.


for string, then in observation_sets.items():


Since we want a similar graph for each of three different time periods, we can save a bit of effort by iterating through the observation points (then). Thanks to the dictionary, we can access both the data and the string representation, which we’ll need in a moment.


sql = 'WHERE timestamp BETWEEN datetime((?)) AND datetime((?))'
when = (then, now)
results = get_rows('observations', rows_sql=sql, args=when)


Again, our little get_rows utility makes accessing the database a little easier. The double brackets look a bit odd. The placeholder that the underlying SQLite .execute method substitutes is the ?, here wrapped in its own pair of brackets. Each placeholder is in turn an argument to SQLite’s own datetime function, which is why it sits inside another set of brackets. Strictly speaking the inner pair is optional, and datetime(?) works just as well.


The underlying SQLite query uses its own datetime function to turn the values passed in from Python into SQL datetime strings. SQLite doesn’t ask for the data dictionary explicitly; it assumes that the string corresponds to one of its accepted formats.
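Here is a self-contained sketch of that kind of bounded query, run directly against an in-memory database with the sqlite3 module instead of going through get_rows. The table layout, the sample readings and the format string are all assumptions for illustration, and the bounds are passed as formatted strings so that nothing relies on how sqlite3 adapts datetime objects.

```python
import sqlite3
from datetime import datetime, timedelta

DATETIME_STRING = '%Y-%m-%d %H:%M:%S'  # assumed format

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # lets us read columns by name

# Hypothetical observations table with three made-up readings.
conn.execute('CREATE TABLE observations (timestamp TEXT, temperature REAL)')
conn.executemany(
    'INSERT INTO observations VALUES (?, ?)',
    [
        ('2020-10-01 09:00:00', 11.5),
        ('2020-10-29 18:00:00', 9.0),
        ('2020-10-30 12:00:00', 12.5),
    ],
)

now = datetime(2020, 10, 30, 14, 30, 0)
then = now - timedelta(days=1)

# Each ? is replaced by the matching item of the tuple passed to execute().
sql = ('SELECT timestamp, temperature FROM observations '
       'WHERE timestamp BETWEEN datetime(?) AND datetime(?) '
       'ORDER BY timestamp')
when = (then.strftime(DATETIME_STRING), now.strftime(DATETIME_STRING))

results = conn.execute(sql, when).fetchall()
times = [row['timestamp'] for row in results]
temps = [row['temperature'] for row in results]
conn.close()

print(times, temps)
```

Only the two readings from the last 24 hours come back; the one from the start of the month is filtered out by the database itself.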


Once the bounds are correctly interpreted in the query, we can select only the data that we’re interested in. This is the reason I chose SQLite over csv. If we’d used a csv file to store all the data, I’d have had to read the whole file into memory as a dataframe, then search it using a library such as Pandas. Not a big overhead, but given SQLite support is built into Python, it's an opportunity to import one less library.


Generating the bar graphs


# use the observation results just fetched, not the colours rows
times = [row['timestamp'] for row in results]
temps = [row['temperature'] for row in results]


The database gives us a list of dictionaries; each dictionary contains a timestamp and a measurement. We need to convert this single list into two lists, one containing all the timestamps, and another containing all the measurements. This is akin to transposing the rows and columns of a table, and is simply done using some list comprehension.
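As a quick illustration of that transposition, here is a sketch with a few made-up observations (the values, and the name results for the list, are assumptions). It also shows zip as an alternative way to split the columns in one pass.

```python
# Hypothetical observation rows, one dictionary per reading.
results = [
    {'timestamp': '2020-10-29 18:00:00', 'temperature': 9.0},
    {'timestamp': '2020-10-30 06:00:00', 'temperature': 7.5},
    {'timestamp': '2020-10-30 12:00:00', 'temperature': 12.5},
]

# One comprehension per column.
times = [row['timestamp'] for row in results]
temps = [row['temperature'] for row in results]

# Alternative: zip(*pairs) transposes the pairs into two tuples.
# Note this raises ValueError when results is empty, so the
# comprehensions above are the safer choice for sparse data.
times_z, temps_z = zip(*[(row['timestamp'], row['temperature']) for row in results])

print(times)
print(temps)
```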


bar_chart = pygal.Bar(x_label_rotation=20,
                      x_labels_major_count=6,
                      show_minor_x_labels=False,
                      show_legend=False)


Now we get into using PyGal to generate the graph. This first statement instantiates the chart object. It’s at this point that we declare it’s a bar graph, and also tell it to rotate the x labels (so that they’ll fit), limit x-axis labels to 6 in total (otherwise they’ll look cluttered), and remove the legend (because there’s only one series being shown).


bar_chart.add('Temperature', [
    {'value': temp,
     'color': '#' + lookup_colour(find_temp_threshold(temp))}
    for temp in temps]
)


Now we add the data series, and its title. To make the colours meaningful, the colour of each bar matches the Bloom colour. This means adding each value to be plotted individually, as a dictionary which describes the value and the colour. The colour is calculated using our previously defined functions, remembering to prefix it with a hash symbol. We loop over all the temperature measurements using list comprehension.
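To make the shape of that data series concrete, here is a sketch that builds the list of value-and-colour dictionaries on its own, without PyGal. The thresholds, colours and the two helper functions below are hypothetical stand-ins with assumed behaviour; the real lookup_colour and find_temp_threshold were written in an earlier part and may work differently.

```python
# Hypothetical thresholds and colours, standing in for the colours table.
THRESHOLDS = {0: '0000ff', 10: '00ff00', 20: 'ff0000'}


def find_temp_threshold(temp):
    """Stand-in: return the highest threshold not exceeding temp (assumed behaviour)."""
    eligible = [threshold for threshold in THRESHOLDS if threshold <= temp]
    return max(eligible) if eligible else min(THRESHOLDS)


def lookup_colour(threshold):
    """Stand-in: return the hex value (without the hash) for a threshold."""
    return THRESHOLDS[threshold]


temps = [4.5, 12.0, 23.1]  # made-up readings

# One dictionary per bar: the value to plot and the colour to plot it in.
series = [
    {'value': temp,
     'color': '#' + lookup_colour(find_temp_threshold(temp))}
    for temp in temps
]

print(series)
```

A list shaped like this is what gets handed to bar_chart.add as the second argument.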


bar_chart.x_labels = times


The final piece of formatting is to add labels for the x axis, using a nice simple syntax.


filename = string + '_bar.svg'
bar_chart.render_to_file(FILEPATH + filename)


We want the filename to reflect the data set, so this little snippet takes the string representation of the data set from the dictionary key, builds the full file path by prepending the FILEPATH = './app/static/' global constant, and saves the generated graph there.


Generating the pie charts


We want to generate one pie chart for every bar chart (just in case we need them). The first thing we need to do is to count the number of occurrences of temperature in the bins that were previously generated.


temps_count = dict.fromkeys(temps_count, 0)  # reset the bins for this observation set
for temp in temps:
    temp_threshold = find_temp_threshold(temp)
    temps_count[temp_threshold] += 1


The algorithm is simple as we’ve done most of the prep work beforehand. For each temperature reading in this observation set, we first find its corresponding temperature threshold, then increment the count in its bin.
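The counting step can be sketched in isolation like this. The thresholds and the find_temp_threshold stand-in are assumptions for illustration, not the project's own definitions. The bins are built fresh on every call, so counts from one observation set never carry over into the next.

```python
THRESHOLDS = [0, 10, 20]  # hypothetical temperature thresholds


def find_temp_threshold(temp):
    """Stand-in: return the highest threshold not exceeding temp (assumed behaviour)."""
    eligible = [threshold for threshold in THRESHOLDS if threshold <= temp]
    return max(eligible) if eligible else min(THRESHOLDS)


def count_by_threshold(temps):
    """Count how many readings fall into each threshold's bin."""
    counts = {threshold: 0 for threshold in THRESHOLDS}  # fresh, empty bins
    for temp in temps:
        counts[find_temp_threshold(temp)] += 1
    return counts


# Made-up readings for two observation sets.
day_counts = count_by_threshold([4.5, 12.0, 13.5])
week_counts = count_by_threshold([4.5, 12.0, 13.5, 23.1])

print(day_counts)
print(week_counts)
```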


custom_style = Style(colors=tuple(hex_list))


We can reuse the hex values from the colours table to create a custom colour key for the pie chart. These values are in the form of a list of strings, each prefixed with a hash, which we built earlier. To be properly formatted for PyGal, we now need to convert this list into a tuple, then pass it to the Style constructor as one of the optional keyword arguments.


pie_chart = pygal.Pie(inner_radius=0.6,
                      style=custom_style)


Instantiating the pie chart is straightforward; the only configuration parameters are the size of its donut hole and the colours of the sections.


for temp, count in temps_count.items():
    pie_chart.add(str(temp), count)


Each pie segment needs to be added individually, so we have a little loop.


filename = string + '_pie.svg'
pie_chart.render_to_file(FILEPATH + filename)


Finally, let’s save the chart to be used by the web app later.


Wrapping up


Now we’ve done it once, we can repeat it for each of the remaining observation sets.


return 'Created graphs'


The database utility has taken care of closing the connection to the database, so all that’s left is to return an acknowledgement string for debugging purposes. The graphs themselves are already stored (or overwritten) in the predefined folder within the loop.

Putting it together

from datetime import datetime, timedelta

import pygal
from pygal.style import Style

# get_rows, lookup_colour, find_temp_threshold, DATETIME_STRING and FILEPATH
# are defined elsewhere in the project


def generate_graphs(timestamp):
    # observation sets
    now = datetime.strptime(timestamp, DATETIME_STRING)
    last_day = now - timedelta(days=1)
    last_week = now - timedelta(weeks=1)
    last_month = now - timedelta(weeks=4)
    observation_sets = {
        'last_day': last_day,
        'last_week': last_week,
        'last_month': last_month
    }

    # get datapoints from database
    rows = get_rows('colours')
    hex_list = [f'#{hex}' for hex in [row['hex_value'] for row in rows]]
    temps_count = {row['temperature']: 0 for row in rows}

    # 3x graphs for every reading in last day, week, month
    for string, then in observation_sets.items():
        # fetch data
        sql = 'WHERE timestamp BETWEEN datetime((?)) AND datetime((?))'
        when = (then, now)
        results = get_rows('observations', rows_sql=sql, args=when)

        # generate bar graph (from the observation results, not the colours rows)
        times = [row['timestamp'] for row in results]
        temps = [row['temperature'] for row in results]

        bar_chart = pygal.Bar(x_label_rotation=20,
                              x_labels_major_count=6,
                              show_minor_x_labels=False,
                              show_legend=False)
        bar_chart.add('Temperature', [
            {'value': temp,
             'color': '#' + lookup_colour(find_temp_threshold(temp))}
            for temp in temps]
        )
        bar_chart.x_labels = times
        filename = string + '_bar.svg'
        bar_chart.render_to_file(FILEPATH + filename)

        # generate pie chart
        temps_count = dict.fromkeys(temps_count, 0)  # reset the bins for this observation set
        for temp in temps:
            temp_threshold = find_temp_threshold(temp)
            temps_count[temp_threshold] += 1
        custom_style = Style(colors=tuple(hex_list))
        pie_chart = pygal.Pie(inner_radius=0.6,
                              style=custom_style)
        for temp, count in temps_count.items():
            pie_chart.add(str(temp), count)
        filename = string + '_pie.svg'
        pie_chart.render_to_file(FILEPATH + filename)

    return 'Created graphs'


We've covered the final functional module in this part. The modules now need to be strung together, which is what I'll cover in part 8 when I talk about orchestrating the code. Also visit https://github.com/Schmoiger/pybloom for the full story.
