Importing Libraries


One of the biggest advantages of using Python is access to a multitude of pre-written code packages known as libraries. To use one in your own code, simply import it:

import datetime

We now have access to anything in the datetime library. The library documentation is a great place to start when using a new library, as it often includes example usage.
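There are a few common ways to spell an import. The sketch below shows three of them side by side; the alias dt and the variable names are arbitrary choices made for illustration.

```python
import datetime                 # access names via the module: datetime.date
import datetime as dt           # the same module under a shorter alias
from datetime import date       # bring a single name into scope directly

# All three spellings refer to the same class
same_class = datetime.date is dt.date is date

halloween = date(2023, 10, 31)
print(same_class, halloween.isoformat())
```

Which form to use is largely a matter of readability: the full module name makes it obvious where a name comes from, while importing a single name keeps later lines shorter.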

TASK

Read about the date object in the documentation.

Let’s create a range of dates:

start_date = datetime.date(2023, 10, 1) # 1/10/2023
end_date = datetime.date(2023, 10, 31) # 31/10/2023

num_days = end_date - start_date

# Creates a list containing days of the month for Oct 2023
oct_2023 = [start_date + datetime.timedelta(days) for days in range(num_days.days + 1)]
oct_2023

To try to understand what’s going on above, can you answer these questions?

  • What type of object is num_days?

  • num_days has the attribute days. What type of object is this?

  • Why is the + 1 needed at the end of the range?
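If you are unsure how to investigate questions like these, the built-in type() function is one place to start. The sketch below uses a different, hypothetical pair of dates so that you can still work through the questions on the October data yourself.

```python
import datetime

# Hypothetical dates, deliberately different from the ones above
first = datetime.date(2024, 1, 1)
last = datetime.date(2024, 1, 8)

gap = last - first

print(type(gap))              # the class of the subtraction result
print(type(gap.days))         # the class of its days attribute
print(gap.days)               # how many whole days separate the two dates
print(list(range(gap.days)))  # note which value range() stops at
```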

Once you’re comfortable with the code above, let’s use it to build some fake data. To do this, we need to import another library, the random library.

from random import gauss

data = [gauss(10, 5) for _ in range(len(oct_2023))] # We use _ for variables we don't need to use, "throwaway" variables
data

Here we’ve only imported the function gauss from the library, so we don’t have access to the rest of the random library, but we can call gauss directly. gauss draws a random number from a Gaussian distribution; here we set the mean to 10 and the standard deviation to 5.
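Because the values are random, the list will differ every time the code runs. If repeatable fake data is wanted, one option is to seed the generator first. A minimal sketch, assuming the whole module is imported (seed is not available when only gauss has been imported) and using an arbitrary seed of 42:

```python
import random

random.seed(42)  # 42 is an arbitrary choice of seed
first_run = [random.gauss(10, 5) for _ in range(5)]

random.seed(42)  # resetting the seed replays the same sequence
second_run = [random.gauss(10, 5) for _ in range(5)]

print(first_run == second_run)
```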

Examine your data and calculate any summary statistics you want, including:

  • range

  • mean

  • minimum value

  • maximum value

# Your code here

We can zip the data together:

data_table = [*zip(oct_2023, data)]
data_table
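zip pairs up elements by position, and the [*...] syntax unpacks the resulting zip object into a list. A small sketch with made-up labels and values shows the shape of the result, and one behaviour worth knowing about: zip stops at the shortest input.

```python
days = ["Mon", "Tue", "Wed"]   # hypothetical labels
counts = [3, 5, 8]             # hypothetical values

pairs = [*zip(days, counts)]
print(pairs)  # [('Mon', 3), ('Tue', 5), ('Wed', 8)]

# [*iterable] and list(iterable) build the same list
assert pairs == list(zip(days, counts))

# zip stops at the shortest input, silently dropping the extra label
short = list(zip(days, [1, 2]))
print(short)  # [('Mon', 1), ('Tue', 2)]
```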

With a bit of formatting, this could then be exported as a CSV file, but we’ll do some analysis directly on this data. We can group the data by week number using another library:

from itertools import groupby

def group_key(row):
    date = row[0]
    _, week_num, _ = date.isocalendar()
    return week_num

weekly_data = {}
for week_num, group in groupby(data_table, key = group_key):
    # Each row in the group is still a (date, datum) tuple; we only need the datum now
    grouped_data = [row[1] for row in group]
    summed_data = sum(grouped_data)

    weekly_data[week_num] = summed_data
weekly_data
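One thing to be aware of is that groupby only groups consecutive items that share a key. That is fine here because the table was built in date order, but with unordered rows the same week would appear more than once. A sketch with hypothetical (week number, value) rows:

```python
from itertools import groupby

# Hypothetical (week number, value) rows that are NOT sorted by week
rows = [(40, 1.0), (41, 2.0), (40, 3.0)]

def week_of(row):
    return row[0]

unsorted_keys = [week for week, _ in groupby(rows, key=week_of)]
print(unsorted_keys)  # [40, 41, 40] -> week 40 appears as two separate groups

sorted_keys = [week for week, _ in groupby(sorted(rows, key=week_of), key=week_of)]
print(sorted_keys)    # [40, 41]
```

In a loop that stores results in a dictionary, the second week 40 group would overwrite the first, so sorting by the same key beforehand is the usual safeguard.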

In reality, you’ll often see the code above written in the format below. It does the same thing, but is a lot more compact. When faced with something like this, try to break it down into its component pieces.

{week_num: sum([*zip(*group)][1]) for week_num, group in groupby(data_table, key = lambda row: row[0].isocalendar()[1])}

{key: val for key, val in iterable} is a dictionary comprehension, which should be a familiar format now.

iterable here is groupby(data_table, key = lambda row: row[0].isocalendar()[1]), where key is the function used to group the data. In the explicit version above we defined a named function; here we’re using a lambda function. Lambdas are designed to be simple inline functions and take the form lambda input: output.
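To see that the two forms are interchangeable, the sketch below defines the key both ways and applies each to a single hypothetical row. The names week_number, week_number_lambda and sample_row are made up for this example.

```python
import datetime

def week_number(row):
    # row is a (date, value) tuple; index 1 of isocalendar() is the ISO week
    return row[0].isocalendar()[1]

# The same logic written as a lambda and bound to a name for comparison
week_number_lambda = lambda row: row[0].isocalendar()[1]

sample_row = (datetime.date(2023, 10, 2), 12.5)  # hypothetical row
print(week_number(sample_row) == week_number_lambda(sample_row))
```

Binding a lambda to a name like this is only done here for comparison; in practice a lambda is usually passed straight in as an argument, and anything that needs a name is clearer as a def.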

val here introduces unpacking: zip(*group) unpacks the group and zips together the result, so instead of seeing

zip([(date, datum), (date, datum), etc...])

it removes the list, so each element is passed to zip as a separate argument, e.g.,

zip((date, datum), (date, datum), etc...).

The zip object is then unpacked into a list ([*zip(*group)]), which would give [(date, date, date), (datum, datum, datum)]. We then select the element at index 1 of that list (the data) and sum it.
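A concrete run makes the transposition easier to follow. The sketch below uses short strings in place of real dates, purely for illustration:

```python
rows = [("d1", 1.5), ("d2", 2.5), ("d3", 3.0)]  # hypothetical (date, datum) pairs

# zip(*rows) is the same call as zip(rows[0], rows[1], rows[2])
columns = [*zip(*rows)]
print(columns)          # [('d1', 'd2', 'd3'), (1.5, 2.5, 3.0)]
print(columns[1])       # (1.5, 2.5, 3.0)
print(sum(columns[1]))  # 7.0
```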

As is evident from this, it’s easy to push a lot of processing into a single line in Python. It really highlights the importance of having sensible comments, variable and function names!
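There is often a middle ground between the fully explicit loop and the dense one-liner. The sketch below is one possible compromise rather than the only way to write it: a small named key function, plus a comprehension that unpacks each row by name. The rows and the name iso_week are hypothetical.

```python
import datetime
from itertools import groupby

# Hypothetical rows in date order: (date, value)
rows = [
    (datetime.date(2023, 10, 2), 1.0),
    (datetime.date(2023, 10, 3), 2.0),
    (datetime.date(2023, 10, 9), 4.0),
]

def iso_week(row):
    """Return the ISO week number of a (date, value) row."""
    date, _ = row
    return date.isocalendar()[1]

weekly_totals = {
    week: sum(value for _, value in group)
    for week, group in groupby(rows, key=iso_week)
}
print(weekly_totals)
```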

TASK

You have been provided with a hypothetical model that takes into account the type of medical procedure and a duration and returns a forecasted wait list.

You need to provide the model with all the combinations of the medical tests and the durations. Create a list of tuples containing these combinations, e.g., it should produce [("Blood test", 10), ("Blood test", 20), ("Blood test", 30), ("X-ray", 10) etc...]

medical_tests = ["Blood test", "X-ray", "MRI"]
durations = [10, 20, 30]
# Your code here

Once you’ve done that in native Python, take a look at the documentation for itertools.product. Can you repeat the task above in a single line of code?

# Your code here