resonATe Overview¶
resonATe is the Ocean Tracking Network’s acoustic telemetry analysis toolkit. It can be used to filter, compress, visualize and analyze acoustic detection extracts from OTN and other marine telemetry data.
- Abacus Plot
- Bubble Plot
- Cohort
- Compressing Detections
- Distance Matrix
- Filtering
- Interval Data
- Residence Index
- Receiver Efficiency Index
- Unique ID
- Visual Timeline
Abacus Plot¶
The abacus plot is a way to plot animal detections over time. The function uses Plotly to place your points on a scatter plot. ycolumn is used as the y axis and datecollected is used as the x axis. color_column is used to group detections together and assign them a color. Details are in Abacus Plot.
Bubble Plot¶
The bubble plot function returns a Plotly scatter plot layered on top of a map. The color of the markers indicates the number of detections at each location. Alternatively, you can show the number of individuals seen at each location by using type = 'individual'. Details are in Bubble Plot.
Cohort¶
The tool takes a file of compressed detections and a time parameter in seconds. It identifies groups of animals traveling together. Each station an animal visits is checked for other animals detected there within the specified time period. Details are in Cohort Tool.
Compressing Detections¶
Compressing detections is done by looking at the detection times and locations of an animal. Any detections that occur successively in time, in the same location, with the time between detections not exceeding the timefilter, are combined into a single detection with a start and end time. The result is a compressed detections Pandas DataFrame.
Compression is the first step of the Mihoff Interval Data Tool. Compressed detection DataFrames are required by other tools, such as the interval and cohort tools. Details are in Compression Tool.
Filtering¶
(White, E., Mihoff, M., Jones, B., Bajona, L., Halfyard, E. 2014. White-Mihoff False Filtering Tool)
OTN has developed a tool which will assist with filtering false detections. The first level of filtering involves identifying isolated detections. The original concept came from work done by Easton White. He was kind enough to share his research database with OTN. We did some preliminary research and developed a proposal for a filtering tool based on what Easton had done. This proof of concept was presented to Steve Kessel and Eddie Halfyard in December 2013 and a decision was made to develop a tool for general use.
This is a very simple tool. It will take an input file of detections and based on an input parameter will identify suspect detections. The suspect detections will be put into a dataframe which the user can examine. There will be enough information for each suspect detection for the user to understand why it was flagged. There is also enough information to be able to reference the detection in the original file if the user wants to see what was happening at the same time.
The input parameter is a time in seconds. We used 3600 seconds as the default as this is what was used in Easton's code. This value can be changed by the user. The output contains a record for each detection for which there has been more than xx seconds since the previous detection (of that tag/animal) and more than the same amount of time until the next detection. It ignores which receiver the detection occurred at. That is all it does, nothing more and nothing less. Details are in Filter Tool.
Two other filtering tools are available as well: one based on distance alone and one based on velocity. They can also be found in Filter Tools.
Distance Matrix¶
This takes a DataFrame created by the White-Mihoff False Filtering tool. The file contains rows of station pairs with the straight-line distance between them calculated in metres. A station pair will only be in the file if an animal traveled between the stations. If an animal goes from stn1 to stn2 and then to stn3, the pairs stn1-stn2 and stn2-stn3 will be in the file. If no animal goes between stn1 and stn3, that pair will not be in the file. The tool also takes a researcher-provided file of 'real distances'. The output is a file that looks like the first file with the 'real distance' column updated. Details are in Distance Matrix Tool.
Interval Data¶
(Mihoff, M., Jones, B., Bajona, L., Halfyard, E. 2014. Mihoff Interval Data Tool)
This tool will take a DataFrame of compressed detections and a distance matrix and output an interval DataFrame. The Interval DataFrame will contain records of the animal id, the arrival time at stn1, the departure time at stn1, the detection count at stn1, the arrival time at stn2, time between detections at the two stations, the interval in seconds, the distance between stations, and the velocity of the animal in m/s. Details are in Interval Data Tool.
Residence Index¶
This residence index tool will take a compressed or uncompressed detection file and calculate the residency index for each station/receiver in the detections. A CSV file will be written to the data directory for future use. A Pandas DataFrame is returned from the function, which can be used to plot the information. The information passed to the function is what is used to calculate the residence index, so make sure you are only passing the data you want taken into consideration (i.e. species, stations, tags, etc.). Details in Residence Index Tool.
Receiver Efficiency Index¶
The receiver efficiency index is a number between 0 and 1 indicating the amount of relative activity at each receiver compared to the entire set of receivers, regardless of positioning. The function takes a set of detections and a deployment history of the receivers to create a context for the detections. Both the number of unique tags and the number of species are taken into consideration in the calculation. For the exact method, see the details in Receiver Efficiency Index.
Unique Id¶
This tool will add a column to any file. The unique id will be sequential integers. No validation is done on the input file. Details in Unique Detections ID.
Visual Timeline¶
This tool takes a detections extract file and generates a Plotly animated timeline, either in place in an iPython notebook or exported out to an HTML file. Details in Visual Timeline.
Contents:¶
Installation¶
resonATe can be installed using Pip or through a Conda environment.
Conda¶
conda config --add channels conda-forge
conda install resonate
Pip¶
pip install resonate
Preparing Data¶
resonATe requires your acoustic telemetry data to have specific column headers. The column headers are the same ones used by the Ocean Tracking Network for their detection extracts.
The columns you need are as follows:
- catalognumber - A unique identifier assigned to an animal.
- station - A unique identifier for the station or mooring where the receiver was located. This column is used in resonATe for grouping detections which should be considered to have occurred in the same place.
- datecollected - Date and time of release or detection, all of which have the same timezone (example format: 2018-02-02 04:09:45).
- longitude - The receiver location at time of detection in decimal degrees.
- latitude - The receiver location at time of detection in decimal degrees.
- scientificname - The taxonomic name for the animal detected.
- fieldnumber - The unique number for the tag/device attached to the animal.
- unqdetecid - A unique value assigned to each record in the data. resonATe includes a function to generate this column if needed. Details in Unique Detections ID.
The Receiver Efficiency Index also needs a deployment history for stations. The columns for deployments are as follows:
- station_name - A unique identifier for the station or mooring where the receiver was located. This column is used in resonATe for grouping detections which should be considered to have occurred in the same place.
- deploy_date - The date the receiver was placed in the water or became active (example format: 2018-02-02).
- recovery_date - The date the receiver was removed from the water or became inactive (example format: 2018-02-02).
- last_download - The date data was last retrieved from the receiver (example format: 2018-02-02).
All other columns are not required and will not affect the functions; however, some functions can make use of them. For example, receiver_group can be used to color code data in the Abacus Plot.
Warning
Detection records from mobile receivers, i.e. from receivers attached to gliders or animals, as well as satellite transmitter detections, will not necessarily be appropriate or compatible for use with all of these tools.
Renaming Columns¶
Pandas provides a rename() function that can be implemented as follows:
import pandas as pd
df = pd.read_csv('/path/to/detections.csv')
df.rename(index=str, columns={
'your_animal_id_column':'catalognumber',
'your_station_column':'station',
'your_date_time_column':'datecollected',
'your_longitude_column':'longitude',
'your_latitude_column':'latitude',
'your_unique_id_column':'unqdetecid'
}, inplace=True)
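After renaming, it can be worth confirming that every required column is present before running any tools. A minimal sketch (not part of resonATe) that lists any columns still missing from the df above:
# Columns resonATe expects, per the list above
required = ['catalognumber', 'station', 'datecollected', 'longitude',
            'latitude', 'scientificname', 'fieldnumber', 'unqdetecid']
missing = [col for col in required if col not in df.columns]
if missing:
    print('Missing required columns:', missing)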
Example Dataset¶
catalognumber | scientificname | commonname | receiver_group | station | datecollected | timezone | longitude | latitude | unqdetecid |
---|---|---|---|---|---|---|---|---|---|
NSBS-Sophie | Prionace glauca | blue shark | HFX | HFX248 | 2014-06-08 20:10 | UTC | -63.50002 | 42.89487 | HFX-A69-9001-26655-170932 |
NSBS-Sophie | Prionace glauca | blue shark | HFX | HFX248 | 2014-06-08 20:12 | UTC | -63.50002 | 42.89487 | HFX-A69-9001-26655-170933 |
NSBS-Sophie | Prionace glauca | blue shark | HFX | HFX249 | 2014-06-08 20:12 | UTC | -63.50002 | 42.88764 | HFX-A69-9001-26655-170934 |
NSBS-Sophie | Prionace glauca | blue shark | HFX | HFX248 | 2014-06-08 20:14 | UTC | -63.50002 | 42.89487 | HFX-A69-9001-26655-170935 |
NSBS-Sophie | Prionace glauca | blue shark | HFX | HFX248 | 2014-06-08 20:16 | UTC | -63.50002 | 42.89487 | HFX-A69-9001-26655-170936 |
NSBS-Sophie | Prionace glauca | blue shark | HFX | HFX248 | 2014-06-08 20:17 | UTC | -63.50002 | 42.89487 | HFX-A69-9001-26655-170937 |
NSBS-Sophie | Prionace glauca | blue shark | HFX | HFX248 | 2014-06-08 20:27 | UTC | -63.50002 | 42.89487 | HFX-A69-9001-26655-170938 |
NSBS-Sophie | Prionace glauca | blue shark | HFX | HFX247 | 2014-06-08 20:28 | UTC | -63.49995 | 42.90203 | HFX-A69-9001-26655-170939 |
NSBS-Lola | Prionace glauca | blue shark | HFX | HFX119 | 2014-06-20 11:36 | UTC | -63.3331 | 43.79986 | HFX-A69-9001-26667-173924 |
NSBS-Lola | Prionace glauca | blue shark | HFX | HFX118 | 2014-06-20 11:37 | UTC | -63.32552 | 43.8043 | HFX-A69-9001-26667-171528 |
NSBS-Lola | Prionace glauca | blue shark | HFX | HFX119 | 2014-06-20 11:38 | UTC | -63.3331 | 43.79986 | HFX-A69-9001-26667-173925 |
NSBS-Lola | Prionace glauca | blue shark | HFX | HFX118 | 2014-06-20 11:38 | UTC | -63.32552 | 43.8043 | HFX-A69-9001-26667-171529 |
NSBS-Lola | Prionace glauca | blue shark | HFX | HFX119 | 2014-06-20 11:40 | UTC | -63.3331 | 43.79986 | HFX-A69-9001-26667-173926 |
NSBS-Lola | Prionace glauca | blue shark | HFX | HFX118 | 2014-06-20 11:41 | UTC | -63.32552 | 43.8043 | HFX-A69-9001-26667-171530 |
NSBS-Lola | Prionace glauca | blue shark | HFX | HFX119 | 2014-06-20 11:42 | UTC | -63.3331 | 43.79986 | HFX-A69-9001-26667-173927 |
NSBS-Lola | Prionace glauca | blue shark | HFX | HFX118 | 2014-06-20 11:43 | UTC | -63.32552 | 43.8043 | HFX-A69-9001-26667-171531 |
NSBS-Lola | Prionace glauca | blue shark | HFX | HFX119 | 2014-06-20 11:44 | UTC | -63.3331 | 43.79986 | HFX-A69-9001-26667-173928 |
NSBS-Lola | Prionace glauca | blue shark | HFX | HFX119 | 2014-06-20 11:46 | UTC | -63.3331 | 43.79986 | HFX-A69-9001-26667-173929 |
NSBS-Ophelia | Prionace glauca | blue shark | HFX | HFX182 | 2014-06-21 3:21 | UTC | -63.50012 | 43.36992 | HFX-A69-9001-26669-173703 |
NSBS-Ophelia | Prionace glauca | blue shark | HFX | HFX183 | 2014-06-21 3:22 | UTC | -63.50003 | 43.3631 | HFX-A69-9001-26669-174594 |
NSBS-Ophelia | Prionace glauca | blue shark | HFX | HFX183 | 2014-06-21 3:24 | UTC | -63.50003 | 43.3631 | HFX-A69-9001-26669-174595 |
NSBS-Ophelia | Prionace glauca | blue shark | HFX | HFX183 | 2014-06-21 3:26 | UTC | -63.50003 | 43.3631 | HFX-A69-9001-26669-174596 |
NSBS-Ophelia | Prionace glauca | blue shark | HFX | HFX183 | 2014-06-21 3:28 | UTC | -63.50003 | 43.3631 | HFX-A69-9001-26669-174597 |
NSBS-Ophelia | Prionace glauca | blue shark | HFX | HFX182 | 2014-06-21 3:29 | UTC | -63.50012 | 43.36992 | HFX-A69-9001-26669-173704 |
NSBS-Ophelia | Prionace glauca | blue shark | HFX | HFX183 | 2014-06-21 3:30 | UTC | -63.50003 | 43.3631 | HFX-A69-9001-26669-174598 |
NSBS-Ophelia | Prionace glauca | blue shark | HFX | HFX182 | 2014-06-21 3:30 | UTC | -63.50012 | 43.36992 | HFX-A69-9001-26669-173705 |
NSBS-Ophelia | Prionace glauca | blue shark | HFX | HFX182 | 2014-06-21 3:38 | UTC | -63.50012 | 43.36992 | HFX-A69-9001-26669-173706 |
NSBS-Ophelia | Prionace glauca | blue shark | HFX | HFX182 | 2014-06-21 3:43 | UTC | -63.50012 | 43.36992 | HFX-A69-9001-26669-173707 |
Abacus Plot¶
The abacus plot is a way to plot animal detections over time. The function uses Plotly to place your points on a scatter plot. ycolumn is used as the y axis and datecollected is used as the x axis. color_column is used to group detections together and assign them a color.
Warning
Input files must include datecollected as a column.
from resonate.abacus_plot import abacus_plot
import pandas as pd
df = pd.read_csv('/path/to/detections.csv')
To display the plot in iPython use:
abacus_plot(df, ycolumn='catalognumber', color_column='receiver_group')
Or use the standard plotting function to save as HTML:
abacus_plot(df, ipython_display=False, filename='example.html')
Below is the sample output for blue sharks off of the coast of Nova Scotia.
Abacus Plot Function¶
abacus_plot.abacus_plot(detections, ycolumn='catalognumber', color_column=None, ipython_display=True, title='Abacus Plot', filename=None)¶
Creates a Plotly abacus plot from a Pandas DataFrame.
Parameters: - detections – detection dataframe
- ycolumn – the series/column for the y axis of the plot
- color_column – the series/column to group by and assign a color
- ipython_display – a boolean to show in a notebook
- title – the title of the plot
- filename – Plotly filename to write to
Returns: A plotly scatter plot
Bubble Plot¶
The bubble plot function returns a Plotly scatter plot layered on top of a map. The color of the markers indicates the number of detections at each location. Alternatively, you can show the number of individuals seen at each location by using type = 'individual'.
Warning
Input files must include station, catalognumber, unqdetecid, latitude, longitude, and datecollected as columns.
from resonate.bubble_plot import bubble_plot
import pandas as pd
import plotly.offline as py
df = pd.read_csv('/path/to/detections.csv')
To display the plot in iPython use:
bubble_plot(df)
Or use the standard plotting function to save as HTML:
bubble_plot(df,ipython_display=False, filename='/path_to_plot.html')
You can also do your count by number of individuals by using type = 'individual':
bubble_plot(df, type='individual')
Mapbox¶
Alternatively, you can use a Mapbox access token to plot your map. Mapbox is much more responsive than the standard Scattergeo plot.
Example Code¶
mapbox_access_token = 'ADD_YOUR_TOKEN_HERE'
bubble_plot(df, mapbox_token=mapbox_access_token)
Below is the sample output for blue sharks off of the coast of Nova Scotia.
Bubble Plot Function¶
bubble_plot.bubble_plot(detections, type='detections', ipython_display=True, title='Bubble Plot', height=700, width=1000, plotly_geo=None, filename=None, mapbox_token=None, marker_size=10, colorscale='Viridis')¶
Creates a Plotly bubble plot from a Pandas DataFrame.
Parameters: - detections – detection dataframe
- type – 'detections' to count detections at each location, or 'individual' to count unique animals
- ipython_display – a boolean to show in a notebook
- title – the title of the plot
- height – the height of the plot
- width – the width of the plot
- plotly_geo – an optional dictionary to control the geographic aspects of the plot
- filename – Plotly filename to write to
- mapbox_token – a Mapbox access token string
- marker_size – An int to indicate the diameter in pixels
- colorscale – A string to indicate the color index
Returns: A plotly geoscatter plot or mapbox plot
Cohort¶
The tool takes a dataframe of compressed detections and a time parameter in seconds. It identifies groups of animals traveling together. Each station an animal visits is checked for other animals detected there within the specified time period.
The function returns a dataframe which you can use to help identify animal cohorts. The cohort is created from the compressed data that results from the compress_detections() function. Pass the compressed dataframe into the cohort() function along with a time interval in seconds (default is 3600) to create the cohort dataframe.
Warning
Input files must include station, catalognumber, seq_num, unqdetecid, and datecollected as columns.
from resonate.cohorts import cohort
from resonate.compress import compress_detections
import pandas as pd
time_interval = 3600 # in seconds
data = pd.read_csv('/path/to/detections.csv')
compressed_df = compress_detections(data)
cohort_df = cohort(compressed_df, time_interval)
cohort_df
You can use the Pandas DataFrame.to_csv() function to output the file to a desired location.
# Saves the cohort file
cohort_df.to_csv('/path/to/output.csv', index=False)
Cohort Functions¶
cohorts.cohort(compressed_df, interval_time=3600)¶
Creates a dataframe of cohorts using a compressed detection file.
Parameters: - compressed_df – compressed dataframe
- interval_time – cohort detection time interval (in seconds)
Returns: cohort dataframe with the following columns
- anml_1
- anml_1_seq
- station
- anml_2
- anml_2_seq
- anml_2_arrive
- anml_2_depart
- anml_2_startunqdetecid
- anml_2_endunqdetecid
- anml_2_detcount
Compressing Detections¶
Compressing detections is done by looking at the detection times and locations of an animal. Any detections that occur successively in time, in the same location, with the time between detections not exceeding the timefilter, are combined into a single detection with a start and end time. The result is a compressed detections Pandas DataFrame.
Compression is the first step of the Mihoff Interval Data Tool. Compressed detection DataFrames are required by other tools, such as the interval and cohort tools.
Warning
Input files must include datecollected, catalognumber, and unqdetecid as columns.
from resonate.compress import compress_detections
import pandas as pd
detections = pd.read_csv('/path/to/data.csv')
compressed = compress_detections(detections=detections)
You can use the Pandas DataFrame.to_csv() function to output the file to a desired location.
compressed.to_csv('/path/to/output.csv', index=False)
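The compressed frame records a start and end time for each detection event (the startdate and enddate columns referenced later by the Residence Index tool). A short sketch, assuming those column names, to inspect event durations:
# startdate/enddate column names are assumed from the Residence Index docs
compressed['event_length'] = (pd.to_datetime(compressed['enddate'])
                              - pd.to_datetime(compressed['startdate']))
print(compressed['event_length'].describe())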
Compression Functions¶
compress.compress_detections(detections, timefilter=3600)¶
Creates a compressed dataframe from a detection dataframe.
Parameters: - detections – detection dataframe
- timefilter – the maximum amount of time in seconds that can pass before a new detection event is started
Returns: A Pandas DataFrame of detection events
Filtering Detections on Distance / Time¶
White/Mihoff Filter¶
(White, E., Mihoff, M., Jones, B., Bajona, L., Halfyard, E. 2014. White-Mihoff False Filtering Tool)
OTN has developed a tool which will assist with filtering false detections. The first level of filtering involves identifying isolated detections. The original concept came from work done by Easton White. He was kind enough to share his research database with OTN. We did some preliminary research and developed a proposal for a filtering tool based on what Easton had done. This proof of concept was presented to Steve Kessel and Eddie Halfyard in December 2013 and a decision was made to develop a tool for general use.
This is a very simple tool. It will take an input file of detections and based on an input parameter will identify suspect detections. The suspect detections will be put into a dataframe which the user can examine. There will be enough information for each suspect detection for the user to understand why it was flagged. There is also enough information to be able to reference the detection in the original file if the user wants to see what was happening at the same time.
The input parameter is a time in seconds. We used 3600 seconds as the default as this is what was used in Easton’s code. This value can be changed by the user. The output contains a record for each detection for which there has been more than xx seconds since the previous detection (of that tag/animal) and more than the same amount of time until the next detection. It ignores which receiver the detection occurred at. That is all it does, nothing more and nothing less.
Below, the interval is set to 3600 seconds and no user-specified suspect file is used. The function will also create a distance matrix.
Warning
Input files must include datecollected, catalognumber, station, and unqdetecid as columns.
from resonate.filters import get_distance_matrix
from resonate.filters import filter_detections
import pandas as pd
detections = pd.read_csv('/path/to/detections.csv')
time_interval = 3600 # in seconds
SuspectFile = None
CreateDistanceMatrix = True
filtered_detections = filter_detections(detections,
suspect_file=SuspectFile,
min_time_buffer=time_interval,
distance_matrix=CreateDistanceMatrix)
You can use the Pandas DataFrame.to_csv() function to output each file to a desired location.
filtered_detections['filtered'].to_csv('/path/to/output.csv', index=False)
filtered_detections['suspect'].to_csv('/path/to/output.csv', index=False)
filtered_detections['dist_mtrx'].to_csv('/path/to/output.csv', index=False)
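Since filter_detections() returns a dictionary of DataFrames keyed as shown above, the results can also be inspected directly; a quick sketch counting flagged detections:
n_filtered = len(filtered_detections['filtered'])
n_suspect = len(filtered_detections['suspect'])
print(f"{n_suspect} suspect detections flagged out of {n_filtered + n_suspect} total")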
Distance Filter¶
The distance filter will separate detections based only on distance. The maximum_distance argument defaults to 100,000 meters (or 100 kilometers), but can be adjusted. Any detection where the succeeding and preceding detections are more than the maximum_distance away will be considered suspect.
Warning
Input files must include datecollected, catalognumber, station, and unqdetecid as columns.
from resonate.filters import distance_filter
import pandas as pd
detections = pd.read_csv('/path/to/detections.csv')
filtered_detections = distance_filter(detections)
You can use the Pandas DataFrame.to_csv() function to output each file to a desired location.
filtered_detections['filtered'].to_csv('/path/to/output.csv', index=False)
filtered_detections['suspect'].to_csv('/path/to/output.csv', index=False)
Velocity Filter¶
The velocity filter will separate detections based on the animal's velocity. The maximum_velocity argument defaults to 10 m/s, but can be adjusted. Any detection where the succeeding and preceding velocities of an animal are more than the maximum_velocity will be considered suspect.
Warning
Input files must include datecollected, catalognumber, station, and unqdetecid as columns.
from resonate.filters import velocity_filter
import pandas as pd
detections = pd.read_csv('/path/to/detections.csv')
filtered_detections = velocity_filter(detections)
You can use the Pandas DataFrame.to_csv() function to output each file to a desired location.
filtered_detections['filtered'].to_csv('/path/to/output.csv', index=False)
filtered_detections['suspect'].to_csv('/path/to/output.csv', index=False)
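Because each filter returns its cleaned detections under the 'filtered' key, the distance and velocity filters can be chained. A minimal sketch, assuming the filtered output retains all required columns:
from resonate.filters import distance_filter, velocity_filter
import pandas as pd

detections = pd.read_csv('/path/to/detections.csv')

# Run the distance filter first, then pass its cleaned output to the velocity filter
dist_results = distance_filter(detections)
vel_results = velocity_filter(dist_results['filtered'])
cleaned = vel_results['filtered']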
Filtering Functions¶
filters.distance_filter(detections, maximum_distance=100000)¶
Parameters: - detections – a Pandas DataFrame of acoustic detections
- maximum_distance – a number in meters, default is 100000
Returns: A list of Pandas DataFrames of filtered detections and suspect detections
filters.filter_detections(detections, suspect_file=None, min_time_buffer=3600, distance_matrix=False)¶
Filters isolated detections that are more than min_time_buffer apart from other detections in the detection file, returning Filtered and Suspect dataframes. suspect_file can be a file of existing suspect detections to remove before filtering. dist_matrix is created as a matrix of between-station distances from stations defined in the input file.
Parameters: - detections – A Pandas DataFrame of acoustic detections
- suspect_file – Path to a user specified suspect file, same format as the detections
- min_time_buffer – the minimum amount of time in seconds required for a detection to be considered an outlier
- distance_matrix – A boolean of whether or not to generate the distance matrix
Returns: A list of Pandas DataFrames of filtered detections, suspect detections, and a distance matrix
filters.get_distance_matrix(detections)¶
Creates a distance matrix of all stations in the array or line.
Parameters: detections – a Pandas DataFrame of detections
Returns: A Pandas DataFrame matrix of station-to-station distances
filters.velocity_filter(detections, maximum_velocity=10)¶
Parameters: - detections – a Pandas DataFrame of acoustic detections
- maximum_velocity – the maximum velocity in m/s, default is 10
Returns: A list of Pandas DataFrames of filtered detections and suspect detections
Interval Data¶
interval_data() takes a compressed detections DataFrame, a distance matrix, and a detection radius DataFrame and creates an interval data DataFrame.
Intervals are lengths of time in which a station detected an animal. Many consecutive detections of an animal are replaced by one interval.
Warning
Input files must include datecollected, catalognumber, and unqdetecid as columns.
from resonate.filters import get_distance_matrix
from resonate.compress import compress_detections
from resonate.interval_data_tool import interval_data
import pandas as pd
import geopy.distance
input_file = pd.read_csv("/path/to/detections.csv")
compressed = compress_detections(input_file)
matrix = get_distance_matrix(input_file)
Set the station radius for each station name.
detection_radius = 400
station_det_radius = pd.DataFrame([(x, geopy.distance.Distance(detection_radius/1000.0))
for x in matrix.columns.tolist()], columns=['station','radius'])
station_det_radius.set_index('station', inplace=True)
station_det_radius
You can modify individual stations if needed by using DataFrame.at[] from Pandas.
station_name = 'HFX001'
station_detection_radius = 500
station_det_radius.at[station_name, 'radius'] = geopy.distance.Distance( station_detection_radius/1000.0 )
Create the interval data by passing the compressed detections, the matrix, and the station radii.
interval = interval_data(compressed_df=compressed, dist_matrix_df=matrix, station_radius_df=station_det_radius)
interval
You can use the Pandas DataFrame.to_csv() function to output the file to a desired location.
interval.to_csv('/path/to/output.csv', index=False)
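Since the interval DataFrame includes the distance between stations and the animal's velocity in m/s, a quick plausibility check is possible. A sketch, where the velocity column name is an assumption, not confirmed by this page:
# 'velocity' as a column name is an assumption; adjust to match your output
fast = interval[interval['velocity'] > 10]  # 10 m/s is the velocity_filter default
print(f"{len(fast)} intervals imply speeds above 10 m/s")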
Interval Data Functions¶
interval_data_tool.interval_data(compressed_df, dist_matrix_df, station_radius_df=None)¶
Creates a detection interval file from compressed detection, distance matrix, and station detection radius DataFrames.
Parameters: - compressed_df – compressed detection dataframe
- dist_matrix_df – station distance matrix dataframe
- station_radius_df – station detection radius dataframe
Returns: an interval detection DataFrame
Residence Index¶
Kessel et al. Paper https://www.researchgate.net/publication/279269147
This residence index tool will take a compressed or uncompressed detection file and calculate the residency index for each station/receiver in the detections. A CSV file will be written to the data directory for future use. A Pandas DataFrame is returned from the function, which can be used to plot the information. The information passed to the function is what is used to calculate the residence index, so make sure you are only passing the data you want taken into consideration (i.e. species, stations, tags, etc.).
detections: The CSV file in the data directory that is either compressed or raw. If the file is not compressed, please allow the program time to compress the file and add the rows to the database. A compressed file will be created in the data directory. Use the compressed file for any future runs of the residence index function.
calculation_method: The method used to calculate the residence index. Methods are:
- kessel
- timedelta
- aggregate_with_overlap
- aggregate_no_overlap
project_bounds: North, South, East, and West bounding longitudes and latitudes for visualization.
The calculation methods are listed and described below before they are called. The function will default to the Kessel method when nothing is passed.
Below is an example of initial variables to set up, which are the detection file and the project bounds.
Warning
Input files must include datecollected, station, longitude, latitude, catalognumber, and unqdetecid as columns.
from resonate import residence_index as ri
import pandas as pd
detections = pd.read_csv('/Path/to/detections.csv')
Kessel Residence Index Calculation¶
The Kessel method converts both the startdate and enddate columns into a date with no hours, minutes, or seconds. Next it creates a list of the unique days where a detection was seen. The size of the list is returned as the total number of days as an integer. This calculation is used to determine the total number of distinct days (T) and the total number of distinct days per station (S).
\(RI = \frac{S}{T}\)
RI = Residence Index
S = Distinct number of days detected at the station
T = Distinct number of days detected anywhere on the array
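For example, an animal detected at a station on 10 distinct days, out of 40 distinct days with detections anywhere on the array, gives \(RI = \frac{10}{40} = 0.25\).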
Warning
Possible rounding error may occur, as a detection on 2016-01-01 23:59:59 and a detection on 2016-01-02 00:00:01 would be counted as two days when they are really only two seconds apart.
Kessel RI Example Code¶
kessel_ri = ri.residency_index(detections, calculation_method='kessel')
ri.plot_ri(kessel_ri)
Timedelta Residence Index Calculation¶
The Timedelta calculation method determines the first startdate of all detections and the last enddate of all detections. The time difference is then taken as the values to be used in calculating the residence index. The timedelta for each station is divided by the timedelta of the array to determine the residence index.
\(RI = \frac{\Delta S}{\Delta T}\)
RI = Residence Index
\(\Delta S\) = Last detection time at a station - First detection time at the station
\(\Delta T\) = Last detection time on an array - First detection time on the array
Timedelta RI Example Code¶
timedelta_ri = ri.residency_index(detections, calculation_method='timedelta')
ri.plot_ri(timedelta_ri)
Aggregate With Overlap Residence Index Calculation¶
The Aggregate With Overlap calculation method takes the length of time of each detection and sums them together. A total is returned. The sum for each station is then divided by the sum of the array to determine the residence index.
RI = \(\frac{AwOS}{AwOT}\)
RI = Residence Index
AwOS = Sum of length of time of each detection at the station
AwOT = Sum of length of time of each detection on the array
Aggregate With Overlap RI Example Code¶
with_overlap_ri = ri.residency_index(detections, calculation_method='aggregate_with_overlap')
ri.plot_ri(with_overlap_ri)
Aggregate No Overlap Residence Index Calculation¶
The Aggregate No Overlap calculation method takes the length of time of each detection and sums them together. However, any overlap in time between one or more detections is excluded from the sum.
For example, if the first detection is from 2016-01-01 01:02:43 to 2016-01-01 01:10:12 and the second detection is from 2016-01-01 01:09:01 to 2016-01-01 01:12:43, then the sum of those two detections would be 10 minutes.
A total is returned once all detections have been added without overlap. The sum for each station is then divided by the sum of the array to determine the residence index.
RI = \(\frac{AnOS}{AnOT}\)
RI = Residence Index
AnOS = Sum of length of time of each detection at the station, excluding any overlap
AnOT = Sum of length of time of each detection on the array, excluding any overlap
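The overlap-exclusion step amounts to a classic interval merge. A minimal sketch of the idea (not resonATe's internal implementation):
import pandas as pd

def merged_total(intervals):
    # Sum interval lengths, counting any overlapping time only once.
    # intervals: iterable of (start, end) pandas Timestamps.
    ordered = sorted(intervals, key=lambda iv: iv[0])
    total = pd.Timedelta(0)
    cur_start, cur_end = ordered[0]
    for start, end in ordered[1:]:
        if start <= cur_end:  # overlaps the current block: extend it
            cur_end = max(cur_end, end)
        else:  # disjoint: bank the current block and start a new one
            total += cur_end - cur_start
            cur_start, cur_end = start, end
    return total + (cur_end - cur_start)

# The two detections from the example above merge into one 10-minute block
ivs = [(pd.Timestamp('2016-01-01 01:02:43'), pd.Timestamp('2016-01-01 01:10:12')),
       (pd.Timestamp('2016-01-01 01:09:01'), pd.Timestamp('2016-01-01 01:12:43'))]
print(merged_total(ivs))  # 0 days 00:10:00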
Aggregate No Overlap RI Example Code¶
no_overlap_ri = ri.residency_index(detections, calculation_method='aggregate_no_overlap')
ri.plot_ri(no_overlap_ri, title="ANO RI")
Mapbox¶
Alternatively, you can use a Mapbox access token to plot your map. Mapbox is much more responsive than the standard Scattergeo plot.
Mapbox Example Code¶
mapbox_access_token = 'YOUR MAPBOX ACCESS TOKEN HERE'
kessel_ri = ri.residency_index(detections, calculation_method='kessel')
ri.plot_ri(kessel_ri, mapbox_token=mapbox_access_token,marker_size=40, scale_markers=True)
Residence Index Functions¶
residence_index.aggregate_total_no_overlap(detections)¶
Aggregates the timedelta of startdate and enddate, excluding overlap between detections. Any overlap between two detections is converted to a new detection using the earlier startdate and the latest enddate. If the startdate and enddate are the same, a timedelta of one second is assumed.
Parameters: detections – Pandas DataFrame pulled from the compressed detections CSV
Returns: A float of the number of days
residence_index.aggregate_total_with_overlap(detections)¶
Aggregates the timedelta of startdate and enddate of each detection into a final timedelta, then returns a float of the number of days. If the startdate and enddate are the same, a timedelta of one second is assumed.
Parameters: detections – Pandas DataFrame pulled from the compressed detections CSV
Returns: A float of the number of days
residence_index.get_days(dets, calculation_method='kessel')¶
Determines which calculation method to use for the residency index. Wrapper method for the calculation methods above.
Parameters: - dets – A Pandas DataFrame pulled from the compressed detections CSV
- calculation_method – determines which method above will be used to count total time and station time
Returns: An int of the number of days
residence_index.get_station_location(station, detections)¶
Returns the longitude and latitude of a station/receiver given the station and the table name.
Parameters: - station – String that contains the station name
- detections – the table name in which to find the station
Returns: A Pandas DataFrame of station, latitude, and longitude
residence_index.plot_ri(ri_data, ipython_display=True, title='Bubble Plot', height=700, width=1000, plotly_geo=None, filename=None, marker_size=6, scale_markers=False, colorscale='Viridis', mapbox_token=None)¶
Parameters: - ri_data – A Pandas DataFrame generated from residency_index()
- ipython_display – a boolean to show in a notebook
- title – the title of the plot
- height – the height of the plot
- width – the width of the plot
- plotly_geo – an optional dictionary to control the geographic aspects of the plot
- filename – Plotly filename to write to
- mapbox_token – a Mapbox access token string
- marker_size – An int to indicate the diameter in pixels
- scale_markers – A boolean to indicate whether or not markers are scaled by their value
- colorscale – A string to indicate the color index. See here for options: https://community.plot.ly/t/what-colorscales-are-available-in-plotly-and-which-are-the-default/2079
Returns: A Plotly geoscatter plot
residence_index.residency_index(detections, calculation_method='kessel')¶
This function takes in a detections CSV and determines the residency index for each station.
Residence Index (RI) was calculated as the number of days an individual fish was detected at each receiver station divided by the total number of days the fish was detected anywhere on the acoustic array. - Kessel et al.
Parameters: - detections – CSV Path
- calculation_method – determines which method above will be used to count total time and station time
Returns: A residence index DataFrame with the following columns
- days_detected
- latitude
- longitude
- residency_index
- station
residence_index.total_days_count(detections)¶
Takes a Pandas DataFrame and determines the number of days any detections were seen on the array. The function converts both the startdate and enddate columns into a date with no hours, minutes, or seconds. Next it creates a list of the unique days where a detection was seen. The size of the list is returned as the total number of days as an integer.
NOTE: Possible rounding error may occur, as a detection on 2016-01-01 23:59:59 and a detection on 2016-01-02 00:00:01 would be counted as two days when they are really only two seconds apart.
Parameters: detections – Pandas DataFrame pulled from the compressed detections CSV
Returns: An int of the number of days
residence_index.total_days_diff(detections)¶
Determines the total days difference.
The difference is determined by the minimal startdate of every detection and the maximum enddate of every detection. Both are converted into a datetime then subtracted to get a timedelta. The timedelta is converted to seconds and divided by the number of seconds in a day (86400). The function returns a floating point number of days (i.e. 503.76834).
Parameters: detections – Pandas DataFrame pulled from the compressed detections CSV
Returns: A float of the number of days
Receiver Efficiency Index¶
The receiver efficiency index is a number between 0 and 1 indicating the amount of relative activity at each receiver compared to the entire set of receivers, regardless of positioning. The function takes a set of detections and a deployment history of the receivers to create a context for the detections. Both the number of unique tags and the number of species are taken into consideration in the calculation.
The receiver efficiency index is implemented based on the paper Acoustic telemetry array evolution: From species- and project-specific designs to large-scale, multispecies, cooperative networks. Each receiver's index is calculated using the formula:
REI = \(\frac{T_r}{T_a} \times \frac{S_r}{S_a} \times \frac{DD_r}{DD_a} \times \frac{D_a}{D_r}\)
- REI = Receiver Efficiency Index
- \(T_r\) = The number of tags detected on the receiver
- \(T_a\) = The number of tags detected across all receivers
- \(S_r\) = The number of species detected on the receiver
- \(S_a\) = The number of species detected across all receivers
- \(DD_a\) = The number of unique days with detections across all receivers
- \(DD_r\) = The number of unique days with detections on the receiver
- \(D_a\) = The number of days the array was active
- \(D_r\) = The number of days the receiver was active
Each REI is then normalized against the sum of all considered stations.
The result is a number between 0 and 1 indicating the relative amount of activity at each receiver.
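A minimal sketch of the normalization step, assuming rei is a Pandas Series of raw per-station REI values (hypothetical values, not resonATe's internals):
import pandas as pd

# Hypothetical raw REI values per station
rei = pd.Series({'HFX248': 0.12, 'HFX249': 0.03, 'HFX118': 0.05})
rei_normalized = rei / rei.sum()  # normalize against the sum of all considered stations
print(rei_normalized)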
Warning
Detection input files must include datecollected, fieldnumber, station, and scientificname as columns, and deployment input files must include station_name, deploy_date, last_download, and recovery_date as columns.
REI() takes two arguments. The first is a dataframe of detections containing the detection timestamp, the station identifier, the species, and the tag identifier. The second is a dataframe of deployments for each station. The station name should match the stations in the detections. The deployments need to include a deployment date and a recovery date or last download date. For details on the columns mentioned, see the Preparing Data section.
Warning
This function assumes that no deployments for a single station overlap. If deployments do overlap, the overlapping days will be counted twice.
from resonate.receiver_efficiency import REI
import pandas as pd
detections = pd.read_csv('/path/to/detections.csv')
deployments = pd.read_csv('/path/to/deployments.csv')
station_REIs = REI(detections = detections, deployments = deployments)
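As with the other tools, the returned DataFrame can be saved with the Pandas DataFrame.to_csv() function:
station_REIs.to_csv('/path/to/output.csv', index=False)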
Receiver Efficiency Index Functions¶
receiver_efficiency.REI(detections, deployments)¶
Calculates and returns a list of each station and its REI (as defined above).
Parameters: - detections – a pandas DataFrame of detections
- deployments – a pandas DataFrame of station deployment histories
Returns: a pandas DataFrame of station, REI, latitude, and longitude
Subsetting Data¶
Sometimes there is too much data for a visualization tool to handle, or you wish to only take a certain subset of your input data and apply it elsewhere.
These examples, written in Python and leveraging the Pandas data manipulation package, are meant as a starting point. More complex operations are possible in Pandas, but these should form a baseline of understanding that will cover the most common operations.
import pandas as pd
directory = "/path/to/"
filename = "data.csv"
data = pd.read_csv(directory+filename)
Subsetting data by date range¶
Provide a date field, as well as starting and ending date range. By default, the detection date column of a detection extract file is provided.
# Enter the column name that contains the date you wish to evaluate
datecol = 'datecollected'
# Enter the start date in the following format
startdate = "YYYY-MM-DD"
# Enter the end date in the following format
enddate = "YYYY-MM-DD"
# Subsets the data between the two indicated dates using the datecollected column
data_date_subset = data[(data[datecol] > startdate) & (data[datecol] < enddate)]
# Output the subset data to a new CSV in the indicated directory
data_date_subset.to_csv(directory+startdate+"_to_"+enddate+"_"+filename, index=False)
Subsetting on column value¶
Provide the column you expect to have a certain value and the value you’d like to create a subset from.
# Enter the column you want to subset
column=''
# Enter the value you want to find in the above column
value=''
# The following pulls the new data subset into a Pandas DataFrame
data_column_subset=data[data[column]==value]
# Output the subset data to a new CSV in the indicated directory
data_column_subset.to_csv(directory+column+"_"+value.replace(" ", "_")+"_"+filename, index=False)
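To subset on several values at once, the Pandas isin() method works the same way; a short sketch using station names from the example dataset:
# Keep records from a list of stations of interest
stations_of_interest = ['HFX248', 'HFX249']
data_station_subset = data[data['station'].isin(stations_of_interest)]
data_station_subset.to_csv(directory+"station_subset_"+filename, index=False)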
Unique Detections ID¶
Adds the unqdetecid column to your input file. The unqdetecid column assigns every detection record a unique numerical value. This column is needed in order to perform operations such as the filtering and compression functions.
The code below will add a unique detection ID column and return the Pandas dataframe.
from resonate.uniqueid import add_unqdetecid
input_file = '/path/to/detections.csv'
unqdet_det = add_unqdetecid(input_file)
You can use the Pandas DataFrame.to_csv() function to output the file to a desired location.
unqdet_det.to_csv('/path/to/output.csv', index=False)
Unique Detections Function¶
uniqueid.add_unqdetecid(input_file, encoding='utf-8-sig')¶
Adds the unqdetecid column to an input csv file. The resulting file is returned as a pandas DataFrame object.
Parameters: - input_file – Path to the input csv file.
- encoding – source encoding for the input file (default: utf-8-sig)
Returns: pandas DataFrame including the unqdetecid column.
Visual Timeline¶
This tool takes a detections extract file and generates a Plotly animated timeline, either in place in an iPython notebook or exported out to an HTML file.
Warning
Input files must include datecollected, catalognumber, station, latitude, and longitude as columns.
from resonate.visual_timeline import timeline
import pandas as pd
detections = pd.read_csv("/path/to/detection.csv")
timeline(detections, "Timeline")
Exporting to an HTML File¶
You can export the map to an HTML file by setting ipython_display to False.
from resonate.visual_timeline import timeline
import pandas as pd
detections = pd.read_csv("/path/to/detection.csv")
timeline(detections, "Timeline", ipython_display=False)
Mapbox¶
Alternatively, you can use a Mapbox access token to plot your map. Mapbox is much more responsive than the standard Scattergeo plot.
from resonate.visual_timeline import timeline
import pandas as pd
mapbox_access_token = 'YOUR MAPBOX ACCESS TOKEN HERE'
detections = pd.read_csv("/path/to/detection.csv")
timeline(detections, "Title", mapbox_token=mapbox_access_token)
Example Output¶
Below is the sample output for blue sharks off of the coast of Nova Scotia, without using Mapbox.