\chapter{Creating Detectlets} \label{CreatingDetectlets}

Detectlets are one of the most exciting parts of the Picalo framework.  Detectlets allow you to codify your work in an established, robust framework, and they allow others to benefit from your genius.  Detectlets give you a way to contribute back a little bit to a community that has worked hard to provide this tool.

You are even free to charge for the use of your Detectlets, should you choose to do so.  I expect that some users will give their Detectlets away for free and others will sell them commercially.  Either way, you'll benefit the community and make us all more productive.

First and foremost, the goal of a Detectlet is to walk an unskilled user through your analysis algorithm.  Detectlets should be simple, simple, simple.  They should contain verbose wording to explain why the Detectlet does what it does, where it should be used, its background, the assumptions it makes, and as much other information as you can give.  Try to put yourself in the place of a person learning your algorithm for the first time.  Don't talk down to the user, but certainly try to be descriptive.  Help the user learn what you know (or at least, help the user learn how to use what you know).

Detectlets should be written to the domain rather than to the abstract analysis being done.  Remember that many users won't see the potential of many basic routines.  Picalo already provides many abstract analysis techniques, such as stratification, regular expressions, loading data, and so forth.  Detectlets should be specific to a single use.

For example, comparing columns from two tables is a basic analysis routine.  This analysis can be used to compare employee addresses to vendor addresses, compare student assignments to one another, or do a hundred other things.  Each of these things should turn into a Detectlet, even though the analysis is essentially the same.  In other words, do not create a Detectlet named ``Compare Columns From Tables''; create several Detectlets named ``Find Phantom Vendors By Comparing Employee Addresses to Vendor Addresses'', ``Discover Potential Cheating By Comparing Student Assignments With One Another'', and so forth.

Detectlets assume that the user has already loaded one or more tables into memory.  Each Detectlet should take these table(s) as input.  After the analysis is completed, each Detectlet should return a Table or TableList containing the results.

\section{The Detectlet Process}

Each Detectlet starts with a description of its purpose, domain, background, and assumptions.  The opening page of the wizard also (hopefully) shows the user example data.  Most users want to see example data to know what their input data should look like.

The next few pages of the wizard collect input data, settings, and user preferences for the analysis.  The user should be able to select the table(s) to be analyzed, the individual columns of interest, and any other settings important to the analysis.   You do not need to expose every nuance of your analysis to the user.  Instead, you may want to make a few assumptions or create multiple Detectlets with the same routine but with different settings.  Most users will understand multiple Detectlets better than a single Detectlet with lots of settings.

The input data should generally be in Picalo Table or TableList format.  It is important that Picalo routines always input tables and output tables.  This allows routines to be chained together to create even more complex and comprehensive routines.

The input pages of the wizard are defined by your Detectlet XML.  When all data is collected (according to your Detectlet XML), the wizard will present a final page asking the user for a variable name to store the results in.

When the user clicks the \textit{Finish} button, Picalo will call your routine with the data entered by the user.  If your routine throws an exception, the user will be notified and will be allowed to adjust the input settings.

When your analysis finishes successfully, the results table (returned by your routine) is displayed in Picalo.  A popup window displays on top of this window to teach the user how to interpret the results table.  The user is free to close this window or study its contents as well as the table.

\section{Detectlet Anatomy}

A Detectlet is a Python file with a special structure.  To install a Detectlet, place it in the ``detectlets'' directory (or one of its subdirectories).  Picalo can do this for you if you select Detectlets | Install Detectlet Library from the menu.  Place only one Detectlet per source file.

The name of the Detectlet file is used to populate the Detectlet's menu in Picalo.  Use capital letters to signify the words of your Detectlet.  A Detectlet named ``Find Phantom Vendors By Comparing Employee Addresses to Vendor Addresses'' should be contained in a file named ``FindPhantomVendorsByComparingEmployeeAddressestoVendorAddresses.py''.
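
The naming convention can be sketched in a few lines of Python.  The helper name \texttt{detectlet\_filename} is hypothetical (it is not part of Picalo); it simply removes the spaces from the title and appends the \texttt{.py} extension, which reproduces the file name given above.
\begin{picalo}
def detectlet_filename(title):
    """Build a file name from a Detectlet title by removing all whitespace."""
    # str.split() with no arguments splits on any run of whitespace, so
    # joining the pieces removes every space and keeps the capitalization.
    return "".join(title.split()) + ".py"


title = "Find Phantom Vendors By Comparing Employee Addresses to Vendor Addresses"
filename = detectlet_filename(title)
print(filename)
\end{picalo}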

Each Detectlet file has four members: the \texttt{DETECTLET\_STANDARD} version variable, the \texttt{wizard} variable, the \texttt{run} analysis function, and the \texttt{example\_input} function.  These members are described in the following sections.

If you develop your Detectlet in the Picalo editor, you can test run your Detectlet by selecting Script | Run Script As Detectlet.  This will run the Detectlet directly from the editor (without direct installation into Picalo) so you can debug and test it.

Picalo reloads Detectlets \textit{every} time they are run, so you don't need to restart Picalo every time you modify one during development.  Each subsequent run of the wizard will reload any new changes from your files.

Since Picalo comes with several example Detectlets, be sure to open these files in the editor.  These files give concrete examples of the requirements set forth in the next sections.

\section{Detectlet Version}
Because the standard that governs how Detectlets run may change over time, you must provide a ``DETECTLET\_STANDARD'' global variable within your Detectlet file.  This lets Picalo know which standard your Detectlet conforms to.

The current standard is version 1.0.  Be sure to include the following code near the top of your Detectlet file:
\begin{picalo}
DETECTLET_STANDARD = 1.0
\end{picalo}

\section{The wizard variable}
The \texttt{wizard} variable is a multiline string that contains a short XML document.  The purpose of the XML document is to outline the pages of the wizard.  The purpose of the wizard is to gather information from the user about the parameters required to run the function.  

The XML document has the following format:
\begin{picalo}
<detectlet>
  <page>
    Instruction text for the user can go anywhere within the page tag.
    <parameter type="parameter type" variable="function parameter name"/>
    ... (other parameters for this page)
  </page>
  <page>
    ... (additional parameters for the next page)
  </page>
  ... (additional pages for the wizard)
</detectlet>
\end{picalo}

As seen in the above XML, the document tells Picalo how many pages to show the user and what parameters to ask for on each page.   It also contains instruction text so the user can understand what is needed.  The following parameter types are supported:
\paragraph{Integer Numbers}
The integer number parameter asks the user to input an integer.  The control supports minimum and maximum enforcement, a default value (shown when the control is first displayed), and a regular-expression-based mask.  
\begin{picalo}
<parameter type="int" 
           variable="varname" 
           min="0" 
           max="5" 
           default="1" 
           mask="regex"/>
\end{picalo}

\paragraph{Floating-Point Numbers}
The floating-point number parameter asks the user to input a float.  The control supports minimum and maximum enforcement, a default value (shown when the control is first displayed), and a regular-expression-based mask.  
\begin{picalo}
<parameter type="float" 
           variable="varname" 
           min="0.0" 
           max="5.0" 
           default="1.0" 
           mask="regex"/>
\end{picalo}

\paragraph{Strings}
The string parameter asks the user to input a string of any type or length.  The control supports a regular-expression-based mask, which gives total control over what the user enters into the control.
\begin{picalo}
<parameter type="string" 
           variable="varname" 
           mask="regex"/>
\end{picalo}


\paragraph{Tables}
The table parameter shows the user the available tables in memory (previously created or loaded into Picalo).  The table parameter supports an optional ``multiple'' attribute that allows multiple tables to be selected.  The table parameter returns an actual reference to a table in memory or a list of tables in memory if multiple is true.
\begin{picalo}
<parameter type="table" 
           variable="varname" 
           multiple="true"/>
\end{picalo}

\paragraph{Columns}
The column parameter allows the user to select a column from a table in memory.  The column parameter type must be linked to a table parameter shown elsewhere in the wizard.  Whenever the table parameter is changed, this parameter type will reload the list box with the columns from that table.  The column parameter supports an optional ``multiple'' attribute that allows multiple columns to be selected.  The column parameter returns the name of the selected column (as a string) or a list of strings if multiple is true.
\begin{picalo}
<parameter type="column" 
           variable="varname" 
           table="tablevarname" 
           multiple="true"/>
\end{picalo}

\paragraph{List Boxes}
The list parameter shows a list box of options for the user to select from.  One or more ``option'' child elements should be provided to give the list its available values.  The optional ``value'' attribute allows options to display text one way and set the parameter value another (such as displaying ``One'' but sending ``1'' into the function).  The list parameter supports an optional ``multiple'' attribute that allows multiple options to be selected.  The list parameter always returns a string value (which you should convert to another type if needed).
\begin{picalo}
<parameter type="list" variable="varname" multiple="true">
  <option value="value">text</option>
  ... (more options)
</parameter>
\end{picalo}

\paragraph{Choice Boxes}
The choice parameter shows a drop-down choice box of options for the user to select from.  It does not support multiple selection of options.  Otherwise, it is identical to the list parameter type.
\begin{picalo}
<parameter type="choice" variable="varname">
  <option value="value">text</option>
  ... (more options)
</parameter>
\end{picalo}
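
As a concrete illustration of the parameter types above, the following sketch defines a small two-page \texttt{wizard} string and parses it with Python's standard \texttt{xml.etree.ElementTree} module to list the parameters collected on each page.  The variable names (\texttt{table}, \texttt{col}, and \texttt{threshold}) and the instruction text are made up for this example.  The parsing code is only a convenient way to check that your XML is well formed during development; it is not necessarily how Picalo reads the document.
\begin{picalo}
import xml.etree.ElementTree as ET

wizard = '''
<detectlet>
  <page>
    Select the table that contains the vendor records.
    <parameter type="table" variable="table"/>
  </page>
  <page>
    Select the address column and a minimum match count.
    <parameter type="column" variable="col" table="table"/>
    <parameter type="int" variable="threshold" min="1" max="10" default="2"/>
  </page>
</detectlet>
'''

# fromstring raises a ParseError if the XML is not well formed.
root = ET.fromstring(wizard.strip())

# Collect (type, variable) pairs for the parameters on each page.
pages = []
for page in root.findall('page'):
    params = [(p.get('type'), p.get('variable'))
              for p in page.findall('parameter')]
    pages.append(params)

for number, params in enumerate(pages, start=1):
    print('Page %d: %s' % (number, params))
\end{picalo}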

\section{Analysis Function}
The analysis function is the heart of your routine.  Once the wizard gathers all the input data, Picalo will call your analysis function.  Your analysis function should also work when it is called directly from a script, without the wizard.

The analysis function must be named \texttt{run()} and can take any number of parameters.  Normally, the \texttt{run} function takes one or two tables, one or two columns within those tables, and a few other settings.   

The parameters of the function should be named exactly as they are given in the ``variable'' attribute of the wizard XML.  Picalo will match these names when it calls your function.  In other words, the wizard \texttt{parameter} elements must exactly match the parameters in your \texttt{run} function.
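
Because a mismatch between the wizard and the function is easy to introduce, it can help to check the names yourself before installing a Detectlet.  The following is a minimal sketch of such a check; \texttt{compare\_names} is a hypothetical helper (not a Picalo function), the \texttt{wizard} string and \texttt{run} function are placeholders, and the check only considers ordinary positional parameters.
\begin{picalo}
import xml.etree.ElementTree as ET

wizard = '''
<detectlet>
  <page>
    Select the table and the column to retrieve.
    <parameter type="table" variable="table"/>
    <parameter type="column" variable="col" table="table"/>
  </page>
</detectlet>
'''


def run(table, col):
    '''Placeholder analysis used only to demonstrate the name check.'''
    return table, ''


def compare_names(wizard_xml, func):
    """Return (missing, extra): parameters of func that the wizard never
    collects, and wizard variables that func does not accept."""
    root = ET.fromstring(wizard_xml.strip())
    wizard_names = [p.get('variable') for p in root.iter('parameter')]
    # Positional parameter names of func, read from its code object.
    code = func.__code__
    run_names = list(code.co_varnames[:code.co_argcount])
    missing = [name for name in run_names if name not in wizard_names]
    extra = [name for name in wizard_names if name not in run_names]
    return missing, extra


missing, extra = compare_names(wizard, run)
print('Missing from wizard: %s' % missing)
print('Not accepted by run: %s' % extra)
\end{picalo}

Both lists are empty when the wizard and the function agree.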

Parameters are typed as they come into your function.  If the user selects a table in the wizard, the actual table object is sent to your script.  If the user selects a column, the column name (as a string) is sent.  Integers and floats arrive as Python numbers, and list and choice selections arrive as strings.
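
Since list parameters always return strings, a \texttt{run} function that needs a number from one of them has to convert the value itself.  A minimal sketch follows, in which \texttt{parse\_option} is a hypothetical helper name rather than part of Picalo:
\begin{picalo}
def parse_option(value):
    """Convert the string sent by a list or choice parameter to an integer."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # A clear message matters because the text of an exception
        # raised inside run() is shown to the user.
        raise ValueError('Expected a whole number but received %r' % (value,))


selected = parse_option('1')
\end{picalo}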

The function should contain documentation text (a string immediately following the function declaration) that describes the background of the routine, the input data, and the steps the analysis will perform.  This text will be shown on the first page of the wizard.  This is probably the most important part of the Detectlet because it explains where, how, and why your routine should be run.

The \texttt{run} function must return two items: a results table or table list, and a string.  Upon completion of your function, Picalo will show your results table.  If the string is nonempty, Picalo will open a small frame containing the string.  The purpose of the string is to give follow-up information to the user on how to interpret the results you present.  You can return the same standard string from every call to your Detectlet, or you can custom build the string based upon the results your analysis finds.
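
As an illustration of custom building the string, the sketch below chooses the follow-up text from the number of matches found.  The helper name \texttt{build\_summary} and the wording of the messages are assumptions made for this example; how you count the matches depends on your own analysis.
\begin{picalo}
def build_summary(match_count):
    """Return follow-up text tailored to the number of matching rows."""
    if match_count == 0:
        return ('No employee address matched a vendor address. '
                'The results table is empty.')
    if match_count == 1:
        return ('One employee address matches a vendor address. '
                'Review this row for a possible phantom vendor.')
    return ('%d employee addresses match vendor addresses. '
            'Each row is a possible phantom vendor and should be reviewed.'
            % match_count)


print(build_summary(3))
\end{picalo}

The string returned by such a helper would be the second item returned by \texttt{run}, after the results table.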

Following is a (relatively simple) example function:
\begin{picalo}
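def run(table, col):
  '''This Detectlet retrieves a column from a table'''
  # Create a results table whose only column is the selected column.
  results = Table([col])
  # Copy the selected column's value from each input row.
  for row in table:
    results.append(row[col])
  # Return the results table and the follow-up text shown to the user.
  return results, 'The displayed table contains only the column you selected.'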
\end{picalo}

The content of the function is entirely up to you.  Perform an analysis using Python and/or Picalo functions.  If your function throws exceptions, the text of those exceptions will be shown to the user, and the program won't crash.  You can use Python's \texttt{assert} statement to ensure the parameters came in correctly.
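
A minimal sketch of such checks is shown below.  The helper \texttt{check\_parameters}, the \texttt{threshold} setting, and its limits of 1 and 10 are hypothetical; in a real Detectlet the assertions would usually sit at the top of \texttt{run}.  Each assertion carries a message, since the exception text is what the user sees.
\begin{picalo}
def check_parameters(col, threshold):
    """Raise AssertionError with a readable message if an input is unusable."""
    assert col, 'Please select a column to analyze.'
    assert 1 <= threshold <= 10, 'The threshold must be between 1 and 10.'


# Valid input passes silently.
check_parameters('Address', 2)

# Invalid input raises an AssertionError that carries the message.
try:
    check_parameters('Address', 50)
    message = ''
except AssertionError as error:
    message = str(error)
print(message)
\end{picalo}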

\section{Example Input}
The \texttt{example\_input} function takes no parameters and returns either a Picalo Table or TableList containing example data.  This gives the user an example of what the input data table should look like.  You should also provide text that describes this input data in the function description.

The \texttt{example\_input} function is optional.  If it is not provided, the wizard won't have a button to open the example window.

