Difference between revisions of "App/specimen/batch load"

From mx Help Wiki
Jump to: navigation, search
m (Summary steps)
Line 2: Line 2:
  
 
The batch loading functionality found here allows you to add specimen OR lot records from a spreadsheet format.  Data are typically compiled in excel or google docs then exported to a tab delimited format which is read by the importer.
 
The batch loading functionality found here allows you to add specimen OR lot records from a spreadsheet format.  Data are typically compiled in excel or google docs then exported to a tab delimited format which is read by the importer.
 +
 +
'''Important:  Most records require the use of an Identifier, which in turn needs a [[App/namespace|Namespace]].'''
  
 
== Summary steps ==
 
== Summary steps ==

Revision as of 10:51, 2 February 2012

Contents

Overview

The batch loading functionality found here allows you to add specimen OR lot records from a spreadsheet format. Data are typically compiled in excel or google docs then exported to a tab delimited format which is read by the importer.

Important: Most records require the use of an Identifier, which in turn needs a Namespace.

Summary steps

  1. Assemble the import file.
    • Each row must contain a header BUT not headers need be matches to those possible headers listed below. If they do not match they are ignored!
    • The first row must contain the headers for each row. Only those columns that match the list below will be read. Columns with headers not matching are ignored.
    • Each row represents a single specimen or lot.
    • Each row must match an existing OTU already in mx. Matching of row data to the existing is possible via various mechanisms (see table below).
      • One possible mechanism for quickly adding new OTUs that don't exist is to use Google Refine on an 'otu_name' column.
        1. Create a new project in Google refine by selecting the text file that you will later import.
        2. Click on the arrow to the right of the 'otu_name' column and select 'Facet'->'Text facet'. A window to the left will highlight all unique names (as an side this is very useful for debugging your data, see the Google Refine tutorials).
        3. Click '123 choices" in the facet box, this will bring up a text file that you can copy/paste to a new text file.
        4. Modify that file to include only 1 OTU name / line.
        5. Navigate to OTU->'more options'->'batch load' and load your unique OTU names.
  2. Once the file is ready for testing follow the batch loading instructions in mx.

Saving from Excel

If you work in Excel to capture your data you'll be exporting it prior to loading it. Your data should be saved in one of the following formats, (topmost preferred).

File format Character Encoding
Tab Delimited Text (.txt) MACINTOSH or ISO-8859-1
Windows Formatted Text (.txt) ISO-8859-1
UTF-16 Unicode Text (.txt) UTF-16LE

Columns

Required for both lots and specimens

     otu_id

or (both allowed if they match)

     otu_name 

OR

     taxon_name_id

or (both allowed if they match)

     taxon_name_string

Required for Lots

     count

Required for Specimens

     identifier 

Optional for Specimens and Lots

One of

     ce_id
     verbatim_label

One of

     repository_id
     repository_coden
    

Any of

     sex
     stage
     notes
     data_entry_by

Optional for lots

 identifier

Optional for specimens

An additional determination

One of

     det_name
     det_otu_id 

and any of

     determiner
     det_year
     det_basis

A type designation

both of

     type
     type_of_taxon_name_id

Valid column heads on batch files

column headers description restrictions
ce_id A mx collecting event id. If present in a row then verbatim_label must be empty.
count Total specimens in a lot. If present in a row then identifier must be empty.
data_entry_by A mx uses login. If supplied then all creator/updator fields are set to this person. If not supplied the creator is the person logged in. None.
det_basis Free text describing how the specimen. determination was made. Requires det_name or 'det_otu_id' to be included. Not applicable to lots.
det_name Free text determination of specimen. If present in a row then det_otu_id must be empty. Not applicable to lots.
det_otu_id An id of an mx OTU that the specimen is determined as. If present in a row then det_name must be empty. Not applicable to lots.
det_year A year in which a determination is made. Requires det_otu_id or det_name to be present. Not applicable to lots.
determiner Free text indicator of determiner. Requires det_otu_id or det_name to be present. Not applicable to lots.
identifier An identifier in the form of "namespace identifier", like "NCSU 1234". The namespace must be present in mx. Applicable to lots or specimens. Identifiers must be unique with an mx project (i.e. lots and specimens can not share identifiers).
notes Free text notes. None.
otu_id A mx OTU id to which a specimen or lot is primarily attached. Must match an OTU found by otu_name if otu_name is also provided.
otu_name A mx OTU name (not associated taxon name, manuscript name, nor matrix name) which identifies the specimen. The named OTU must exist in the database. Must match the OTU indicated by otu_id if otu_id is also provide. If otu_id is not provided must be unique for the given project.
repository_coden A mx repository coden, like "NCSU". Must match a coden in repositories. If provided repository_id must be empty.
repository_id A mx id matching a respository. Must match a coden in repositories. If provided repository_coden must be empty.
sex Free text determination of sex, standard use is "male" or "female". Applicable to lots or specimens. None.
stage Free text determination of stage. Applicable to lots or specimens. None.
taxon_name_id Mx taxon name id which identifiers a specimen. Must be tied to a single OTU. Applicable to specimens only. Must be empty if otu_id or otu_name is provided. Must match taxon_name_string if taxon_name_string is provided.
taxon_name_string Mx taxon name which identifiers a specimen. Name in mx must be unique, and associated id must be tied to a single OTU. Applicable to specimens only. Must be empty if otu_id or otu_name is provided. Must match taxon_name_id if taxon_name_string is provided. At present matches only single word string.
type The code recognized type status of the specimen. Like "holotype", "paratype", "syntype", "neotype", or "paralectotype". Not for non governed things like "allotype". Applicable to specimens only. Must be used in combination with type_of_taxon_name_id.
type_of_taxon_name_id The mx taxon name id that the type refers to. Applicable to specimens only. Must be used in combination with type.
verbatim_label The verbatim text of the label(s) of the specimen or lot. Identical verbatim_labels are tied to a single collecting event in mx. Individual lines can be separated with "||", individual labels can be separated with "++". "||" and '++' will be replaced with one and two new lines respectively. Must be empty if ce_id is provided.
latitude A real number in decimal degrees format. Must be empty if ce_id is provided.
longitude A real number in decimal degrees format. Must be empty if ce_id is provided.
elev_min A real number indicating the minimum elevation of the elevation range. Must be empty if ce_id is provided. Required if elev_max provided.
elev_max A real number indicating the maximum elevation of the elevation range. Must be empty if ce_id is provided.
elev_unit "feet" or "meters". Must be empty if ce_id is provided. Required if elev_min included.
geog_id An mx id for a geographic unit that indicates the finest geographic unit for the collecting event. Must be empty if ce_id is provided.
method Interpreted version of the method given on the label (not verbatim). Must be empty if ce_id is provided.
locality Free text description of the locality, e.g. "1.5 km N Calgary" Must be empty if ce_id is provided.
sd_d Verbatim, max 2 character field in roman numeral or integer. Must be empty if ce_id is provided.
sd_m Verbatim, max 2 character field in roman numeral or integer. Must be empty if ce_id is provided.
sd_y Verbatim, max 4 character integer. Must be empty if ce_id is provided.
ed_d Verbatim, max 2 character field in roman numeral or integer. Must be empty if ce_id is provided.
ed_m Verbatim, max 2 character field in roman numeral or integer. Must be empty if ce_id is provided.
ed_y Verbatim, max 4 character integer. Must be empty if ce_id is provided.
time_start Hours:Minutes, like "13:23". 24 hour clock. Must be empty if ce_id is provided.
time_end Hours:Minutes, like "13:23". 24 hour clock. Must be empty if ce_id is provided.
collectors Verbatim. Must be empty if ce_id is provided.
micro_habitat Verbatim. Must be empty if ce_id is provided.
macro_habitat Verbatim. Must be empty if ce_id is provided.
Personal tools