Difference between revisions of "App/specimen/batch load"

From mx Help Wiki
Jump to: navigation, search
m (Summary steps)
(Valid column heads on batch files)
 
(5 intermediate revisions by one user not shown)
Line 2: Line 2:
  
 
The batch loading functionality found here allows you to add specimen OR lot records from a spreadsheet format.  Data are typically compiled in excel or google docs then exported to a tab delimited format which is read by the importer.
 
The batch loading functionality found here allows you to add specimen OR lot records from a spreadsheet format.  Data are typically compiled in excel or google docs then exported to a tab delimited format which is read by the importer.
 +
 +
'''Important:  Most records require the use of an Identifier, which in turn needs a [[App/namespace|Namespace]].'''
  
 
== Summary steps ==
 
== Summary steps ==
  
 
# Assemble the import file.
 
# Assemble the import file.
 +
#* Each row must contain a header BUT not headers need be matches to those possible headers listed below.  If they do not match they are ignored!
 
#* The first row must contain the headers for each row.  Only those columns that match the list below will be read.  Columns with headers not matching are ignored.
 
#* The first row must contain the headers for each row.  Only those columns that match the list below will be read.  Columns with headers not matching are ignored.
 
#* Each row represents a single specimen or lot.
 
#* Each row represents a single specimen or lot.
Line 15: Line 18:
 
#**# Modify that file to include only 1 OTU name / line.   
 
#**# Modify that file to include only 1 OTU name / line.   
 
#**# Navigate to OTU->'more options'->'batch load' and load your unique OTU names.
 
#**# Navigate to OTU->'more options'->'batch load' and load your unique OTU names.
#* Once the file is ready for testing follow the batch loading instructions in mx.
+
# Once the file is ready for testing follow the batch loading instructions in mx.
  
 
== Saving from Excel ==
 
== Saving from Excel ==
Line 140: Line 143:
 
|identifier
 
|identifier
 
|An identifier in the form of "namespace identifier", like "NCSU 1234".  
 
|An identifier in the form of "namespace identifier", like "NCSU 1234".  
|The namespace must be present in mx.  Applicable to lots or specimens.  Must be unique within given category (lots/specimens).
+
|The namespace must be present in mx.  Applicable to lots or specimens.  Identifiers must be unique with an mx project (i.e. lots and specimens can not share identifiers).
 
|-
 
|-
 
|notes
 
|notes
Line 244: Line 247:
 
|ed_y
 
|ed_y
 
|Verbatim, max 4 character integer.
 
|Verbatim, max 4 character integer.
 +
|Must be empty if ce_id is provided.
 +
|-
 +
|time_start
 +
|Hours:Minutes, like "13:23". 24 hour clock.
 +
|Must be empty if ce_id is provided.
 +
|-
 +
|time_end
 +
|Hours:Minutes, like "13:23". 24 hour clock.
 
|Must be empty if ce_id is provided.
 
|Must be empty if ce_id is provided.
 
|-
 
|-
Line 257: Line 268:
 
|Verbatim.
 
|Verbatim.
 
|Must be empty if ce_id is provided.         
 
|Must be empty if ce_id is provided.         
 +
|-
 +
|country
 +
|As matched in the geogs table.
 +
|Must be empty if ce_id is provided. Used to match against the geog table. 
 +
|-
 +
|state
 +
|As matched in the geogs table.
 +
|Must be empty if ce_id is provided. Used to match against the geog table.
 +
|-
 +
|county
 +
|As matched in the geogs table.
 +
|Must be empty if ce_id is provided. Used to match against the geog table. 
 
|}
 
|}

Latest revision as of 15:53, 15 February 2012

Contents

[edit] Overview

The batch loading functionality found here allows you to add specimen OR lot records from a spreadsheet format. Data are typically compiled in excel or google docs then exported to a tab delimited format which is read by the importer.

Important: Most records require the use of an Identifier, which in turn needs a Namespace.

[edit] Summary steps

  1. Assemble the import file.
    • Each row must contain a header BUT not headers need be matches to those possible headers listed below. If they do not match they are ignored!
    • The first row must contain the headers for each row. Only those columns that match the list below will be read. Columns with headers not matching are ignored.
    • Each row represents a single specimen or lot.
    • Each row must match an existing OTU already in mx. Matching of row data to the existing is possible via various mechanisms (see table below).
      • One possible mechanism for quickly adding new OTUs that don't exist is to use Google Refine on an 'otu_name' column.
        1. Create a new project in Google refine by selecting the text file that you will later import.
        2. Click on the arrow to the right of the 'otu_name' column and select 'Facet'->'Text facet'. A window to the left will highlight all unique names (as an side this is very useful for debugging your data, see the Google Refine tutorials).
        3. Click '123 choices" in the facet box, this will bring up a text file that you can copy/paste to a new text file.
        4. Modify that file to include only 1 OTU name / line.
        5. Navigate to OTU->'more options'->'batch load' and load your unique OTU names.
  2. Once the file is ready for testing follow the batch loading instructions in mx.

[edit] Saving from Excel

If you work in Excel to capture your data you'll be exporting it prior to loading it. Your data should be saved in one of the following formats, (topmost preferred).

File format Character Encoding
Tab Delimited Text (.txt) MACINTOSH or ISO-8859-1
Windows Formatted Text (.txt) ISO-8859-1
UTF-16 Unicode Text (.txt) UTF-16LE

[edit] Columns

[edit] Required for both lots and specimens

     otu_id

or (both allowed if they match)

     otu_name 

OR

     taxon_name_id

or (both allowed if they match)

     taxon_name_string

[edit] Required for Lots

     count

[edit] Required for Specimens

     identifier 

[edit] Optional for Specimens and Lots

One of

     ce_id
     verbatim_label

One of

     repository_id
     repository_coden
    

Any of

     sex
     stage
     notes
     data_entry_by

[edit] Optional for lots

 identifier

[edit] Optional for specimens

An additional determination

One of

     det_name
     det_otu_id 

and any of

     determiner
     det_year
     det_basis

A type designation

both of

     type
     type_of_taxon_name_id

[edit] Valid column heads on batch files

column headers description restrictions
ce_id A mx collecting event id. If present in a row then verbatim_label must be empty.
count Total specimens in a lot. If present in a row then identifier must be empty.
data_entry_by A mx uses login. If supplied then all creator/updator fields are set to this person. If not supplied the creator is the person logged in. None.
det_basis Free text describing how the specimen. determination was made. Requires det_name or 'det_otu_id' to be included. Not applicable to lots.
det_name Free text determination of specimen. If present in a row then det_otu_id must be empty. Not applicable to lots.
det_otu_id An id of an mx OTU that the specimen is determined as. If present in a row then det_name must be empty. Not applicable to lots.
det_year A year in which a determination is made. Requires det_otu_id or det_name to be present. Not applicable to lots.
determiner Free text indicator of determiner. Requires det_otu_id or det_name to be present. Not applicable to lots.
identifier An identifier in the form of "namespace identifier", like "NCSU 1234". The namespace must be present in mx. Applicable to lots or specimens. Identifiers must be unique with an mx project (i.e. lots and specimens can not share identifiers).
notes Free text notes. None.
otu_id A mx OTU id to which a specimen or lot is primarily attached. Must match an OTU found by otu_name if otu_name is also provided.
otu_name A mx OTU name (not associated taxon name, manuscript name, nor matrix name) which identifies the specimen. The named OTU must exist in the database. Must match the OTU indicated by otu_id if otu_id is also provide. If otu_id is not provided must be unique for the given project.
repository_coden A mx repository coden, like "NCSU". Must match a coden in repositories. If provided repository_id must be empty.
repository_id A mx id matching a respository. Must match a coden in repositories. If provided repository_coden must be empty.
sex Free text determination of sex, standard use is "male" or "female". Applicable to lots or specimens. None.
stage Free text determination of stage. Applicable to lots or specimens. None.
taxon_name_id Mx taxon name id which identifiers a specimen. Must be tied to a single OTU. Applicable to specimens only. Must be empty if otu_id or otu_name is provided. Must match taxon_name_string if taxon_name_string is provided.
taxon_name_string Mx taxon name which identifiers a specimen. Name in mx must be unique, and associated id must be tied to a single OTU. Applicable to specimens only. Must be empty if otu_id or otu_name is provided. Must match taxon_name_id if taxon_name_string is provided. At present matches only single word string.
type The code recognized type status of the specimen. Like "holotype", "paratype", "syntype", "neotype", or "paralectotype". Not for non governed things like "allotype". Applicable to specimens only. Must be used in combination with type_of_taxon_name_id.
type_of_taxon_name_id The mx taxon name id that the type refers to. Applicable to specimens only. Must be used in combination with type.
verbatim_label The verbatim text of the label(s) of the specimen or lot. Identical verbatim_labels are tied to a single collecting event in mx. Individual lines can be separated with "||", individual labels can be separated with "++". "||" and '++' will be replaced with one and two new lines respectively. Must be empty if ce_id is provided.
latitude A real number in decimal degrees format. Must be empty if ce_id is provided.
longitude A real number in decimal degrees format. Must be empty if ce_id is provided.
elev_min A real number indicating the minimum elevation of the elevation range. Must be empty if ce_id is provided. Required if elev_max provided.
elev_max A real number indicating the maximum elevation of the elevation range. Must be empty if ce_id is provided.
elev_unit "feet" or "meters". Must be empty if ce_id is provided. Required if elev_min included.
geog_id An mx id for a geographic unit that indicates the finest geographic unit for the collecting event. Must be empty if ce_id is provided.
method Interpreted version of the method given on the label (not verbatim). Must be empty if ce_id is provided.
locality Free text description of the locality, e.g. "1.5 km N Calgary" Must be empty if ce_id is provided.
sd_d Verbatim, max 2 character field in roman numeral or integer. Must be empty if ce_id is provided.
sd_m Verbatim, max 2 character field in roman numeral or integer. Must be empty if ce_id is provided.
sd_y Verbatim, max 4 character integer. Must be empty if ce_id is provided.
ed_d Verbatim, max 2 character field in roman numeral or integer. Must be empty if ce_id is provided.
ed_m Verbatim, max 2 character field in roman numeral or integer. Must be empty if ce_id is provided.
ed_y Verbatim, max 4 character integer. Must be empty if ce_id is provided.
time_start Hours:Minutes, like "13:23". 24 hour clock. Must be empty if ce_id is provided.
time_end Hours:Minutes, like "13:23". 24 hour clock. Must be empty if ce_id is provided.
collectors Verbatim. Must be empty if ce_id is provided.
micro_habitat Verbatim. Must be empty if ce_id is provided.
macro_habitat Verbatim. Must be empty if ce_id is provided.
country As matched in the geogs table. Must be empty if ce_id is provided. Used to match against the geog table.
state As matched in the geogs table. Must be empty if ce_id is provided. Used to match against the geog table.
county As matched in the geogs table. Must be empty if ce_id is provided. Used to match against the geog table.
Personal tools