doc/06_misc_import.rst

   1 .. _bulk-import:
   2
   3 Bulk Import
   4 ===========
   5
   6 OpenERP has included a bulk import facility for CSV-ish files for a
   7 long time. With 7.0, both the interface and internal implementation
   8 have been redone, resulting in
   9 :meth:`~openerp.osv.orm.BaseModel.load`.
  10
  11 .. note::
  12
  13     the previous bulk-loading method,
  14     :meth:`~openerp.osv.orm.BaseModel.import_data`, remains for
  15     backwards compatibility but was re-implemented on top of
  16     :meth:`~openerp.osv.orm.BaseModel.load`, while its interface is
  17     unchanged its precise behavior has likely been altered for some
  18     cases (it shouldn't throw exceptions anymore in many cases where
  19     it previously did)
  20
  21 This document attempts to explain the behavior and limitations of
  22 :meth:`~openerp.osv.orm.BaseModel.load`.
  23
  24 Data
  25 ----
  26
  27 The input ``data`` is a regular row-major matrix of strings (in Python
  28 datatype terms, a ``list`` of rows, each row being a ``list`` of
  29 ``str``, all rows must be of equal length). Each row must be the same
  30 length as the ``fields`` list preceding it in the argslist.
  31
  32 Each field of ``fields`` maps to a (potentially relational and nested)
  33 field of the model under import, and the corresponding column of the
  34 ``data`` matrix provides a value for the field for each record.
  35
  36 Generally speaking each row of the input yields a record of output,
  37 and each cell of a row yields a value for the corresponding field of
  38 the row's record. There is currently one exception for this rule:
  39
  40 One to Many fields
  41 ++++++++++++++++++
  42
  43 Because O2M fields contain multiple records "embedded" in the main
  44 one, and these sub-records are fully dependent on the main record (are
  45 no other references to the sub-records in the system), they have to be
  46 spliced into the matrix somehow. This is done by adding lines composed
  47 *only* of o2m record fields below the main record:
  48
  49 .. literalinclude:: 06_misc_import_o2m.txt
  50
  51 the sections in double-lines represent the span of two o2m
  52 fields. During parsing, they are extracted into their own ``data``
  53 matrix for the o2m field they correspond to.
  54
  55 Import process
  56 --------------
  57
  58 Here are the phases of import. Note that the concept of "phases" is
  59 fuzzy as it's currently more of a pipeline, each record moves through
  60 the entire pipeline before the next one is processed.
  61
  62 Extraction
  63 ++++++++++
  64
  65 The first phase of the import is the extraction of the current row
  66 (and potentially a section of rows following it if it has One to Many
  67 fields) into a record dictionary. The keys are the ``fields``
  68 originally passed to :meth:`~openerp.osv.orm.BaseModel.load`, and the
  69 values are either the string value at the corresponding cell (for
  70 non-relational fields) or a list of sub-records (for all relational
  71 fields).
  72
  73 This phase also generates the ``rows`` indexes for any
  74 :ref:`import-message` produced thereafter.
  75
  76 Conversion
  77 ++++++++++
  78
  79 This second phase takes the record dicts, extracts the :term:`database
  80 ID` and :term:`external ID` if present and attempts to convert each
  81 field to a type matching what OpenERP expects to write.
  82
  83 * Empty fields (empty strings) are replaced with the ``False`` value
  84
  85 * Non-empty fields are converted through
  86   :class:`~openerp.addons.base.ir.ir_fields.ir_fields_converter`
  87
  88 .. note:: if a field is specified in the import, its default will *never* be
  89           used. If some records need to have a value and others need to use
  90           the model's default, either specify that default explicitly or do
  91           the import in two phases.
  92
  93 Char, text and binary fields
  94 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  95
  96 Are returned as-is, without any alteration.
  97
  98 Boolean fields
  99 ~~~~~~~~~~~~~~
 100
 101 The string value is compared (in a case-insensitive manner) to ``0``,
 102 ``false`` and ``no`` as well of any translation thereof loaded in the
 103 database. If the value matches one of these, the field is set to
 104 ``False``.
 105
 106 Otherwise the field is compared to ``1``, ``true`` and ``yes`` (and
 107 any translation of these in the database). The field is always set to
 108 ``True``, but if the value does not match one of these a warning will
 109 also be output.
 110
 111 Integers and float fields
 112 ~~~~~~~~~~~~~~~~~~~~~~~~~
 113
 114 The field is parsed with Python's built-in conversion routines
 115 (``int`` and ``float`` respectively), if the conversion fails an error
 116 is generated.
 117
 118 Selection fields
 119 ~~~~~~~~~~~~~~~~
 120
 121 The field is compared to 1. the values of the selection (first part of
 122 each selection tuple) and 2. all translations of the selection label
 123 found in the database.
 124
 125 If one of these is matched, the corresponding value is set on the
 126 field.
 127
 128 Otherwise an error is generated.
 129
 130 The same process applies to both list-type and function-type selection
 131 fields.
 132
 133 Many to One field
 134 ~~~~~~~~~~~~~~~~~
 135
 136 If the specified field is the relational field itself (``m2o``), the
 137 value is used in a ``name_search``. The first record returned by
 138 ``name_search`` is used as the field's value.
 139
 140 If ``name_search`` finds no value, an error is generated. If
 141 ``name_search`` finds multiple value, a warning is generated to warn
 142 the user of ``name_search`` collisions.
 143
 144 If the specified field is a :term:`external ID` (``m2o/id``), the
 145 corresponding record it looked up in the database and used as the
 146 field's value. If no record is found matching the provided external
 147 ID, an error is generated.
 148
 149 If the specified field is a :term:`database ID` (``m2o/.id``), the
 150 process is the same as for external ids (on database identifiers
 151 instead of external ones).
 152
 153 Many to Many field
 154 ~~~~~~~~~~~~~~~~~~
 155
 156 The field's value is interpreted as a comma-separated list of names,
 157 external ids or database ids. For each one, the process previously
 158 used for the many to one field is applied.
 159
 160 One to Many field
 161 ~~~~~~~~~~~~~~~~~
 162
 163 For each o2m record extracted, if the record has a ``name``,
 164 :term:`external ID` or :term:`database ID` the :term:`database ID` is
 165 looked up and checked through the same process as for m2o fields.
 166
 167 If a :term:`database ID` was found, a LINK_TO command is emmitted,
 168 followed by an UPDATE with the non-db values for the relational field.
 169
 170 Otherwise a CREATE command is emmitted.
 171
 172 Date fields
 173 ~~~~~~~~~~~
 174
 175 The value's format is checked against
 176 :data:`~openerp.tools.misc.DEFAULT_SERVER_DATE_FORMAT`, an error is
 177 generated if it does not match the specified format.
 178
 179 Datetime fields
 180 ~~~~~~~~~~~~~~~
 181
 182 The value's format is checked against
 183 :data:`~openerp.tools.misc.DEFAULT_SERVER_DATETIME_FORMAT`, an error
 184 is generated if it does not match.
 185
 186 The value is then interpreted as a datetime in the user's
 187 timezone. The timezone is specified thus:
 188
 189 * If the import ``context`` contains a ``tz`` key with a valid
 190   timezone name, this is the timezone of the datetime.
 191
 192 * Otherwise if the user performing the import has a ``tz`` attribute
 193   set to a valid timezone name, this is the timezone of the datetime.
 194
 195 * Otherwise interpret the datetime as being in the ``UTC`` timezone.
 196
 197 Create/Write
 198 ++++++++++++
 199
 200 If the conversion was successful, the converted record is then saved
 201 to the database via ``(ir.model.data)._update``.
 202
 203 Error handling
 204 ++++++++++++++
 205
 206 The import process will only catch 2 types of exceptions to convert
 207 them to error messages: ``ValueError`` during the conversion process,
 208 and sub-exceptions of ``psycopg2.Error`` during the create/write
 209 process.
 210
 211 The import process uses savepoint to:
 212
 213 * protect the overall transaction from the failure of each ``_update``
 214   call, if an ``_update`` call fails the savepoint is rolled back and
 215   the import process keeps going in order to obtain as many error
 216   messages as possible during each run.
 217
 218 * protect the import as a whole, a savepoint is created before
 219   starting and if any error is generated that savepoint is rolled
 220   back. The rest of the transaction (anything not within the import
 221   process) will be left untouched.
 222
 223 .. _import-message:
 224 .. _import-messages:
 225
 226 Messages
 227 --------
 228
 229 A message is a dictionary with 5 mandatory keys and one optional key:
 230
 231 ``type``
 232     the type of message, either ``warning`` or ``error``. Any
 233     ``error`` message indicates the import failed and was rolled back.
 234
 235 ``message``
 236     the message's actual text, which should be translated and can be
 237     shown to the user directly
 238
 239 ``rows``
 240     a dict with 2 keys ``from`` and ``to``, indicates the range of
 241     rows in ``data`` which generated the message
 242
 243 ``record``
 244     a single integer, for warnings the index of the record which
 245     generated the message (can be obtained from a non-false ``ids``
 246     result)
 247
 248 ``field``
 249     the name of the (logical) OpenERP field for which the error or
 250     warning was generated
 251
 252 ``moreinfo`` (optional)
 253     A string, a list or a dict, leading to more information about the
 254     warning.
 255
 256     * If ``moreinfo`` is a string, it is a supplementary warnings
 257       message which should be hidden by default
 258     * If ``moreinfo`` is a list, it provides a number of possible or
 259       alternative values for the string
 260     * If ``moreinfo`` is a dict, it is an OpenERP action descriptor
 261       which can be executed to get more information about the issues
 262       with the field. If present, the ``help`` key serves as a label
 263       for the action (e.g. the text of the link).