NEWS

# Change log

## 3.1.0

-   Fixes
    -   `naaccr_encode` now outputs the correct date time format for partial
        date time values. Previously, it simply copied whatever format the original
        string used.

## 3.0.0

-   !!Breaking change!!
    -   `naaccr_endcode` no longer pads values with blanks to the full width of
        their fields.
        If the standard padding character is not a blank (e.g., `"0"`), then
        the values will still be padded.
        For example, `naaccr_encode("New York", field = "addrAtDxCity")` returns
        `"New York"`, and `naaccr_encode(1, field = "ageAtDiagnosis")` returns
        `"001"`.
    - `naaccr_encode` will still set overly-wide values to blanks with a warning.
      It is a helper function for writing records to a file, and even XML file
      readers may break if values are wider than expected.
-   New things
    -   NAACCR format versions 24 and 25 have been added.
    -   `naaccr_datetime` now accepts values in ISO-8501 format (with timezones!).
    -   New function `parse_geocoding_quality_codes` to break the dense
        "geocodingQualityCodeDetail" field into its 14 data components.
-   Fixes
    -   Fixed factor label for `serumBeta2MicroglobulinPretxLvl` value of `"0"`.

## 2.0.2

-   Fix a bug with `read_naaccr_xml_plain` when a `record_format` is provided

## 2.0.0

-   New functions
    -   `read_naaccr_xml_plain` and `read_naaccr_xml` for reading files in
        NAACCR's XML formats
-   New data
    -   `naaccr_formats`, a named list containing `record_format` objects for
        the official NAACCR formats
    -   `naaccr_format_21`, `naaccr_format_22`, and `naaccr_format_23` for the
        NAACCR XML formats, versions 21, 22 and 23
-   Possible breaking changes
    -   New columns in `record_format`
        -   `"parent"`: the field's parent XML node
        -   `"width"`: field's length in characters
        -   `"cleaner"`: function used to clean/standardize data when reading
        -   `"unknown_finder"`: function to replace values with `NA` when reading
    -   Deprecated columns in `record_format`
        -   `"end_col"`: not used for XML files, and inferred from `"start_col"`
            and `"width"` for fixed-width files
    -   Package data now compressed using the `"xz"` method, which requires
        R version 2 or higher

## 1.0.0

-   Added `field_levels` and `field_levels_all` lists, which make it easy to see
    the possible values for factor fields.
-   All factor and sentinel levels are now easy to type on a standard U.S.
    QWERTY keyboard.
-   Repeatedly applying `naaccr_factor` or `naaccr_sentinel`
    (e.g., `naaccr_factor(naaccr_factor(...))`) is now a no-op.
-   Removed AJCC-copyright material.
-   Functions are safer: more argument checking, warnings, and option-robustness.
-   Fixed bugs with handling custom formats and columns not in the format.
-   Fixed bug with formats that included fields not found in text files

## 0.5.0

-   Processed date and time fields now store their original text values in the
    `"original"` attribute. Partial dates and times are still useful.
-   New functions
    -   `unknown_to_na`: Convert levels of NAACCR factor to `NA` if they mean
        the value is unknown.
    -   `naaccr_encode`: Convert processed values back to their NAACCR text
        codes.
    -   `as.record_format`: Create a custom record format from an object
        (this was incorrectly an internal function before).
    -   `naaccr_date`, `naaccr_datetime`: Parse NAACCR-formatted dates and times.
    -   `read_naaccr_plain`: Read a NAACCR file and divide it into fields, but
        don't do any more processing.
    -   `split_sequence_number`: Divide the different pieces of information in a
        tumor sequence number into multiple columns (i.e., order, uniqueness,
        and invasiveness).
-   Revamped `read_naaccr` and `write_naaccr`
    -   *Much* faster!
    -   `read_naaccr` has more parameters to handle how a file is read:
        `skip`, `nrows`, and `encoding` like in `read.table`, and `buffersize`
        to ease the pain of reading large files.
-   `naaccr_record` and its kin now have a `format` parameter to use custom
    formats.
-   Treat behavior fields as coded factor, as they should've been
-   **Potentially breaking changes**
    -   Cleaning functions (e.g., `clean_count`) no longer use a field name to
        retrieve cleaning parameters (e.g., field width).
        Those parameters now need to passed directly.
    -   The `type` and `alignment` columns of `record_format` objects are now
        factors.
    -   The standard NAACCR record format objects (`naaccr_format_12`, ...)
        are no longer lazy-loaded. They are now immediately loaded.
-   Improved documentation throughout.

## 0.4.0

-   Major improvements to handling of field-specific codes

## 0.3.0

### New features

-   New function: `write_naaccr` for writing `naaccr_record` data sets to text
    files according to one of the NAACCR formats. For now, it writes all unknown
    and missing values as blanks, not the standard codes. Still, `write_naaccr`
    and `read_naaccr` are inverse functions.
-   Fields with country codes are now converted to factors.

### Backend

-   Using package `ISOcodes` instead of `maps` for location code tables