This function truncates HLA typing values in molecular nomenclature (for example from 4 fields to 2 fields). The truncation is based on the number of fields specified and optionally retains any WHO-recognized suffixes (L, S, C, A, Q, or N) or G and P group designations (G or P). This function will work on individual alleles (e.g. "HLA-A*02:01:01:01") or on all alleles in a GL string (e.g. "HLA-A*02:01:01:01+HLA-A*68:01:01^HLA-DRB1*01:01:01+HLA-DRB1*03:01:01").
Note: depending on the arguments used, this function can output HLA alleles that do not exist in the IPD-IMGT/HLA database. For example, truncating the allele "DRB4*01:03:01:02N" to 2 fields with keep_suffix = TRUE would result in "DRB4*01:03N", which does not exist in the IPD-IMGT/HLA database. Users should take care when setting the parameters for this function.
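The truncation described above can be illustrated with a minimal base R sketch for a single allele. The helper name `truncate_allele_sketch` is hypothetical and this is not the package's implementation; it assumes a single well-formed allele, ignores G and P group designations, and returns values without a "*" (such as "Abs") unchanged.

```r
# Hypothetical helper for illustration only; not the package's implementation.
truncate_allele_sketch <- function(allele, fields = 2, keep_suffix = TRUE) {
  # Values without a locus separator (e.g. "Abs") are returned unchanged.
  if (!grepl("*", allele, fixed = TRUE)) return(allele)

  # Separate the locus ("DRB4") from the field string ("01:03:01:02N").
  parts <- strsplit(allele, "*", fixed = TRUE)[[1]]
  locus <- parts[1]
  field_string <- parts[2]

  # Split off a trailing expression suffix (L, S, C, A, Q, or N), if present.
  suffix <- ""
  if (grepl("[LSCAQN]$", field_string)) {
    suffix <- substring(field_string, nchar(field_string))
    field_string <- substring(field_string, 1, nchar(field_string) - 1)
  }

  # Keep at most the requested number of colon-separated fields.
  field_values <- strsplit(field_string, ":", fixed = TRUE)[[1]]
  kept <- field_values[seq_len(min(fields, length(field_values)))]

  truncated <- paste0(locus, "*", paste(kept, collapse = ":"))
  if (keep_suffix) truncated <- paste0(truncated, suffix)
  truncated
}

truncate_allele_sketch("DRB4*01:03:01:02N", fields = 2)
#> [1] "DRB4*01:03N"
truncate_allele_sketch("DRB4*01:03:01:02N", fields = 2, keep_suffix = FALSE)
#> [1] "DRB4*01:03"
truncate_allele_sketch("HLA-A*02:01:01:01", fields = 2)
#> [1] "HLA-A*02:01"
```

The first call reproduces the case from the note: the suffix is carried over to the shortened name even though that combination may not be a recognized allele.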
Usage
HLA_truncate(
  data,
  fields = 2,
  keep_suffix = TRUE,
  keep_G_P_group = FALSE,
  remove_duplicates = FALSE
)
Arguments
- data
A string containing an HLA allele or a GL string.
- fields
An integer specifying the number of fields to retain in the truncated values. Default is 2.
- keep_suffix
A logical value indicating whether to retain any WHO-recognized suffixes. Default is TRUE.
- keep_G_P_group
A logical value indicating whether to retain any G or P group designations. Default is FALSE.
- remove_duplicates
A logical value indicating whether to remove duplicated values from a GL string after truncation. Default is FALSE.
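To show why duplicates can appear after truncation, here is a rough base R sketch that shortens every allele in a GL string to 2 fields with a single regular expression. Both helper names are hypothetical, the inputs are illustrative strings, and the deduplication step is only one plausible reading of remove_duplicates (unique values within a "/"-separated ambiguity list); the function's actual behavior may differ.

```r
# Hypothetical helpers for illustration only; not the package's implementation.

# Keep the first two fields after each "*" and drop any further ":NN" fields.
# Trailing suffix letters are left in place, and GL string delimiters
# (+, ^, /, |, ~) are untouched because the pattern never matches them.
truncate_gl_two_fields_sketch <- function(gl_string) {
  gsub("(\\*[0-9]+:[0-9]+)(:[0-9]+)*", "\\1", gl_string)
}

# Assumed meaning of deduplication: unique alleles within a "/"-separated list.
dedupe_ambiguities_sketch <- function(ambiguity_string) {
  alleles <- strsplit(ambiguity_string, "/", fixed = TRUE)[[1]]
  paste(unique(alleles), collapse = "/")
}

gl <- "HLA-A*02:01:01:01+HLA-A*68:01:01^HLA-DRB1*01:01:01+HLA-DRB1*03:01:01"
truncate_gl_two_fields_sketch(gl)
#> [1] "HLA-A*02:01+HLA-A*68:01^HLA-DRB1*01:01+HLA-DRB1*03:01"

# Two alleles that differ only beyond the second field collapse to the same name.
ambiguous <- truncate_gl_two_fields_sketch("DRB4*01:03:01:01/DRB4*01:03:01:03")
ambiguous
#> [1] "DRB4*01:03/DRB4*01:03"
dedupe_ambiguities_sketch(ambiguous)
#> [1] "DRB4*01:03"
```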
Value
A string containing the HLA typing truncated to the specified number of fields, with WHO-recognized suffixes and G or P group designations retained or dropped according to the arguments.
Examples
# The Haplotype_frequencies dataset contains a table with HLA typing spread across multiple columns:
print(Haplotype_frequencies)
#> # A tibble: 10 × 12
#> `HLA-A` `HLA-C` `HLA-B` `HLA-DRB345` `HLA-DRB1` `HLA-DQA1` `HLA-DQB1`
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 A*24:02:01:01 C*03:04:… B*40:0… Abs DRB1*08:0… DQA1*04:0… DQB1*04:0…
#> 2 A*03:01:01:05 C*06:02:… B*47:0… DRB4*01:01:… DRB1*07:0… DQA1*02:0… DQB1*02:0…
#> 3 A*02:01:01:01 C*05:01:… B*44:0… DRB3*01:01:… DRB1*03:0… DQA1*05:0… DQB1*02:0…
#> 4 A*32:01:01 C*02:02:… B*40:0… DRB3*02:02:… DRB1*11:0… DQA1*05:0… DQB1*03:0…
#> 5 A*02:01:01:01 C*05:01:… B*44:0… DRB5*01:01:… DRB1*15:0… DQA1*01:0… DQB1*06:0…
#> 6 A*02:01:01:01 C*05:01:… B*44:0… DRB4*01:03:… DRB1*04:0… DQA1*03:0… DQB1*03:0…
#> 7 A*02:06:01:01 C*08:01:… B*40:0… DRB4*01:03:… DRB1*09:0… DQA1*03:02 DQB1*03:0…
#> 8 A*24:02:01:01 C*07:02:… B*39:0… DRB5*02:02 DRB1*16:0… DQA1*05:0… DQB1*03:0…
#> 9 A*02:01:01:01 C*02:02:… B*40:0… DRB3*02:02:… DRB1*13:0… DQA1*01:0… DQB1*06:0…
#> 10 A*24:02:01:01 C*07:04:… B*44:0… DRB3*02:02:… DRB1*11:0… DQA1*05:0… DQB1*03:0…
#> # ℹ 5 more variables: `HLA-DPA1` <chr>, `HLA-DPB1` <chr>, Global_Fq <dbl>,
#> # Global_n <int>, Global_rank <int>
# The `HLA_truncate` function can be used to truncate the typing results to 2 fields:
library(dplyr)
Haplotype_frequencies %>% mutate(
  across(
    "HLA-A":"HLA-DPB1",
    ~ HLA_truncate(
      .,
      fields = 2,
      keep_suffix = TRUE,
      keep_G_P_group = FALSE
    )
  )
)
#> # A tibble: 10 × 12
#> `HLA-A` `HLA-C` `HLA-B` `HLA-DRB345` `HLA-DRB1` `HLA-DQA1` `HLA-DQB1`
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 A*24:02 C*03:04 B*40:01 Abs DRB1*08:01 DQA1*04:01 DQB1*04:02
#> 2 A*03:01 C*06:02 B*47:01 DRB4*01:01 DRB1*07:01 DQA1*02:01 DQB1*02:02
#> 3 A*02:01 C*05:01 B*44:02 DRB3*01:01 DRB1*03:01 DQA1*05:01 DQB1*02:01
#> 4 A*32:01 C*02:02 B*40:02 DRB3*02:02 DRB1*11:01 DQA1*05:05 DQB1*03:01
#> 5 A*02:01 C*05:01 B*44:02 DRB5*01:01 DRB1*15:01 DQA1*01:02 DQB1*06:02
#> 6 A*02:01 C*05:01 B*44:02 DRB4*01:03/DRB4*01:… DRB1*04:01 DQA1*03:03 DQB1*03:01
#> 7 A*02:06 C*08:01 B*40:06 DRB4*01:03 DRB1*09:01 DQA1*03:02 DQB1*03:0…
#> 8 A*24:02 C*07:02 B*39:05 DRB5*02:02 DRB1*16:02 DQA1*05:05 DQB1*03:01
#> 9 A*02:01 C*02:02 B*40:02 DRB3*02:02 DRB1*13:01 DQA1*01:03 DQB1*06:03
#> 10 A*24:02 C*07:04 B*44:02 DRB3*02:02 DRB1*11:01 DQA1*05:05 DQB1*03:01
#> # ℹ 5 more variables: `HLA-DPA1` <chr>, `HLA-DPB1` <chr>, Global_Fq <dbl>,
#> # Global_n <int>, Global_rank <int>