Data utilities

Data utilities — Functions for coalescing, merging, date handling and normalizing

Stability Level

Stable, unless otherwise indicated

Includes

#include <libtracker-extract/tracker-extract.h>

Description

This API provides general-purpose helper functions that extractors may find useful. The in-house extractors also use these functions frequently.

Functions

tracker_coalesce ()

gchar *
tracker_coalesce (gint n_values,
                  ...);

tracker_coalesce has been deprecated since version 0.10 and should not be used in newly-written code.

Use tracker_coalesce_strip() instead.

This function iterates through the string pointers passed as variable arguments (...) and returns the first one that is not NULL, not empty (i.e. "") and not made up solely of spaces (i.e. " ").

The returned value is stripped in place using g_strstrip(). All other values supplied are freed. Because the strings are modified and freed, it is essential NOT to pass constant string pointers (such as string literals) to this function.

Parameters

n_values

the number of string pointers passed in ...

 

...

the string pointers to coalesce

 

Returns

the first string pointer that meets the criteria above, or NULL if none does.

Since 0.8


tracker_coalesce_strip ()

const gchar *
tracker_coalesce_strip (gint n_values,
                        ...);

This function iterates through the string pointers passed as variable arguments (...) and returns the first one that is not NULL, not empty (i.e. "") and not made up solely of spaces (i.e. " ").

The returned value is stripped in place using g_strstrip(), so it is essential NOT to pass constant string pointers (such as string literals) to this function. Unlike tracker_coalesce(), the supplied strings are not freed.

Parameters

n_values

the number of string pointers passed in ...

 

...

the string pointers to coalesce

 

Returns

the first string pointer that meets the criteria above, or NULL if none does.

Since 0.10
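
The selection rule can be illustrated in plain C. This is a minimal sketch under stated assumptions, not the library's code: coalesce_strip_sketch(), is_blank() and strip_in_place() are hypothetical names, any character accepted by isspace() is assumed to count as blank, and strip_in_place() approximates g_strstrip() without GLib.

```c
#include <assert.h>
#include <ctype.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Returns non-zero if s is NULL, empty, or contains only whitespace. */
static int
is_blank (const char *s)
{
	if (s == NULL)
		return 1;
	for (; *s != '\0'; s++) {
		if (!isspace ((unsigned char) *s))
			return 0;
	}
	return 1;
}

/* Strips leading and trailing whitespace in place, like g_strstrip().
 * The string is written to, so it must not be a string literal. */
static char *
strip_in_place (char *s)
{
	char *start = s;
	size_t len;

	while (isspace ((unsigned char) *start))
		start++;
	memmove (s, start, strlen (start) + 1);  /* shift text and NUL left */

	len = strlen (s);
	while (len > 0 && isspace ((unsigned char) s[len - 1]))
		s[--len] = '\0';

	return s;
}

/* Hypothetical stand-in for tracker_coalesce_strip(): returns the first
 * non-blank argument, stripped in place. No argument is freed, and
 * blank arguments are left untouched. */
static const char *
coalesce_strip_sketch (int n_values, ...)
{
	va_list args;
	const char *result = NULL;
	int i;

	assert (n_values >= 0);

	va_start (args, n_values);
	for (i = 0; i < n_values; i++) {
		char *value = va_arg (args, char *);

		if (!is_blank (value)) {
			result = strip_in_place (value);
			break;
		}
	}
	va_end (args);

	return result;
}
```

Passing a literal such as "  Title  " here would make strip_in_place() write to read-only memory, which is why the warning above matters; the test arguments are therefore writable arrays.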


tracker_merge ()

gchar *
tracker_merge (const gchar *delimiter,
               gint n_values,
               ...);

tracker_merge has been deprecated since version 0.10 and should not be used in newly-written code.

Use tracker_merge_const() instead.

This function iterates through the string pointers passed as variable arguments (...) and returns a newly allocated string containing them merged. All passed strings are freed, so do not pass constant values.

The delimiter can be NULL. If it is not NULL, it is inserted between each pair of merged strings in the result.

Parameters

delimiter

the delimiter to use when merging

 

n_values

the number of string pointers passed in ...

 

...

the string pointers to merge

 

Returns

a newly-allocated string holding the result, which should be freed with g_free() when no longer needed, or NULL.

Since 0.8


tracker_merge_const ()

gchar *
tracker_merge_const (const gchar *delimiter,
                     gint n_values,
                     ...);

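This function iterates through the string pointers passed as variable arguments (...) and returns a newly allocated string containing them merged. Unlike tracker_merge(), the passed strings are not freed, so constant strings may be used.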

The delimiter can be NULL. If it is not NULL, it is inserted between each pair of merged strings in the result.

Parameters

delimiter

the delimiter to use when merging

 

n_values

the number of string pointers passed in ...

 

...

the string pointers to merge

 

Returns

a newly-allocated string holding the result, which should be freed with g_free() when no longer needed, or NULL.

Since 0.10
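
A plain-C sketch of the merge rule follows. merge_const_sketch() is a hypothetical name, not the library's implementation. Two behaviours are assumptions not stated above: NULL arguments are skipped, and NULL is returned when no non-NULL string is supplied. The sketch allocates with malloc(), so its result is released with free(), whereas the real function's result is released with g_free().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for tracker_merge_const(): joins the non-NULL
 * arguments, placing delimiter (if not NULL) between consecutive ones.
 * The arguments are only read, never modified or freed. */
static char *
merge_const_sketch (const char *delimiter, int n_values, ...)
{
	va_list args;
	size_t delim_len = delimiter ? strlen (delimiter) : 0;
	size_t total = 0;
	int count = 0;
	char *result, *p;
	int i;

	assert (n_values >= 0);

	/* First pass: measure the merged length. */
	va_start (args, n_values);
	for (i = 0; i < n_values; i++) {
		const char *value = va_arg (args, char *);

		if (value != NULL) {
			total += strlen (value) + (count > 0 ? delim_len : 0);
			count++;
		}
	}
	va_end (args);

	if (count == 0)
		return NULL;

	result = malloc (total + 1);
	if (result == NULL)
		return NULL;

	/* Second pass: copy the strings and delimiters. */
	p = result;
	count = 0;
	va_start (args, n_values);
	for (i = 0; i < n_values; i++) {
		const char *value = va_arg (args, char *);
		size_t len;

		if (value == NULL)
			continue;
		if (count > 0 && delim_len > 0) {
			memcpy (p, delimiter, delim_len);
			p += delim_len;
		}
		len = strlen (value);
		memcpy (p, value, len);
		p += len;
		count++;
	}
	va_end (args);

	*p = '\0';
	return result;
}
```

Measuring first and copying second keeps the sketch to a single allocation; a GString-based version would grow the buffer as it goes instead.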


tracker_getline ()

gssize
tracker_getline (gchar **lineptr,
                 gsize *n,
                 FILE *stream);

Reads an entire line from stream, storing the address of the buffer containing the text into *lineptr. The buffer is null-terminated and includes the newline character, if one was found.

See the GNU getline() manual page for more information.

Parameters

lineptr

pointer to the buffer to write into

 

n

pointer to the size of the buffer in bytes, updated if the buffer is resized

 

stream

the stream to read from

 

Returns

the number of characters read, including the delimiter (newline) character but not the terminating NUL byte. This value can be used to handle embedded NUL bytes in the line read. On failure, including reaching end of file, -1 is returned.

Since 0.10
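
Because tracker_getline() is documented in terms of GNU getline(), the usual read loop can be sketched with POSIX getline() itself. The assumption is that tracker_getline() accepts its arguments the same way (a NULL buffer with a size of 0 on the first call, reused across calls); count_lines() is a hypothetical helper, and how the buffer from tracker_getline() must be released is not stated on this page.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: counts the lines in stream and reports the length
 * of the longest one (excluding its newline), reusing a single buffer. */
static size_t
count_lines (FILE *stream, size_t *longest)
{
	char *line = NULL;   /* getline() allocates the buffer on first use */
	size_t cap = 0;      /* buffer size, updated by getline() */
	ssize_t len;
	size_t n_lines = 0;

	*longest = 0;
	while ((len = getline (&line, &cap, stream)) != -1) {
		/* len includes the trailing '\n', if any, but not the NUL byte. */
		if (len > 0 && line[len - 1] == '\n')
			len--;
		if ((size_t) len > *longest)
			*longest = (size_t) len;
		n_lines++;
	}

	free (line);  /* one free() for the whole loop */
	return n_lines;
}
```

A final line without a trailing newline is still returned, which is why the newline check is conditional.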


tracker_text_normalize ()

gchar *
tracker_text_normalize (const gchar *text,
                        guint max_words,
                        guint *n_words);

tracker_text_normalize has been deprecated since version 0.10 and should not be used in newly-written code.

Use tracker_text_validate_utf8() instead.

This function iterates through text checking for UTF-8 validity using g_utf8_get_char_validated(). For each character found, the GUnicodeType is checked to make sure it is one of the following values:

  • G_UNICODE_LOWERCASE_LETTER

  • G_UNICODE_MODIFIER_LETTER

  • G_UNICODE_OTHER_LETTER

  • G_UNICODE_TITLECASE_LETTER

  • G_UNICODE_UPPERCASE_LETTER

All other symbols, punctuation, marks, numbers and separators are stripped. A regular space (i.e. " ") is used to separate the words in the returned string.

n_words can be NULL. If it is not NULL, it is set to the number of words that were normalized in the result.

Parameters

text

the text to normalize

 

max_words

the maximum number of words of text to normalize

 

n_words

return location for the number of words actually normalized, or NULL

 

Returns

a newly-allocated string holding the result, which should be freed with g_free() when no longer needed, or NULL.

Since 0.8
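
The filtering rule can be approximated for ASCII input. In this sketch, normalize_ascii_sketch() is a hypothetical name, isalpha() stands in for the GUnicodeType letter checks (so non-ASCII letters are dropped), the result is allocated with malloc(), and stopping as soon as a word beyond max_words would start is an assumption. It is not meant to reproduce the library's exact edge cases.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ASCII-only approximation of tracker_text_normalize():
 * keeps runs of letters, joins them with single spaces, and stops
 * after max_words words. */
static char *
normalize_ascii_sketch (const char *text, unsigned max_words, unsigned *n_words)
{
	char *out;
	size_t o = 0;
	unsigned words = 0;
	int in_word = 0;
	const char *p;

	assert (text != NULL);

	/* The output is never longer than the input: each letter is copied
	 * once, and a space only replaces at least one skipped character. */
	out = malloc (strlen (text) + 1);
	if (out == NULL)
		return NULL;

	for (p = text; *p != '\0'; p++) {
		if (isalpha ((unsigned char) *p)) {
			if (!in_word) {
				if (words == max_words)
					break;           /* word limit reached */
				if (words > 0)
					out[o++] = ' ';  /* single space between words */
				words++;
				in_word = 1;
			}
			out[o++] = *p;
		} else {
			in_word = 0;             /* any non-letter ends the word */
		}
	}
	out[o] = '\0';

	if (n_words != NULL)
		*n_words = words;
	return out;
}
```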


tracker_text_validate_utf8 ()

gboolean
tracker_text_validate_utf8 (const gchar *text,
                            gssize text_len,
                            GString **str,
                            gsize *valid_len);

tracker_text_validate_utf8 is deprecated and should not be used in newly-written code.

This function iterates through text checking for UTF-8 validity using g_utf8_validate(), appends the first chunk of valid characters to str, and stores the number of valid UTF-8 bytes in valid_len.

Parameters

text

the text to validate

 

text_len

length of text, or -1 if text is NUL-terminated

 

str

the string in which to place the validated UTF-8 characters, or NULL if not needed.

 

valid_len

return location for the number of valid UTF-8 bytes found, or NULL if not needed

 

Returns

TRUE if some bytes were found to be valid, FALSE otherwise.

Since 0.10
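
The prefix-validation idea can be sketched without GLib. Here utf8_valid_prefix() and validate_utf8_sketch() are hypothetical names, the GString output parameter is omitted, and the checks (no overlong forms, no surrogates, nothing above U+10FFFF, a NUL byte treated as invalid within an explicit length) are intended to approximate what g_utf8_validate() accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the length in bytes of the longest valid UTF-8 prefix of s. */
static size_t
utf8_valid_prefix (const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		unsigned long cp;
		size_t need, k;

		if (c == 0x00)
			break;                  /* NUL ends the valid prefix */
		if (c < 0x80) {
			i++;                    /* plain ASCII */
			continue;
		} else if (c >= 0xC2 && c <= 0xDF) {
			need = 1; cp = c & 0x1F;
		} else if (c >= 0xE0 && c <= 0xEF) {
			need = 2; cp = c & 0x0F;
		} else if (c >= 0xF0 && c <= 0xF4) {
			need = 3; cp = c & 0x07;
		} else {
			break;                  /* stray continuation or invalid lead */
		}

		if (i + need >= len)
			break;                  /* sequence cut off by end of input */

		for (k = 1; k <= need; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				break;
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}
		if (k <= need)
			break;                  /* missing continuation byte */

		/* Reject overlong 3/4-byte forms, surrogates, > U+10FFFF. */
		if ((need == 2 && cp < 0x800) ||
		    (cp >= 0xD800 && cp <= 0xDFFF) ||
		    (need == 3 && (cp < 0x10000 || cp > 0x10FFFF)))
			break;

		i += need + 1;
	}
	return i;
}

/* Hypothetical stand-in for tracker_text_validate_utf8() without the
 * GString output: reports the valid prefix length, non-zero if any. */
static int
validate_utf8_sketch (const char *text, ptrdiff_t text_len, size_t *valid_len)
{
	size_t len, n;

	assert (text != NULL);
	len = text_len < 0 ? strlen (text) : (size_t) text_len;
	n = utf8_valid_prefix ((const unsigned char *) text, len);

	if (valid_len != NULL)
		*valid_len = n;
	return n > 0;
}
```

In the usage below, "caf\xC3\xA9" is "café" and the byte 0xFF can never occur in UTF-8, so only the first five bytes are reported as valid.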


tracker_date_format_to_iso8601 ()

gchar *
tracker_date_format_to_iso8601 (const gchar *date_string,
                                const gchar *format);

This function uses strptime() to parse date_string according to format into a struct tm, which is then returned as an ISO 8601 date string.

Parameters

date_string

the date string to parse

 

format

the strptime() format describing date_string

 

Returns

a newly-allocated string with the time represented in ISO 8601 date format, which should be freed with g_free() when no longer needed, or NULL.

Since 0.8
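
The strptime() step can be sketched as follows. format_to_iso8601_sketch() is a hypothetical name, and the output layout "%Y-%m-%dT%H:%M:%S" (with no time zone suffix) is an assumption; the exact string the library produces is not specified on this page. The sketch allocates with malloc(), so its result is released with free().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for tracker_date_format_to_iso8601(). */
static char *
format_to_iso8601_sketch (const char *date_string, const char *format)
{
	struct tm tm;
	char *result;
	const size_t size = sizeof "YYYY-MM-DDThh:mm:ss";

	assert (date_string != NULL && format != NULL);

	memset (&tm, 0, sizeof tm);  /* strptime() only sets parsed fields */
	if (strptime (date_string, format, &tm) == NULL)
		return NULL;             /* date_string does not match format */

	result = malloc (size);
	if (result == NULL)
		return NULL;

	/* strftime() returns 0 if the output does not fit. */
	if (strftime (result, size, "%Y-%m-%dT%H:%M:%S", &tm) == 0) {
		free (result);
		return NULL;
	}
	return result;
}
```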


tracker_date_guess ()

gchar *
tracker_date_guess (const gchar *date_string);

This function uses a number of methods to try to guess the date held in date_string. date_string must be at least 5 characters long for any guessing to be attempted. Some of the string formats recognized include:

  • "YYYY-MM-DD" (Simple format)

  • "20050315113224-08'00'" (PDF format)

  • "20050216111533Z" (PDF format)

  • "Mon Feb 9 10:10:00 2004" (Microsoft Office format)

  • "2005:04:29 14:56:54" (Exif format)

  • "YYYY-MM-DDThh:mm:ss.ff+zz:zz" (ISO 8601 format)

Parameters

date_string

the date string to guess from

 

Returns

a newly-allocated string with the time represented in ISO 8601 date format, which should be freed with g_free() when no longer needed, or NULL.

Since 0.8
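
A much-reduced guesser covering only two of the listed formats (the simple "YYYY-MM-DD" layout and the Exif layout) could look like this. date_guess_sketch() is a hypothetical name; the real function tries more formats, and the output layout, the midnight default for date-only input and the range checks here are assumptions. The result is allocated with malloc() and released with free().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for tracker_date_guess(). */
static char *
date_guess_sketch (const char *date_string)
{
	int y, mo, d, h = 0, mi = 0, s = 0;
	char *result;

	/* As documented above, strings shorter than 5 characters are rejected. */
	if (date_string == NULL || strlen (date_string) < 5)
		return NULL;

	if (sscanf (date_string, "%4d:%2d:%2d %2d:%2d:%2d",
	            &y, &mo, &d, &h, &mi, &s) == 6) {
		/* Exif layout, e.g. "2005:04:29 14:56:54". */
	} else if (sscanf (date_string, "%4d-%2d-%2d", &y, &mo, &d) == 3) {
		/* Simple layout; the time of day is assumed to be midnight. */
		h = mi = s = 0;
	} else {
		return NULL;
	}

	/* Basic sanity checks on the parsed fields. */
	if (y < 0 || mo < 1 || mo > 12 || d < 1 || d > 31 ||
	    h < 0 || h > 23 || mi < 0 || mi > 59 || s < 0 || s > 60)
		return NULL;

	result = malloc (32);
	if (result != NULL)
		snprintf (result, 32, "%04d-%02d-%02dT%02d:%02d:%02d",
		          y, mo, d, h, mi, s);
	return result;
}
```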


tracker_keywords_parse ()

void
tracker_keywords_parse (GPtrArray *store,
                        const gchar *keywords);

Parses a keywords line into store, avoiding duplicates and stripping leading and trailing spaces from keywords. Allowed delimiters are "," and ";".

Parameters

store

array in which to store the keywords

 

keywords

Keywords line to parse

 

Since 0.10
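
The splitting rules can be sketched in plain C, with a hypothetical KeywordStore type standing in for GPtrArray. keywords_parse_sketch() is likewise hypothetical and not the library's code; dropping empty tokens and using exact, case-sensitive comparison to detect duplicates are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array standing in for GPtrArray. */
typedef struct {
	char **items;
	size_t len;
	size_t cap;
} KeywordStore;

static int
store_contains (const KeywordStore *store, const char *kw)
{
	size_t i;

	for (i = 0; i < store->len; i++) {
		if (strcmp (store->items[i], kw) == 0)
			return 1;
	}
	return 0;
}

/* Splits keywords on ',' and ';', strips surrounding whitespace and
 * appends each new, non-empty keyword to store. */
static void
keywords_parse_sketch (KeywordStore *store, const char *keywords)
{
	const char *p;

	assert (store != NULL && keywords != NULL);

	for (p = keywords; *p != '\0'; ) {
		size_t tok_len = strcspn (p, ",;");   /* bytes up to next delimiter */
		const char *start = p, *end = p + tok_len;

		while (start < end && isspace ((unsigned char) *start))
			start++;
		while (end > start && isspace ((unsigned char) end[-1]))
			end--;

		if (end > start) {
			size_t n = (size_t) (end - start);
			char *kw = malloc (n + 1);

			if (kw == NULL)
				return;
			memcpy (kw, start, n);
			kw[n] = '\0';

			if (store_contains (store, kw)) {
				free (kw);                    /* skip duplicates */
			} else {
				if (store->len == store->cap) {
					size_t new_cap = store->cap ? store->cap * 2 : 4;
					char **items = realloc (store->items,
					                        new_cap * sizeof *items);
					if (items == NULL) {
						free (kw);
						return;
					}
					store->items = items;
					store->cap = new_cap;
				}
				store->items[store->len++] = kw;
			}
		}

		p += tok_len;
		if (*p != '\0')
			p++;                                  /* skip the delimiter */
	}
}
```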

Types and Values