Functions
gchar * | tracker_coalesce ()
const gchar * | tracker_coalesce_strip ()
gchar * | tracker_merge ()
gchar * | tracker_merge_const ()
gssize | tracker_getline ()
gchar * | tracker_text_normalize ()
gboolean | tracker_text_validate_utf8 ()
gchar * | tracker_date_format_to_iso8601 ()
gchar * | tracker_date_guess ()
void | tracker_keywords_parse ()
Description
This API provides general-purpose utility functions that extractors may find useful. The in-house extractors also use these functions frequently.
Functions
tracker_coalesce ()
gchar * tracker_coalesce (gint n_values, ...);
tracker_coalesce has been deprecated since version 0.10 and should not be used in newly-written code. Use tracker_coalesce_strip() instead.
This function iterates through the series of string pointers passed as the variable arguments (...) and returns the first one that is not NULL, not empty (i.e. "") and not made up only of one or more spaces (i.e. " "). The returned value is stripped using g_strstrip(). All other values supplied are freed. Because the strings are modified and freed, it is MOST important NOT to pass constant string pointers to this function!
Since 0.8
tracker_coalesce_strip ()
const gchar * tracker_coalesce_strip (gint n_values, ...);
This function iterates through the series of string pointers passed as the variable arguments (...) and returns the first one that is not NULL, not empty (i.e. "") and not made up only of one or more spaces (i.e. " "). The returned value is stripped in place using g_strstrip(), so it is MOST important NOT to pass constant string pointers to this function!
Since 0.10
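The selection rule described above can be illustrated with a small standalone C sketch. It is a hypothetical re-implementation written for illustration, not the library's code: the helper names sketch_is_blank(), sketch_strstrip() and sketch_coalesce_strip() are invented, any ASCII whitespace is assumed to count as blank, and sketch_strstrip() stands in for g_strstrip() so the example builds without GLib.

```c
#include <assert.h>
#include <ctype.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* True if s is NULL, empty, or made only of whitespace.
 * Assumption: ASCII isspace() is used as the notion of "blank". */
static int sketch_is_blank(const char *s)
{
    if (s == NULL)
        return 1;
    for (; *s != '\0'; s++) {
        if (!isspace((unsigned char) *s))
            return 0;
    }
    return 1;
}

/* Remove leading and trailing whitespace in place, like g_strstrip().
 * This writes into the buffer, so s must not be a string literal. */
static char *sketch_strstrip(char *s)
{
    char *start = s;
    size_t len;

    while (*start != '\0' && isspace((unsigned char) *start))
        start++;
    memmove(s, start, strlen(start) + 1);

    len = strlen(s);
    while (len > 0 && isspace((unsigned char) s[len - 1]))
        s[--len] = '\0';
    return s;
}

/* Hypothetical sketch of tracker_coalesce_strip(): return the first
 * non-blank argument, stripped in place; the others are left untouched. */
static const char *sketch_coalesce_strip(int n_values, ...)
{
    va_list args;
    const char *result = NULL;
    int i;

    assert(n_values >= 0);
    va_start(args, n_values);
    for (i = 0; i < n_values; i++) {
        char *value = va_arg(args, char *);

        if (!sketch_is_blank(value)) {
            result = sketch_strstrip(value);
            break;
        }
    }
    va_end(args);
    return result;
}
```

Called with writable char arrays blank = "   " and title = "  Song  ", sketch_coalesce_strip(3, (char *) NULL, blank, title) returns the title pointer itself, whose contents are now "Song". Because the winning buffer is rewritten in place, string literals (which may live in read-only memory) must not be passed.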
tracker_merge ()
gchar * tracker_merge (const gchar *delimiter, gint n_values, ...);
tracker_merge has been deprecated since version 0.10 and should not be used in newly-written code. Use tracker_merge_const() instead.
This function iterates through the series of string pointers passed as the variable arguments (...) and returns a newly allocated string of the merged strings. All passed strings are freed, so do not pass constant values.
The delimiter can be NULL. If specified, it is inserted between each merged string in the result.
Parameters
delimiter | the delimiter to use when merging
n_values | the number of string pointers supplied in ...
... | the string pointers to merge
Returns
a newly-allocated string holding the result, which should be freed with g_free() when no longer needed, or NULL otherwise.
Since 0.8
tracker_merge_const ()
gchar * tracker_merge_const (const gchar *delimiter, gint n_values, ...);
This function iterates through the series of string pointers passed as the variable arguments (...) and returns a newly allocated string of the merged strings. Unlike tracker_merge(), it does not free the passed strings.
The delimiter can be NULL. If specified, it is inserted between each merged string in the result.
Parameters
delimiter | the delimiter to use when merging
n_values | the number of string pointers supplied in ...
... | the string pointers to merge
Returns
a newly-allocated string holding the result, which should be freed with g_free() when no longer needed, or NULL otherwise.
Since 0.10
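A minimal sketch of the merging described above, in plain C without GLib. The name sketch_merge_const() is hypothetical; it assumes NULL arguments are skipped and that NULL is returned when no non-NULL argument remains, and it allocates with malloc(), so its result is released with free() rather than g_free().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of tracker_merge_const(): join the non-NULL
 * arguments, separated by delimiter (which may be NULL), into a new
 * heap string. Returns NULL if every argument is NULL. The inputs are
 * only read, never freed. */
static char *sketch_merge_const(const char *delimiter, int n_values, ...)
{
    va_list args;
    size_t delim_len = delimiter != NULL ? strlen(delimiter) : 0;
    size_t total = 0, count = 0, len;
    char *result, *p;
    int i;

    assert(n_values >= 0);

    /* First pass: measure the merged length. */
    va_start(args, n_values);
    for (i = 0; i < n_values; i++) {
        const char *value = va_arg(args, const char *);

        if (value != NULL) {
            total += strlen(value);
            count++;
        }
    }
    va_end(args);

    if (count == 0)
        return NULL;

    result = malloc(total + delim_len * (count - 1) + 1);
    if (result == NULL)
        return NULL;

    /* Second pass: copy each value, with the delimiter between values. */
    p = result;
    count = 0;
    va_start(args, n_values);
    for (i = 0; i < n_values; i++) {
        const char *value = va_arg(args, const char *);

        if (value == NULL)
            continue;
        if (count++ > 0 && delim_len > 0) {
            memcpy(p, delimiter, delim_len);
            p += delim_len;
        }
        len = strlen(value);
        memcpy(p, value, len);
        p += len;
    }
    va_end(args);
    *p = '\0';
    return result;
}
```

Measuring first lets the result be allocated exactly once. Per the description above, the deprecated tracker_merge() differs in that it also frees every argument, which is why it must not receive constant strings.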
tracker_getline ()
gssize tracker_getline (gchar **lineptr, gsize *n, FILE *stream);
Reads an entire line from stream, storing the address of the buffer containing the text into *lineptr. The buffer is NUL-terminated and includes the newline character, if one was found.
See the GNU getline() manual page for more information.
Returns
the number of characters read, including the delimiter character but not including the terminating NUL byte. This value can be used to handle embedded NUL bytes in the line read. Upon failure, -1 is returned.
Since 0.10
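The calling convention is easiest to see in a sketch. sketch_getline() below is a hypothetical portable stand-in written for illustration that follows the GNU getline() contract described above (grow the buffer as needed, keep the newline, return -1 at end of file); it is not tracker's implementation, and ssize_t is assumed as the counterpart of gssize.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>  /* ssize_t, standing in for gssize */

/* Hypothetical portable sketch of getline()-style reading. *lineptr is
 * (re)allocated as needed and *n tracks its size; the newline is kept.
 * Returns the number of bytes stored, excluding the terminating NUL,
 * or -1 if end of file (or an error) occurs before any byte is read. */
static ssize_t sketch_getline(char **lineptr, size_t *n, FILE *stream)
{
    size_t len = 0;
    int c;

    assert(lineptr != NULL && n != NULL && stream != NULL);

    if (*lineptr == NULL || *n == 0) {
        char *buf = realloc(*lineptr, 128);

        if (buf == NULL)
            return -1;
        *lineptr = buf;
        *n = 128;
    }

    while ((c = fgetc(stream)) != EOF) {
        /* Keep room for this byte plus the terminating NUL. */
        if (len + 1 >= *n) {
            char *buf = realloc(*lineptr, *n * 2);

            if (buf == NULL)
                return -1;
            *lineptr = buf;
            *n *= 2;
        }
        (*lineptr)[len++] = (char) c;
        if (c == '\n')
            break;
    }

    if (len == 0)
        return -1;
    (*lineptr)[len] = '\0';
    return (ssize_t) len;
}
```

A typical loop reuses one buffer across calls, `while ((len = sketch_getline(&line, &cap, fp)) != -1) { ... }`, and frees it once afterwards with free(line).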
tracker_text_normalize ()
gchar * tracker_text_normalize (const gchar *text, guint max_words, guint *n_words);
tracker_text_normalize has been deprecated since version 0.10 and should not be used in newly-written code. Use tracker_text_validate_utf8() instead.
This function iterates through text, checking for UTF-8 validity using g_utf8_get_char_validated(). For each character found, the GUnicodeType is checked to make sure it is one of the following values:
G_UNICODE_LOWERCASE_LETTER
G_UNICODE_MODIFIER_LETTER
G_UNICODE_OTHER_LETTER
G_UNICODE_TITLECASE_LETTER
G_UNICODE_UPPERCASE_LETTER
All other symbols, punctuation, marks, numbers and separators are stripped. A regular space (i.e. " ") is used to separate the words in the returned string.
n_words can be NULL. If specified, it is populated with the number of words that were normalized in the result.
Parameters
text | the text to normalize
max_words | the maximum number of words of text to normalize
n_words | the number of words actually normalized
Returns
a newly-allocated string holding the result, which should be freed with g_free() when no longer needed, or NULL otherwise.
Since 0.8
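The word-splitting rule can be sketched as follows. This hypothetical sketch_text_normalize() is ASCII-only: isalpha() stands in for the GUnicodeType letter checks, so it does not reproduce the UTF-8 handling of the real function, and it returns a malloc()ed string. Emitting no trailing space and counting only the words actually emitted are choices made for the sketch, not documented behaviour.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ASCII-only sketch: letters are kept, any run of other
 * characters (symbols, punctuation, digits, spaces) ends the current
 * word, and words are joined with a single space. At most max_words
 * words are emitted; n_words, if not NULL, receives the count. */
static char *sketch_text_normalize(const char *text, unsigned max_words,
                                   unsigned *n_words)
{
    char *out;
    size_t len = 0;
    unsigned words = 0;
    int in_word = 0;

    assert(text != NULL);
    /* The output is never longer than the input. */
    out = malloc(strlen(text) + 1);
    if (out == NULL)
        return NULL;

    for (; *text != '\0'; text++) {
        if (isalpha((unsigned char) *text)) {
            if (!in_word) {
                if (words == max_words)
                    break;               /* word limit reached */
                if (words > 0)
                    out[len++] = ' ';    /* separator between words */
                words++;
                in_word = 1;
            }
            out[len++] = *text;
        } else {
            in_word = 0;                 /* non-letters break words */
        }
    }
    out[len] = '\0';

    if (n_words != NULL)
        *n_words = words;
    return out;
}
```

With this sketch, "Hello, world! 42 foo-bar" becomes "Hello world foo bar" with n_words set to 4: the punctuation, the digits and the hyphen all act as word breaks.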
tracker_text_validate_utf8 ()
gboolean tracker_text_validate_utf8 (const gchar *text, gssize text_len, GString **str, gsize *valid_len);
tracker_text_validate_utf8 is deprecated and should not be used in newly-written code.
This function iterates through text, checking for UTF-8 validity using g_utf8_validate(), appends the first chunk of valid characters to str, and stores the number of valid UTF-8 bytes in valid_len.
Since 0.10
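The value reported through valid_len is the length of the longest valid UTF-8 prefix of text. The hypothetical sketch_utf8_valid_prefix() below computes that length with a hand-written RFC 3629 check instead of g_utf8_validate(), so it builds without GLib; it does not build the GString, and it assumes, as g_utf8_validate() does for max_len, that a negative text_len means text is NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: number of leading bytes of text that are valid
 * UTF-8 per RFC 3629 (no overlong forms, no surrogates, nothing above
 * U+10FFFF). Like g_utf8_validate(), scanning stops at a NUL byte. */
static size_t sketch_utf8_valid_prefix(const char *text, long text_len)
{
    const unsigned char *s = (const unsigned char *) text;
    size_t len, i = 0;

    assert(text != NULL);
    len = text_len < 0 ? strlen(text) : (size_t) text_len;

    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t need, k;

        if (c == '\0')
            break;
        if (c < 0x80) {
            i++;                         /* plain ASCII */
            continue;
        }
        if (c >= 0xC2 && c <= 0xDF) {
            need = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3; cp = c & 0x07;
        } else {
            break;                       /* invalid lead byte */
        }
        if (i + need >= len)
            break;                       /* sequence truncated */

        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                break;                   /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (k <= need)
            break;
        if ((need == 2 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) ||
            (need == 3 && (cp < 0x10000 || cp > 0x10FFFF)))
            break;                       /* overlong, surrogate or too large */
        i += need + 1;
    }
    return i;
}
```

For the bytes 'o', 'k', 0xFF, 'r', ... the sketch returns 2, because 0xFF can never start a UTF-8 sequence; the documented function would then append those two bytes to str and store 2 in valid_len.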
tracker_date_format_to_iso8601 ()
gchar * tracker_date_format_to_iso8601 (const gchar *date_string, const gchar *format);
This function uses strptime() to fill a struct tm from date_string, parsed according to format.
Returns
a newly-allocated string with the time represented in ISO 8601 date format, which should be freed with g_free() when no longer needed, or NULL otherwise.
Since 0.8
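A minimal sketch of the strptime() step, assuming a POSIX system. The name sketch_date_format_to_iso8601() is hypothetical, the output layout "%Y-%m-%dT%H:%M:%S" (with no time-zone suffix) is an assumption rather than the library's exact output, and the result comes from malloc() instead of g_malloc().

```c
#define _XOPEN_SOURCE 700  /* expose strptime() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical sketch: parse date_string with strptime() using format,
 * then print the broken-down time as "YYYY-MM-DDThh:mm:ss".
 * Returns a malloc()ed string, or NULL if parsing fails. */
static char *sketch_date_format_to_iso8601(const char *date_string,
                                           const char *format)
{
    struct tm tm;
    char *result;

    assert(date_string != NULL && format != NULL);
    memset(&tm, 0, sizeof tm);           /* fields not in format stay 0 */

    if (strptime(date_string, format, &tm) == NULL)
        return NULL;

    result = malloc(32);
    if (result == NULL)
        return NULL;
    if (strftime(result, 32, "%Y-%m-%dT%H:%M:%S", &tm) == 0) {
        free(result);
        return NULL;
    }
    return result;
}
```

For example, parsing the Exif-style string "2005:04:29 14:56:54" with the format "%Y:%m:%d %H:%M:%S" yields "2005-04-29T14:56:54".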
tracker_date_guess ()
gchar * tracker_date_guess (const gchar *date_string);
This function uses a number of methods to try to guess the date held in date_string. date_string must be at least 5 characters long for any guessing to be attempted.
Some of the string formats guessed include:
"YYYY-MM-DD" (Simple format)
"20050315113224-08'00'" (PDF format)
"20050216111533Z" (PDF format)
"Mon Feb 9 10:10:00 2004" (Microsoft Office format)
"2005:04:29 14:56:54" (Exif format)
"YYYY-MM-DDThh:mm:ss.ff+zz:zz"
Returns
a newly-allocated string with the time represented in ISO 8601 date format, which should be freed with g_free() when no longer needed, or NULL otherwise.
Since 0.8
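The guessing approach can be sketched for two of the listed formats, the Exif "2005:04:29 14:56:54" form and the simple "YYYY-MM-DD" form, using sscanf(). sketch_date_guess() is hypothetical and far less complete than the real function: it ignores time zones and trailing text, performs only basic range checks, and formats its result as "YYYY-MM-DDThh:mm:ss", which is an assumption about the output rather than documented behaviour.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch handling only the Exif and simple formats.
 * Returns a malloc()ed string, or NULL if no format matches. */
static char *sketch_date_guess(const char *date_string)
{
    int y, mo, d, h = 0, mi = 0, s = 0;
    char *result;

    /* Mirror the documented rule: no guessing below 5 characters. */
    if (date_string == NULL || strlen(date_string) < 5)
        return NULL;

    /* Try Exif "YYYY:MM:DD hh:mm:ss" first, then simple "YYYY-MM-DD". */
    if (sscanf(date_string, "%4d:%2d:%2d %2d:%2d:%2d",
               &y, &mo, &d, &h, &mi, &s) != 6) {
        h = mi = s = 0;
        if (sscanf(date_string, "%4d-%2d-%2d", &y, &mo, &d) != 3)
            return NULL;
    }

    /* Basic range checks only; days per month are not checked. */
    if (y < 0 || mo < 1 || mo > 12 || d < 1 || d > 31 ||
        h < 0 || h > 23 || mi < 0 || mi > 59 || s < 0 || s > 60)
        return NULL;

    result = malloc(32);
    if (result == NULL)
        return NULL;
    snprintf(result, 32, "%04d-%02d-%02dT%02d:%02d:%02d", y, mo, d, h, mi, s);
    return result;
}
```

Trying the more specific Exif pattern first matters: the simple pattern would otherwise fail on the ':' separators, and a string that matches neither is rejected rather than guessed.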