SUBSTRING
signature
The SUBSTRING signature specifies manipulations on an abstract representation of a piece of a string. A substring value can be modeled as a triple (s, i, n)
, where s is the underlying string, i is the starting index, and n is the size of the substring, with the constraint that 0 <= i <= i + n <= size
s.
The substring type and its attendant functions provide a convenient abstraction for performing a variety of common analyses of strings, such as finding the leftmost occurrence, if any, of a character in a string. In addition, using the substring functions avoids much of the copying and bounds checking that occurs if similar operations are implemented solely in terms of strings.
The SUBSTRING signature is matched by two structures, the required Substring and the optional WideSubstring. The former is a companion structure to the Char and String structures, which are based on the extended ASCII 8-bit character set. The structure WideSubstring is related in the same way to the structures WideChar and WideString, which are based on characters of some size greater than or equal to 8 bits. In particular, substructure Substring.String is identical to the structure String and, when WideSubstring is defined, the substructure WideSubstring.String is identical to WideString.
signature SUBSTRING
structure Substring
: SUBSTRING
structure WideSubstring
: SUBSTRING
structure String : STRING
type substring
val base : substring -> (String.string * int * int)
val string : substring -> String.string
val extract : (String.string * int * int option) -> substring
val substring : (String.string * int * int) -> substring
val all : String.string -> substring
val isEmpty : substring -> bool
val getc : substring -> (String.Char.char * substring) option
val first : substring -> String.Char.char option
val triml : int -> substring -> substring
val trimr : int -> substring -> substring
val slice : (substring * int * int option) -> substring
val sub : (substring * int) -> char
val size : substring -> int
val concat : substring list -> String.string
val explode : substring -> String.Char.char list
val isPrefix : String.string -> substring -> bool
val compare : (substring * substring) -> order
val collate : ((String.Char.char * String.Char.char) -> order) -> (substring * substring) -> order
val splitl : (String.Char.char -> bool) -> substring -> (substring * substring)
val splitr : (String.Char.char -> bool) -> substring -> (substring * substring)
val splitAt : (substring * int) -> (substring * substring)
val dropl : (String.Char.char -> bool) -> substring -> substring
val dropr : (String.Char.char -> bool) -> substring -> substring
val takel : (String.Char.char -> bool) -> substring -> substring
val taker : (String.Char.char -> bool) -> substring -> substring
val position : String.string -> substring -> (substring * substring)
val span : (substring * substring) -> substring
val translate : (String.Char.char -> String.string) -> substring -> String.string
val tokens : (String.Char.char -> bool) -> substring -> substring list
val fields : (String.Char.char -> bool) -> substring -> substring list
val foldl : ((String.Char.char * 'a) -> 'a) -> 'a -> substring -> 'a
val foldr : ((String.Char.char * 'a) -> 'a) -> 'a -> substring -> 'a
val app : (String.Char.char -> unit) -> substring -> unit
structure String
type substring
base s
(s, i, n)
representing the concrete representation of the substring. s is the underlying string, i is the starting index, and n is the size of the substring.
string s
String.substring o base
.
extract (s, i, NONE)
extract (s, i, SOME j)
substring (s, i, j)
s[i..size s-1]
. This raises Subscript if i < 0
or size s < i
. The second form returns the substring of length j starting at index i, i.e., the string s[i..i+j-1]
. It raises Subscript if i < 0 or j < 0 or size
s < i + j. Note that, if defined, extract returns the empty string when i = size s
.
The third form returns the substring of length j starting at index i, i.e., the string s[i..i+j-1]
. This is equivalent to extract(s, i, SOME j)
.
We require that base o substring
be the identity function on valid arguments. Conversely, substring o base
is the identity function on all arguments.
Implementation note:
Note that implementations of this function must perform bounds checking in such a way that the Overflow exception is not raised.
all s
substring(s, 0, String.size s)
.
isEmpty s
true
if the substring has size 0.
getc s
first s
triml k s
trimr k s
ss = substring(s, i, j)
and k > j, we have:
triml k ss = substring(s, i+j, 0) trimr k ss = substring(s, i, 0)The exception Subscript is raised if
k < 0
.
slice (s, i, SOME m)
slice (s, i, NONE)
size s - i
. To be valid, the arguments in the first case must satisfy 0 <= i, 0 <= m and i + m <= n, where n is the length of s. In the second case, the arguments must satisfy 0 <= i and i <= n. If the arguments are not valid, the exception Subscript is raised.
sub (s, i)
String.sub(string s, i)
. The exception Subscript is raised unless i is non-negative and less than the size of s.
size s
#3 o base
and String.size o string
.
concat l
String.concat o (List.map string)
.
explode s
String.explode (string s)
.
isPrefix s ss
true
if the string s is a prefix of the substring ss.
compare (s, t)
String.compare (string s, string t)
collate f (s, t)
String.collate f (string s, string t)
splitl f s
splitr f s
a
and c
satisfy the predicate, but character X
does not, then these functions work as follows on the substring aaaXbbbbXccc
:
splitl : aaa XbbbbXccc splitr : aaaXbbbbX ccc
splitAt (s, i)
(ss, ss')
, where ss contains the first i characters of s and ss' contains the rest, assuming 0 <= i <= size
s. Otherwise, it raises Subscript.
dropl p s
dropr p s
takel p s
taker p s
takel p s = #1(splitl p s) dropl p s = #2(splitl p s) taker p s = #2(splitr p s) dropr p s = #1(splitr p s)
position s ss
(pref, suff)
of substrings, where suff is the longest suffix of ss that has s as a prefix and pref is the prefix of ss preceding suff. More precisely, let m be the size of s and let ss correspond to the substring (s', i, n)
. If there is a least index k such that s = s'[k..k+m-1]
, then suff corresponds to (s', k, n+i-k)
and pref corresponds to (s', i, k-i)
. If there is no such k, then suff is the empty substring corresponding to (s', i+n, 0)
and pref corresponds to (s', i, n)
, i.e., all of ss.
span (ss, ss')
val (s, i, n) = base ss val (s', i', n') = base ss'then
span
returns substring(s, i, (i'+n')-i)
unless s <> s'
or i'+n' < i
, in which case it raises Span.
This function allows one to scan for a substring using multiple pieces and then coalescing the pieces. For example, given a URL string such as
"http://www.research.att.com/jhr/sml/basis"to scan the protocol and host (
"http://www.research.att.com"
), one could write:
fun protoAndHost url = let fun notc (c : char) = fn c' => c <> c' val (proto,rest) = splitl (notc #":") (all url) val host = takel (notc #"/") (trim 3 rest) in span (proto, host) end
Implementation note:
When applied to substrings derived from the identical base string, the string equality test should be constant time. This can be achieved by first doing a pointer test and, if that fails, then checking the strings character by character.
translate f s
String.concat(List.map f (explode s))
.
tokens p s
fields p s
Two tokens may be separated by more than one delimiter, whereas two fields are separated by exactly one delimiter. For example, if the only delimiter is the character #"|"
, then the substring "|abc||def"
contains two tokens "abc"
and "def"
, whereas it contains the four fields ""
, "abc"
, ""
and "def"
.
foldl f a s
foldr f a s
List.foldl f a (explode s) List.foldr f a (explode s)
app f s
List.app f (explode s)
.
CHAR, STRING, StringCvt, MultiByte, List
Last Modified March 7, 1997
Comments to John Reppy.
Copyright © 1997 Bell Labs, Lucent Technologies