Bug #212

Add strndup

Added by andy almost 9 years ago. Updated over 8 years ago.

Status: Closed
Priority: Low
Assignee: -
% Done: 0%
Category: -
Target version: -
Start date:
Due date:

Description

Hoi,

this patch adds strndup(3) and makes strdup(3) use it.

History

#1 Updated by joerg almost 9 years ago

On Tue, Jun 20, 2006 at 09:44:17PM +0200, Andreas Hauser wrote:
> this patch adds strndup(3) and makes strdup(3) use it.

Where is it used? strndup(3) feels completely useless, even more than
memdup. Aka a typical glibc invention.

Joerg

#2 Updated by andy almost 9 years ago

joerg wrote @ Tue, 20 Jun 2006 21:57:06 +0200:

> Where is it used?

In /usr/src:
bind
binutils
gdb
sendmail
heimdal
openssl
kerberos5

#3 Updated by joerg almost 9 years ago

On Tue, Jun 20, 2006 at 11:55:37PM +0200, Andreas Hauser wrote:
> bind

No comment about ISC code.

> binutils

Uses internal function with prefix all the time.

> gdb

Same.

> sendmail

Same.

> heimdal

Not used at all?

> openssl

BUF_strndup.

> kerberos5

See heimdal.

Joerg

#4 Updated by andy almost 9 years ago

joerg wrote @ Wed, 21 Jun 2006 00:25:06 +0200:
> On Tue, Jun 20, 2006 at 11:55:37PM +0200, Andreas Hauser wrote:
> > bind
>
> No comment about ISC code.
>
> > binutils
>
> Uses internal function with prefix all the time.
>
> > gdb
>
> Same.
>
> > sendmail
>
> Same.
>
> > heimdal
>
> Not used at all?

But they include their own strndup.c.

> > openssl
>
> BUF_strndup.
>
> > kerberos5
>
> See heimdal.

I'm not sure what you want. It is obviously not a useless function and
is even used in the current base. What does it matter whether they prefix
it or not with respect to the usefulness you question?

#5 Updated by andy almost 9 years ago

andy wrote @ 20 Jun 2006 21:44:17 +0200:

Fixed the off-by-one:
http://ftp.fortunaty.net/DragonFly/inofficial/patches/strndup.patch

#6 Updated by dillon almost 9 years ago

:andy wrote @ 20 Jun 2006 21:44:17 +0200:
:
:Fixed the off-by-one:
:http://ftp.fortunaty.net/DragonFly/inofficial/patches/strndup.patch

Umm. That code is broken. len is only the maximum allowed length,
the actual string may be smaller.

so e.g. someone might do: strndup("fubar", 16384). The returned
string should only be 'fubar\0', and only 6 bytes should be allocated,
not 16384.
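
To make the sizing rule concrete, here is a minimal sketch of the allocation size a conforming strndup(s, n) would need. The helper name strndup_alloc_size() is hypothetical and is not part of the patch under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes a conforming strndup(s, n) has to allocate. */
size_t
strndup_alloc_size(const char *s, size_t n)
{
	size_t len = 0;

	/* Stop at the terminator or at n, whichever comes first. */
	while (len < n && s[len] != '\0')
		++len;
	return (len + 1);	/* copied bytes plus the terminating NUL */
}

/* Usage: n is only an upper bound, so a short string stays short. */
void
strndup_alloc_size_example(void)
{
	assert(strndup_alloc_size("fubar", 16384) == 6);
	assert(strndup_alloc_size("fubar", 3) == 4);
}
```

The loop never looks at s[n] or beyond, so it also behaves when the source has no terminator within the first n bytes.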

-Matt
Matthew Dillon

#7 Updated by andy almost 9 years ago

dillon wrote @ Tue, 20 Jun 2006 23:27:14 -0700 (PDT):
>
> :andy wrote @ 20 Jun 2006 21:44:17 +0200:
> :
> :Fixed the off-by-one:
> :http://ftp.fortunaty.net/DragonFly/inofficial/patches/strndup.patch
>
> Umm. That code is broken. len is only the maximum allowed length,
> the actual string may be smaller.
>
> so e.g. someone might do: strndup("fubar", 16384). The returned
> string should only be 'fubar\0', and only 6 bytes should be allocated,
> not 16384.

But when it works like that, one does not save the strlen.
Hence I see the dislike for the function.
I would like to have one that does not work like that.
Is there already a name for it?

#8 Updated by dnikulin almost 9 years ago

On 21 Jun 2006 08:53:37 +0200, Andreas Hauser wrote:
>
> dillon wrote @ Tue, 20 Jun 2006 23:27:14 -0700 (PDT):
> >
> > :andy wrote @ 20 Jun 2006 21:44:17 +0200:
> > :
> > :Fixed the off-by-one:
> > :http://ftp.fortunaty.net/DragonFly/inofficial/patches/strndup.patch
> >
> > Umm. That code is broken. len is only the maximum allowed length,
> > the actual string may be smaller.
> >
> > so e.g. someone might do: strndup("fubar", 16384). The returned
> > string should only be 'fubar\0', and only 6 bytes should be allocated,
> > not 16384.
>
> But when it works like that, one does not save the strlen.
> Hence I see the dislike for the function.
> I would like to have one that does not work like that.
> Is there already a name for it?

Why not call it memdup instead and drop the termination? String
functions for standard C, as broken as they are, are all based around
having a null terminator, and in your case you're actually basing
entirely off a length (but allocating for length + 1 which is very
counter-intuitive). Not that this function really achieves anything to
begin with...

I never cared for C-style strings. To set a length for them you have
to modify them, and this means you have to re-allocate if doing
read-only tokenizing or regex extraction. In my own code I define a
structure containing a length and a pointer, and when extracting
sub-strings, simply set up such a structure defining the scope of the
sub-string. If it needs to be copied out for safe writing, it's
trivial to do, and at no point is there a need to check through for a
null terminator. If the structure itself is on the stack you don't
even need to malloc. The whole thing translates nicely into any kind
of memory usage, and works naturally with buffering data blocks since
you already know the length. An additional plus is being able to store 0
as a valid byte, which apparently matters for some encodings.
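
As an illustration of the length-plus-pointer idea described above, here is a minimal sketch. It is not the code from ppk-mem.h; the type and function names (struct slice, slice_of, slice_to_cstr) are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: a read-only view into someone else's buffer. */
struct slice {
	const char *ptr;	/* first byte of the view, not NUL-terminated */
	size_t len;		/* number of bytes in the view */
};

/* Describe bytes [off, off + len) of buf without copying or modifying it. */
struct slice
slice_of(const char *buf, size_t off, size_t len)
{
	struct slice s = { buf + off, len };

	return (s);
}

/* Copy a view into a fresh NUL-terminated string; NULL if malloc fails. */
char *
slice_to_cstr(struct slice s)
{
	char *dst = malloc(s.len + 1);

	if (dst == NULL)
		return (NULL);
	memcpy(dst, s.ptr, s.len);
	dst[s.len] = '\0';
	return (dst);
}

/* Usage: pull "value" out of "key=value" without touching the input. */
void
slice_example(void)
{
	const char *line = "key=value";
	struct slice v = slice_of(line, 4, 5);	/* lives on the stack */
	char *copy = slice_to_cstr(v);

	assert(copy != NULL && strcmp(copy, "value") == 0);
	free(copy);
}
```

The view itself needs no allocation; only the optional copy calls malloc(), and nothing scans for a terminator.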

http://members.optusnet.com.au/dnikulin/ppk-mem.h

Proof of concept implemented as a header of static inline functions.
BSD license, C99, should be WARNS6 clean too. This will probably solve
your problem a lot better than yet another broken string function.

-- Dmitri Nikulin

#9 Updated by dillon almost 9 years ago

:..
:> Umm. That code is broken. len is only the maximum allowed length,
:> the actual string may be smaller.
:>
:> so e.g. someone might do: strndup("fubar", 16384). The returned
:> string should only be 'fubar\0', and only 6 bytes should be allocated,
:> not 16384.
:
:But when it works like that, one does not save the strlen.
:Hence I see the dislike for the function.
:I would like to have one that does not work like that.
:Is there already a name for it?
:
:--
:Andy

You don't save the strlen no matter what. It's a string function.
If you want to call it 'strndup' then it has to be compatible with
the linux strndup() and strndup()'s implementations on other platforms.

If it isn't taking the length of the string into account, it isn't a
string function and it shouldn't be called 'str*'.

In any case, I wouldn't worry about the strlen(). We are talking
a few nanoseconds... maybe 10-20ns for most strings, and strndup()
is doing a malloc() anyway which is MUCH more expensive than strlen().
Don't try to over-optimize the functionality at the cost of creating
obfuscated code!

-Matt
Matthew Dillon

#10 Updated by dillon almost 9 years ago

::But when it works like that, one does not save the strlen.
::Hence I see the dislike for the function.
::I would like to have one that does not work like that.
::Is there already a name for it?
::
::--
::Andy
:
: You don't save the strlen no matter what. It's a string function.
: If you want to call it 'strndup' then it has to be compatible with
: the linux strndup() and strndup()'s implementations on other platforms.
:
: If it isn't taking the length of the string into account, it isn't a
: string function and it shouldn't be called 'str*'.
:
: In any case, I wouldn't worry about the strlen(). We are talking
: a few nanoseconds... maybe 10-20ns for most strings, and strndup()
: is doing a malloc() anyway which is MUCH more expensive than strlen().
: Don't try to over-optimize the functionality at the cost of creating
: obfuscated code!

I need to amend this comment, because I implied that strlen() had to
be taken. In fact, it's a bit more complex than that. strndup() is
not allowed to scan the string beyond the specified maximum length
(because the string might not be terminated, as would be the case if
strndup() were used to cut out strings from a memory-mapped file).

So in this case strndup() would have to be implemented like this:

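#include <stdlib.h>	/* malloc */
#include <strings.h>	/* bcopy */

char *
strndup(const char *src, size_t n)
{
	size_t len;
	char *dst;

	for (len = 0; len < n && src[len]; ++len)	/* bounded strlen */
		;
	dst = malloc(len + 1);
	if (dst == NULL)		/* allocation failed */
		return(NULL);
	bcopy(src, dst, len);
	dst[len] = 0;
	return(dst);
}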

-Matt
Matthew Dillon

#11 Updated by dillon almost 9 years ago

:Why not call it memdup instead and drop the termination? String
:functions for standard C, as broken as they are, are all based around
:having a null terminator, and in your case you're actually basing
:entirely off a length (but allocating for length + 1 which is very
:counter-intuitive). Not that this function really achieves anything to
:begin with...

A memdup that is not string-oriented is a fine idea, but it
would not be something we would add to libc unless there were
a pre-existing reasonably standardized function somewhere that
did that sort of operation. It's only a few lines of code but
the problem vis-a-vis putting things into libc is standardization.
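
For reference, the "few lines of code" for a non-string memdup could look roughly like this. It is a sketch only: the name mem_dup() is hypothetical, and no claim is made that it matches any existing platform's function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name; duplicates exactly n bytes and adds no terminator. */
void *
mem_dup(const void *src, size_t n)
{
	void *dst = malloc(n != 0 ? n : 1);	/* sidestep malloc(0) */

	if (dst != NULL && n != 0)
		memcpy(dst, src, n);
	return (dst);
}

/* Usage: embedded zero bytes are copied like any other byte. */
void
mem_dup_example(void)
{
	const unsigned char in[4] = { 1, 0, 2, 0 };
	unsigned char *out = mem_dup(in, sizeof(in));

	assert(out != NULL && memcmp(in, out, sizeof(in)) == 0);
	free(out);
}
```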

:I never cared for C-style strings. To set a length for them you have
:to modify them, and this means you have to re-allocate if doing
:read-only tokenizing or regex extraction. In my own code I define a
:...
: -- Dmitri Nikulin

People are welcome to implement their own string handling functions,
but we aren't going to put things into libc that are not standardized
across multiple platforms. C's string handling functions aren't the
best in the world, but they aren't that bad either. \0 termination
is not a big deal and strlen() is not a big deal either.

Programs which manipulate very long strings often keep track of
the length of the string themselves. For example, the cpdup utility
manipulates potentially very long file paths and it caches index points
into the path strings to avoid having to call strlen() on the whole
string.
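
The following is a rough sketch of that kind of bookkeeping, not code taken from cpdup. The struct pathbuf type and the pathbuf_push()/pathbuf_pop() helpers are hypothetical names, and the fixed 1024-byte buffer is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical path builder that remembers its own length. */
struct pathbuf {
	char buf[1024];
	size_t len;	/* cached length of buf, always < sizeof(buf) */
};

/*
 * Append "/name". Returns the previous length as an index point,
 * or (size_t)-1 if the component does not fit.
 */
size_t
pathbuf_push(struct pathbuf *p, const char *name)
{
	size_t mark = p->len;
	size_t nlen = strlen(name);	/* only the new component is measured */

	if (nlen + 1 >= sizeof(p->buf) - p->len)
		return ((size_t)-1);
	p->buf[p->len] = '/';
	memcpy(p->buf + p->len + 1, name, nlen + 1);	/* copies the NUL too */
	p->len += nlen + 1;
	return (mark);
}

/* Cut the path back to an index point returned by a successful push. */
void
pathbuf_pop(struct pathbuf *p, size_t mark)
{
	p->len = mark;
	p->buf[mark] = '\0';
}

/* Usage: descend into a directory and come back without any rescans. */
void
pathbuf_example(void)
{
	struct pathbuf p = { "/usr", 4 };
	size_t mark = pathbuf_push(&p, "src");

	assert(mark == 4 && strcmp(p.buf, "/usr/src") == 0);
	pathbuf_pop(&p, mark);
	assert(strcmp(p.buf, "/usr") == 0);
}
```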

-Matt
Matthew Dillon

#12 Updated by dnikulin almost 9 years ago

On 6/21/06, Matthew Dillon wrote:
> A memdup that is not string-oriented is a fine idea, but it
> would not be something we would add to libc unless there were
> a pre-existing reasonably standardized function somewhere that
> did that sort of operation. It's only a few lines of code but
> the problem vis-a-vis putting things into libc is standardization.
>
> People are welcome to implement their own string handling functions,
> but we aren't going to put things into libc that are not standardized
> across multiple platforms.

In neither case (memdup nor my own kit) was I talking about inclusion
into libc, which I consider completely useless because not everyone
else will do it, so it'll have to be duplicated in 'portable' code
bases anyway. I was merely saying that for Andreas' usage, he can
easily find a better way, and I gave my example of a clean foundation
for efficient and scalable memory referencing, which happens to work
well for byte-per-char strings too.

-- Dmitri Nikulin

#13 Updated by joerg almost 9 years ago

On Wed, Jun 21, 2006 at 01:13:20AM -0700, Matthew Dillon wrote:
> I need to amend this comment, because I implied that strlen() had to
> be taken. In fact, it's a bit more complex than that. strndup() is
> not allowed to scan the string beyond the specified maximum length
> (because the string might not be terminated, as would be the case if
> strndup() were used to cut out strings from a memory-mapped file).

And the unrolled loop can be even slower than strlen(). memchr would be
better for this purpose, but the issue remains: the interface has
potential for unexpected abuse / side effects.
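
A memchr-based variant of the bounded scan might look like the sketch below. The name bounded_dup() is hypothetical, chosen to avoid clashing with a libc strndup(). It assumes memchr() stops at the first match rather than reading all n bytes, which C11 spells out; treat that as an assumption for older libcs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name: duplicate at most n bytes of src, always terminated. */
char *
bounded_dup(const char *src, size_t n)
{
	/* NULL means no terminator within the first n bytes. */
	const char *end = memchr(src, '\0', n);
	size_t len = (end != NULL) ? (size_t)(end - src) : n;
	char *dst = malloc(len + 1);

	if (dst == NULL)
		return (NULL);
	memcpy(dst, src, len);
	dst[len] = '\0';
	return (dst);
}

/* Usage: an unterminated 4-byte field still yields a proper string. */
void
bounded_dup_example(void)
{
	const char raw[4] = { 'a', 'b', 'c', 'd' };
	char *s = bounded_dup(raw, sizeof(raw));

	assert(s != NULL && strcmp(s, "abcd") == 0);
	free(s);
}
```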

Joerg

#14 Updated by joerg almost 9 years ago

On Wed, Jun 21, 2006 at 01:20:24AM -0700, Matthew Dillon wrote:
> A memdup that is not string-oriented is a fine idea, but it
> would not be something we would add to libc unless there were
> a pre-existing reasonably standardized function somewhere that
> did that sort of operation. It's only a few lines of code but
> the problem vis-a-vis putting things into libc is standardization.

For memdup there is precedent. The situation is a bit different for
that, since it does fill a hole (copying a memory buffer). I don't like
strndup, since the meaning of "copy string up to a fixed length" is
asking for trouble. It doesn't allow checking for truncation without
killing the original intent. Ignoring truncation has created enough
problems in the past already; let's not create another API for that.
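
To illustrate the trade-off, here is a sketch of a variant that refuses to truncate. The name dup_if_fits() and the use of ERANGE are assumptions for this example. Note that it has to call strlen() on the whole source, so it only works on terminated strings, which is exactly the loss of the bounded scan described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical variant: duplicate src only if it fits in max bytes.
 * Returns NULL with errno set to ERANGE instead of cutting the string.
 */
char *
dup_if_fits(const char *src, size_t max)
{
	size_t len = strlen(src);	/* needs a NUL-terminated source */
	char *dst;

	if (len > max) {
		errno = ERANGE;		/* caller sees the overflow */
		return (NULL);
	}
	dst = malloc(len + 1);
	if (dst != NULL)
		memcpy(dst, src, len + 1);	/* copies the NUL too */
	return (dst);
}

/* Usage: a too-long source is reported, not silently shortened. */
void
dup_if_fits_example(void)
{
	char *ok = dup_if_fits("fubar", 5);

	assert(ok != NULL && strcmp(ok, "fubar") == 0);
	free(ok);
	assert(dup_if_fits("fubar", 3) == NULL);
}
```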

Joerg
