ATBG: method type inference with multiple interfaces

Today on the Coverity Development Testing Blog’s continuing series Ask The Bug Guys, I take a question from an “Eric L”, who is confused about one of the subtle rules of method type inference despite having written the rule himself. My colleague Jon takes a question from a beginner C programmer about memory allocation.

As always, if you have questions about a bug you’ve found in a C, C++, C# or Java program that you think would make a good episode of ATBG, please send your question along with a small reproducer of the problem to TheBugGuys@Coverity.com. We cannot promise to answer every question or solve every problem, but we’ll take a selection of the best questions that we can answer and address them on the dev testing blog every couple of weeks.

15 thoughts on “ATBG: method type inference with multiple interfaces”

  1. Cool post.

    Any plans on restoring and continuing your entry about tail recursion? I was looking forward to reading it and its successor(s), but then it seemed to have disappeared.

  2. Pingback: Dew Drop – December 5, 2013 (#1677) | Morning Dew

  3. Very funny post, ATBG.

    Since there is a link to a C question, wasn’t the struct supposed to be: struct word { int count; char word[0]; }; while allocating enough space for the count plus the size of the word? That is, if char word[0] is legal at all (that kind of undefined behavior in C that happens to work fine always bugs me).

  4. Zero-sized arrays were an extension from god-knows-who, but they are not legal in standard C89, although pretty much every compiler had one way or another to achieve this result.

    The C99 way to do this is:

    struct Word {
        int count;
        char word[];
    };

    and to allocate you ought to use: malloc(offsetof(struct Word, word) + wordlen + 1);

    Note the offsetof: sizeof(struct Word) and offsetof(struct Word, word) are not necessarily the same. In practice they will be the same here, but if you changed the member to “long word[]” it could actually make a difference.

    C is fun this way.
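    For concreteness, here is that recipe as a minimal, complete sketch; the make_word helper and its names are mine, not part of the original question:

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct Word {
        int count;
        char word[]; // C99 flexible array member; must be the last member
    };

    // Allocate a Word holding a copy of s, sized with offsetof as above.
    struct Word *make_word(const char *s)
    {
        size_t wordlen = strlen(s);
        struct Word *w = malloc(offsetof(struct Word, word) + wordlen + 1);
        if (w == NULL)
            return NULL;
        w->count = 1; // placeholder; the original question doesn't say what count means
        memcpy(w->word, s, wordlen + 1); // copy the string including its terminating NUL
        return w;
    }

    int main(void)
    {
        struct Word *w = make_word("flexible");
        if (w != NULL) {
            printf("%s (count=%d)\n", w->word, w->count);
            free(w);
        }
        return 0;
    }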

    • I’m not sure zero-sized arrays were ever really a designed-in “extension”. I’ve used compilers where they were legal, but I’m not sure those compilers did anything to support them other than not bothering to ensure that array declarations specified a non-zero size. Checking whether the array size is zero and outputting a message if so would probably add a dozen or so bytes to the code of the compiler, and the normal behavior of a compiler in response to an array declaration (align the “next-item” offset/address to the size of the element, associate the current next-item offset/address with the named identifier, then advance the offset/address by the element size times the element count) causes no problems when the array size is zero. So some compiler writers decided to save the dozen or so bytes required for the check and, in so doing, happened to add a useful feature.

      • True that it’s not exceedingly difficult to implement, but it’s still an extra feature to document and to turn off in strict ANSI mode. You also need code to generate an error for something like the following, so it actually adds extra code (although I doubt more than a few dozen bytes):

        struct Foo {
            int foo;
            int bar[0];
            int baz[0];
        };

        • I’m not sure why you think examples like yours require extra code for the compiler to handle. Assuming 16-bit ints and 16-bit alignment, a structure

          struct foo {
              char b1[1]; // Offset 0
              char b2[0]; // Offset 1
              int w1[0]; // Offset 2 (due to padding)
              char b3[0]; // Offset 2 (same as above)
              int w2[1]; // Offset 2
              int b4[1]; // Offset 4
          };

          would have a size of six, with offsets as indicated. Zero-sized arrays behaved something like unions, but didn’t require nested types.

          • Because the standard doesn’t allow this and therefore requires a diagnostic, you’ve got code to detect it _anyway_; so now you add more code to be able to turn off the code that detects it.

          • You need the code anyhow to get standard-compliant behavior (i.e. to disallow it in these cases), but what I mean is that even if you wanted this extra feature, it only makes sense for a single array.

            There’s just no sensible behavior for baz here. You’re right that since we’re already outside the spec anyhow, we could just do nothing and get really strange behavior; it’s just something that I think should generate an error even if you implement such an extension.

  5. Thank you, very helpful. BTW, did you mean “long long”?

    And how about:
    malloc(offsetof(struct Word, word[wordlen + 1]));

    Looks clearer to me.

    • Good idea; if that is guaranteed to work (and I don’t see why not), that’s a much nicer way to write it!

      And no, I don’t mean “long long”; basically anything with a natural alignment > 1 would be fine. Although I’m not sure on this point, i.e. whether sizeof(struct Foo) is guaranteed to include the necessary padding for the array or not. If it does, then using offsetof or sizeof would make no difference.
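      To make the padding question concrete, here is a small sketch; the layout numbers assume a typical 64-bit (LP64) ABI with 8-byte longs, so they are illustrative rather than guaranteed:

      #include <stddef.h>
      #include <stdio.h>

      struct Padded {
          long a;      // offset 0, size 8 (on LP64)
          int b;       // offset 8, size 4
          char word[]; // the flexible array starts at offset 12
      };

      int main(void)
      {
          // offsetof gives where the flexible array actually starts...
          printf("offsetof = %zu\n", offsetof(struct Padded, word)); // 12 here
          // ...while sizeof rounds up to the struct's alignment (8),
          // so allocating sizeof + wordlen + 1 would pay for 4 padding bytes.
          printf("sizeof   = %zu\n", sizeof(struct Padded));         // 16 here
          return 0;
      }

      So for a struct like this, allocating with offsetof saves the trailing padding; when the two values are equal, it makes no difference.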

  6. I don’t think it is suitable for the Bug Guys, so I will post a request here. Can you explain whether it would be possible, and how hard it would be, to implement F#-style automatic generalization (I think the official name is Hindley-Milner type inference) in C#? Is it possible at all with C-style syntax, or is there something in the core of the language that prevents it?

    • I am occasionally asked why we did not implement HM inference for generic method type inference. We considered it, but there were several points against it. (1) There are certain difficulties in making it work with languages that use nominal subtyping, as C# does. (2) Though the common-case performance is good, the worst-case performance can be bad. (3) The algorithm is complicated and tricky to get right. (4) The algorithm is hard to explain to customers. The point of HM inference is to make every possible inference, but that is explicitly not a design goal of C#; one of the goals of C# was that the language analysis could be understood by ordinary users. It has not entirely succeeded in this regard; it is fair to say that most users do not understand all the subtle points of overload resolution. But it’s pretty good. Had the language started with HM type inference, it would be natural to continue to use it, but it did not.
