[dpdk-dev] rte_lpm with larger nexthops or another method?

Vladimir Medvedkin medvedkinv at gmail.com
Mon Jun 22 12:11:14 CEST 2015


Hi Matthew,

I just recently thought about next_hop extension. For ipv4 we can do
something like:
struct rte_lpm_tbl24_entry {
        /* Stores Next hop or group index (i.e. gindex)into tbl8. */
        union {
                uint32_t  next_hop :24;
                uint32_t  tbl8_gindex :24;
        }__attribute__((__packed__));
        /* Using single uint8_t to store 3 values. */
        uint32_t valid     :1; /**< Validation flag. */
        uint32_t ext_entry :1; /**< External entry. */
        uint32_t depth     :6; /**< Rule depth. */
};
so we have 24 bit for next_hop.

2015-06-22 5:29 GMT+03:00 Matthew Hall <mhall at mhcomputing.net>:

> Hello,
>
> I have gone out on the internet for days looking at a bunch of different
> radix tree implementations to see if I could figure a way to implement my
> own tree, just to work around the really low 255 CIDR block limitation in
> librte_lpm. Unfortunately every single one I could find falls into one of
> these two annoying categories:
>
> 1) bloated with a lot of irrelevant kernel code I don't care about
> (especially the Linux version but also the BSD one, which also makes a
> weird assumption every address object stores its length in byte 0 of the
> address struct). These are hard to convert into something that plays nice
> with raw packet data.
>
> 2) very seemingly simple code, which breaks horribly if you try to add
> IPv6 support (such as the radix tree from University of Michigan / LLVM
> compiler benchmark suite, and the one from the old unmaintained mrt daemon,
> which includes a bizarre custom reference-counted memory manager that is
> very convoluted). These are easy to set up, but cause a lot of weird
> segfaults which I am having a difficult time to try to debug.
>
> So it seems like I am going nowhere with this approach. Instead, I'd like
> to know, what would I need to do to add this support to my local copy of
> librte_lpm? Let's assume for the sake of this discussion, that I don't care
> one iota about any performance cost, and I am happy if I need to prefetch
> two cachelines instead of just one (which I recall from a past thread is
> why librte_lpm has such a low nexthop limit to start with).
>
> Failing that, does anybody have a known good userspace version of any of
> these sort of items:
>
> 1) Hash Based FIB (forwarding information base),
> 2) Tree Based FIB,
> 3) Patricia trie (which does not break horribly on IPv6 or make bad
> assumptions about data format besides uint8_t* and length),
> 4) Crit-Bit tree
> 5) any other good way of taking IPv4 and IPv6 and finding the longest
> prefix match against a table of pre-loaded CIDR blocks?
>
> I am really pulling out my hair trying to find a way to do something which
> doesn't seem like it should have to be be this difficult. I must be missing
> a more obvious way to handle this.
>
> Thanks,
> Matthew


More information about the dev mailing list