[dpdk-dev] lib: added support for armv7 architecture

Jan Viktorin viktorin at rehivetech.com
Sun Oct 11 23:17:13 CEST 2015


Hello David, all,

I am reviewing your patch for things to be included in the final ARMv7
support... It is quite long, as you know. We should probably break the
potential discussion into shorter e-mails, one thread per topic. See my
comments below.

Regards
Jan

On Fri,  2 Oct 2015 22:50:02 +0100
David Hunt <david.hunt at intel.com> wrote:

> From: Amruta Zende <amruta.zende at intel.com>
> 
> Signed-off-by: Amruta Zende <amruta.zende at intel.com>
> Signed-off-by: David Hunt <david.hunt at intel.com>
> 
> ---
> MAINTAINERS                                        |    5 +
>  config/defconfig_arm-native-linuxapp-gcc           |   56 ++++
>  .../common/include/arch/arm/rte_atomic.h           |  269 ++++++++++++++++++++
>  .../common/include/arch/arm/rte_byteorder.h        |  146 +++++++++++
>  .../common/include/arch/arm/rte_cpuflags.h         |  138 ++++++++++
>  .../common/include/arch/arm/rte_cycles.h           |   77 ++++++
>  .../common/include/arch/arm/rte_memcpy.h           |  101 ++++++++
>  .../common/include/arch/arm/rte_prefetch.h         |   64 +++++
>  .../common/include/arch/arm/rte_rwlock.h           |   70 +++++
>  .../common/include/arch/arm/rte_spinlock.h         |  116 +++++++++

I do not know yet how different the ARMv8 support is, nor how it
integrates with the ARMv7 part... Will we have a single "arm" directory?

>  lib/librte_eal/common/include/arch/arm/rte_vect.h  |   37 +++
>  lib/librte_eal/linuxapp/Makefile                   |    3 +
>  lib/librte_eal/linuxapp/arm_pmu/Makefile           |   52 ++++
>  lib/librte_eal/linuxapp/arm_pmu/rte_enable_pmu.c   |   83 ++++++
>  mk/arch/arm/rte.vars.mk                            |   58 +++++
>  mk/machine/armv7-a/rte.vars.mk                     |   63 +++++
>  mk/toolchain/gcc/rte.vars.mk                       |    8 +-
>  17 files changed, 1343 insertions(+), 3 deletions(-)
>  create mode 100644 config/defconfig_arm-native-linuxapp-gcc
>  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_atomic.h
>  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_byteorder.h
>  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
>  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_cycles.h
>  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy.h
>  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_prefetch.h
>  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_rwlock.h
>  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_spinlock.h
>  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_vect.h
>  create mode 100755 lib/librte_eal/linuxapp/arm_pmu/Makefile
>  create mode 100644 lib/librte_eal/linuxapp/arm_pmu/rte_enable_pmu.c
>  create mode 100644 mk/arch/arm/rte.vars.mk
>  create mode 100644 mk/machine/armv7-a/rte.vars.mk
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 080a8e8..9d99d53 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -124,6 +124,11 @@ IBM POWER
>  M: Chao Zhu <chaozhu at linux.vnet.ibm.com>
>  F: lib/librte_eal/common/include/arch/ppc_64/
>  
> +Arm V7
> +M: Amrute Zende <amruta.zende at intel.com>
> +M: David Hunt <david.hunt at intel.com>
> +F: lib/librte_eal/common/include/arch/arm/

We were discussing the ARM maintainership with Thomas Monjalon. As it
seems Dave is not going to do the maintenance long-term, I can take
care of the ARM maintenance as well. However, I will not have a proper
ARMv8 machine available for some time.

> +
>  Intel x86
>  M: Bruce Richardson <bruce.richardson at intel.com>
>  M: Konstantin Ananyev <konstantin.ananyev at intel.com>
> diff --git a/config/defconfig_arm-native-linuxapp-gcc b/config/defconfig_arm-native-linuxapp-gcc

This is interesting. Should we have a configuration for "arm", or rather
for "armv7" or "armv7-a", or even more specifically for
"cortex-a9/15/..."?

> new file mode 100644
> index 0000000..159aa36
> --- /dev/null
> +++ b/config/defconfig_arm-native-linuxapp-gcc
> @@ -0,0 +1,56 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2015 Intel Corporation. All rights reserved.
> +#
> [...]
> +#
> +
> +#include "common_linuxapp"
> +
> +CONFIG_RTE_MACHINE="armv7-a"
> +
> +CONFIG_RTE_ARCH="arm"
> +CONFIG_RTE_ARCH_ARM32=y
> +CONFIG_RTE_ARCH_32=y
> +
> +CONFIG_RTE_TOOLCHAIN="gcc"
> +CONFIG_RTE_TOOLCHAIN_GCC=y
> +
> +CONFIG_RTE_FORCE_INTRINSICS=y
> +CONFIG_RTE_LIBRTE_VHOST=n
> +CONFIG_RTE_LIBRTE_KNI=n
> +CONFIG_RTE_KNI_KMOD=n

I recall seeing a note somewhere that KNI is for 64-bit systems only. Am I
right? Would it be possible to have KNI on ARM?

> +CONFIG_RTE_LIBRTE_LPM=n
> +CONFIG_RTE_LIBRTE_ACL=n
> +CONFIG_RTE_LIBRTE_SCHED=n
> +CONFIG_RTE_LIBRTE_PORT=n
> +CONFIG_RTE_LIBRTE_PIPELINE=n
> +CONFIG_RTE_LIBRTE_TABLE=n

The libraries compile (except for LPM and ACL, due to SSE issues), so
some of them will be enabled by default.

> +CONFIG_RTE_IXGBE_INC_VECTOR=n
> +CONFIG_RTE_LIBRTE_VIRTIO_PMD=n
> +CONFIG_RTE_LIBRTE_CXGBE_PMD=n
> +
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic.h b/lib/librte_eal/common/include/arch/arm/rte_atomic.h
> new file mode 100644
> index 0000000..2a7339c
> --- /dev/null
> +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic.h
> @@ -0,0 +1,269 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2015 Intel Corporation. All rights reserved.
> + *
> [...]
> + */
> +
> +#ifndef _RTE_ATOMIC_ARM32_H_
> +#define _RTE_ATOMIC_ARM32_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "generic/rte_atomic.h"
> +
> +/**
> + * @file
> + * Atomic Operations
> + *
> + * This file defines a API for atomic operations.
> + */
> +
> +/**
> + * General memory barrier.
> + *
> + * Guarantees that the LOAD and STORE operations generated before the
> + * barrier occur before the LOAD and STORE operations generated after.
> + */
> +#define	rte_mb()  { __sync_synchronize(); }
> +
> +/**
> + * Write memory barrier.
> + *
> + * Guarantees that the STORE operations generated before the barrier
> + * occur before the STORE operations generated after.
> + */
> +#define	rte_wmb() {asm volatile("dsb st" : : : "memory"); }
> +
> +/**
> + * Read memory barrier.
> + *
> + * Guarantees that the LOAD operations generated before the barrier
> + * occur before the LOAD operations generated after.
> + */
> +#define	rte_rmb() {asm volatile("dsb " : : : "memory"); }
> +

This is probably too strong a barrier. Our patch series doesn't do it
properly either. It seems we should rather have:

rte_wmb: dmb st
rte_rmb: dmb sy (= __sync_synchronize)
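
A minimal sketch of what I mean (hand-written here, not taken from
either patch set):

#define rte_wmb() do { asm volatile("dmb st" : : : "memory"); } while (0)
#define rte_rmb() do { asm volatile("dmb sy" : : : "memory"); } while (0)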

> +
> +
> +/*------------------------- 16 bit atomic operations -------------------------*/
> +
> +#ifndef RTE_FORCE_INTRINSICS
> +static inline int
> +rte_atomic16_cmpset(volatile uint16_t *dst, uint16_t exp, uint16_t src)
> +{
> +	return __sync_bool_compare_and_swap(dst, exp, src);

Should we go with the __sync_* or the __atomic_* constructs? The __atomic_*
intrinsics are available since GCC 4.7. Our patch set uses the __atomic_* ones.
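
For comparison, the __atomic_* flavour of cmpset would look roughly
like this (the memory ordering is chosen here just for illustration):

static inline int
rte_atomic16_cmpset(volatile uint16_t *dst, uint16_t exp, uint16_t src)
{
	/* strong compare-and-swap, sequentially consistent ordering */
	return __atomic_compare_exchange_n(dst, &exp, src, 0,
			__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}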

> +}
> +
> +static inline void
> +rte_atomic16_inc(rte_atomic16_t *v)
> +{
> +	rte_atomic16_add(v, 1);
> +}
> +
> +static inline void
> +rte_atomic16_dec(rte_atomic16_t *v)
> +{
> +	rte_atomic16_sub(v, 1);
> +}
> +
> +static inline int rte_atomic16_inc_and_test(rte_atomic16_t *v)
> +{
> +	return (__sync_add_and_fetch(&v->cnt, 1) == 0);
> +}
> +
> +static inline int rte_atomic16_dec_and_test(rte_atomic16_t *v)
> +{
> +	return (__sync_sub_and_fetch(&v->cnt, 1) == 0);
> +}
> +
> +static inline int rte_atomic16_test_and_set(rte_atomic16_t *v)
> +{
> +	return rte_atomic16_cmpset((volatile uint16_t *)&v->cnt, 0, 1);
> +}
> +
> +
> +/*------------------------- 32 bit atomic operations -------------------------*/
> +
> +static inline int
> +rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src)
> +{
> +	return __sync_bool_compare_and_swap(dst, exp, src);
> +}
> +
> +static inline void
> +rte_atomic32_inc(rte_atomic32_t *v)
> +{
> +	rte_atomic32_add(v, 1);
> +}
> +
> +static inline void
> +rte_atomic32_dec(rte_atomic32_t *v)
> +{
> +	rte_atomic32_sub(v, 1);
> +}
> +
> +static inline int rte_atomic32_inc_and_test(rte_atomic32_t *v)
> +{
> +	return (__sync_add_and_fetch(&v->cnt, 1) == 0);
> +}
> +
> +static inline int rte_atomic32_dec_and_test(rte_atomic32_t *v)
> +{
> +	return (__sync_sub_and_fetch(&v->cnt, 1) == 0);
> +}
> +
> +static inline int rte_atomic32_test_and_set(rte_atomic32_t *v)
> +{
> +	return rte_atomic32_cmpset((volatile uint32_t *)&v->cnt, 0, 1);
> +}
> +
> +/*------------------------- 64 bit atomic operations -------------------------*/
> +
> +static inline int
> +rte_atomic64_cmpset(volatile uint64_t *dst, uint64_t exp, uint64_t src)
> +{
> +	return __sync_bool_compare_and_swap(dst, exp, src);
> +}
> +
> +static inline void
> +rte_atomic64_init(rte_atomic64_t *v)
> +{
> +#ifdef __LP64__
> +	v->cnt = 0;

I assume this is for ARMv8...

> +#else
> +	int success = 0;
> +	uint64_t tmp;
> +
> +	while (success == 0) {
> +		tmp = v->cnt;
> +		success = rte_atomic64_cmpset((volatile uint64_t *)&v->cnt,
> +				tmp, 0);
> +	}
> +#endif
> +}
> +
> +static inline int64_t
> +rte_atomic64_read(rte_atomic64_t *v)
> +{
> +#ifdef __LP64__
> +	return v->cnt;
> +#else
> +	int success = 0;
> +	uint64_t tmp;
> +
> +	while (success == 0) {
> +		tmp = v->cnt;
> +		/* replace the value by itself */
> +		success = rte_atomic64_cmpset((volatile uint64_t *)&v->cnt,
> +				tmp, tmp);
> +	}
> +	return tmp;
> +#endif
> +}
> +
> +static inline void
> +rte_atomic64_set(rte_atomic64_t *v, int64_t new_value)
> +{
> +#ifdef __LP64__
> +	v->cnt = new_value;
> +#else
> +	int success = 0;
> +	uint64_t tmp;
> +
> +	while (success == 0) {
> +		tmp = v->cnt;
> +		success = rte_atomic64_cmpset((volatile uint64_t *)&v->cnt,
> +				tmp, new_value);
> +	}
> +#endif
> +}
> +
> +static inline void
> +rte_atomic64_add(rte_atomic64_t *v, int64_t inc)
> +{
> +	__sync_fetch_and_add(&v->cnt, inc);
> +}
> +
> +static inline void
> +rte_atomic64_sub(rte_atomic64_t *v, int64_t dec)
> +{
> +	__sync_fetch_and_sub(&v->cnt, dec);
> +}
> +
> +static inline void
> +rte_atomic64_inc(rte_atomic64_t *v)
> +{
> +	rte_atomic64_add(v, 1);
> +}
> +
> +static inline void
> +rte_atomic64_dec(rte_atomic64_t *v)
> +{
> +	rte_atomic64_sub(v, 1);
> +}
> +
> +static inline int64_t
> +rte_atomic64_add_return(rte_atomic64_t *v, int64_t inc)
> +{
> +	return __sync_add_and_fetch(&v->cnt, inc);
> +}
> +
> +static inline int64_t
> +rte_atomic64_sub_return(rte_atomic64_t *v, int64_t dec)
> +{
> +	return __sync_sub_and_fetch(&v->cnt, dec);
> +}
> +
> +static inline int rte_atomic64_inc_and_test(rte_atomic64_t *v)
> +{
> +	return rte_atomic64_add_return(v, 1) == 0;
> +}
> +
> +static inline int rte_atomic64_dec_and_test(rte_atomic64_t *v)
> +{
> +	return rte_atomic64_sub_return(v, 1) == 0;
> +}
> +
> +static inline int rte_atomic64_test_and_set(rte_atomic64_t *v)
> +{
> +	return rte_atomic64_cmpset((volatile uint64_t *)&v->cnt, 0, 1);
> +}
> +
> +static inline void rte_atomic64_clear(rte_atomic64_t *v)
> +{
> +	rte_atomic64_set(v, 0);
> +}
> +#endif
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_ATOMIC_ARM32_H_ */
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_byteorder.h b/lib/librte_eal/common/include/arch/arm/rte_byteorder.h
> new file mode 100644
> index 0000000..effbd62
> --- /dev/null
> +++ b/lib/librte_eal/common/include/arch/arm/rte_byteorder.h
> @@ -0,0 +1,146 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright (C) IBM Corporation 2014. All rights reserved.
> + *   Copyright(c) 2015 Intel Corporation. All rights reserved.
> + *
> [...]
> + */
> +
> +/* Inspired from FreeBSD src/sys/powerpc/include/endian.h
> + * Copyright (c) 1987, 1991, 1993
> + * The Regents of the University of California.  All rights reserved.
> +*/
> +
> +#ifndef _RTE_BYTEORDER_ARM32_H_
> +#define _RTE_BYTEORDER_ARM32_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "generic/rte_byteorder.h"
> +
> +/*
> + * An architecture-optimized byte swap for a 16-bit value.
> + *
> + * Do not use this function directly. The preferred function is rte_bswap16().
> + */
> +static inline uint16_t
> +rte_arch_bswap16(uint16_t _x)
> +{
> +	return __builtin_bswap16(_x);

__builtin_bswap16 has been available only since GCC 4.8. Our code uses
assembler for this (so we can build with 4.7; by using __sync_* constructs
instead of __atomic_*, we can support even 4.6).

http://www.serverphorums.com/read.php?12,641912

Do we want to support GCC 4.6+, or is it sufficient to stay at 4.7+
or even 4.8+?
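
For reference, an assembler fallback that keeps GCC 4.7 working could
look roughly like this (rev16 is available on ARMv7, just a sketch):

static inline uint16_t
rte_arch_bswap16(uint16_t _x)
{
	register uint16_t x = _x;

	/* swap the two bytes of the low halfword */
	asm volatile ("rev16 %0,%1" : "=r" (x) : "r" (x));
	return x;
}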

> +}
> +
> +/*
> + * An architecture-optimized byte swap for a 32-bit value.
> + *
> + * Do not use this function directly. The preferred function is rte_bswap32().
> + */
> +static inline uint32_t
> +rte_arch_bswap32(uint32_t _x)
> +{
> +	return __builtin_bswap32(_x);
> +}
> +
> +/*
> + * An architecture-optimized byte swap for a 64-bit value.
> + *
> +  * Do not use this function directly. The preferred function is rte_bswap64().
> + */
> +/* 64-bit mode */
> +static inline uint64_t
> +rte_arch_bswap64(uint64_t _x)
> +{
> +	return __builtin_bswap64(_x);
> +}
> +
> +#ifndef RTE_FORCE_INTRINSICS
> +#define rte_bswap16(x) ((uint16_t)(__builtin_constant_p(x) ?		\
> +			rte_constant_bswap16(x) :		\
> +			rte_arch_bswap16(x)))
> +
> +#define rte_bswap32(x) ((uint32_t)(__builtin_constant_p(x) ?		\
> +			rte_constant_bswap32(x) :		\
> +			rte_arch_bswap32(x)))
> +
> +#define rte_bswap64(x) ((uint64_t)(__builtin_constant_p(x) ?		\
> +			rte_constant_bswap64(x) :		\
> +			rte_arch_bswap64(x)))
> +#else
> +		/*
> +		 * __builtin_bswap16 is only available gcc 4.8 and upwards
> +		 */
> +#if __GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 8)
> +#define rte_bswap16(x) ((uint16_t)(__builtin_constant_p(x) ?		\
> +			rte_constant_bswap16(x) :		\
> +			rte_arch_bswap16(x)))
> +#endif
> +#endif
> +
> +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> +
> +#define rte_cpu_to_le_16(x) (x)
> +#define rte_cpu_to_le_32(x) (x)
> +#define rte_cpu_to_le_64(x) (x)
> +
> +#define rte_cpu_to_be_16(x) rte_bswap16(x)
> +#define rte_cpu_to_be_32(x) rte_bswap32(x)
> +#define rte_cpu_to_be_64(x) rte_bswap64(x)
> +
> +#define rte_le_to_cpu_16(x) (x)
> +#define rte_le_to_cpu_32(x) (x)
> +#define rte_le_to_cpu_64(x) (x)
> +
> +#define rte_be_to_cpu_16(x) rte_bswap16(x)
> +#define rte_be_to_cpu_32(x) rte_bswap32(x)
> +#define rte_be_to_cpu_64(x) rte_bswap64(x)
> +
> +#else /* RTE_BIG_ENDIAN */
> +
> +#define rte_cpu_to_le_16(x) rte_bswap16(x)
> +#define rte_cpu_to_le_32(x) rte_bswap32(x)
> +#define rte_cpu_to_le_64(x) rte_bswap64(x)
> +
> +#define rte_cpu_to_be_16(x) (x)
> +#define rte_cpu_to_be_32(x) (x)
> +#define rte_cpu_to_be_64(x) (x)
> +
> +#define rte_le_to_cpu_16(x) rte_bswap16(x)
> +#define rte_le_to_cpu_32(x) rte_bswap32(x)
> +#define rte_le_to_cpu_64(x) rte_bswap64(x)
> +
> +#define rte_be_to_cpu_16(x) (x)
> +#define rte_be_to_cpu_32(x) (x)
> +#define rte_be_to_cpu_64(x) (x)
> +#endif
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_BYTEORDER_ARM32_H_ */
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h b/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
> new file mode 100644
> index 0000000..411ca37
> --- /dev/null
> +++ b/lib/librte_eal/common/include/arch/arm/rte_cpuflags.h
> @@ -0,0 +1,138 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright (C) IBM Corporation 2014. All rights reserved.
> + *   Copyright(c) 2015 Intel Corporation. All rights reserved.
> + *
> [...]
> + */
> +
> +#ifndef _RTE_CPUFLAGS_ARM32_H_
> +#define _RTE_CPUFLAGS_ARM32_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <elf.h>
> +#include <fcntl.h>
> +#include <assert.h>
> +#include <unistd.h>
> +#include <string.h>
> +
> +#include "generic/rte_cpuflags.h"
> +
> +/* Symbolic values for the entries in the auxiliary table */
> +#define AT_HWCAP  16
> +#define AT_HWCAP2 26
> +#define AT_PLATFORM 15
> +
> +/* software based registers */
> +enum cpu_register_t {
> +	REG_HWCAP = 0,
> +	AARCH_MODE,
> +};

Our code can also read REG_HWCAP2, but not AT_PLATFORM.
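
Reading it is just one more branch in the auxv loop below, roughly
(assuming a REG_HWCAP2 entry is added to cpu_register_t):

	if (auxv.a_type == AT_HWCAP2)
		out[REG_HWCAP2] = auxv.a_un.a_val;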

> +
> +/**
> + * Enumeration of all CPU features supported
> + */
> +enum rte_cpu_flag_t {
> +	RTE_CPUFLAG_FP = 0,
> +	RTE_CPUFLAG_ASIMD,
> +	RTE_CPUFLAG_EVTSTRM,
> +	RTE_CPUFLAG_AARCH64,
> +	RTE_CPUFLAG_AARCH32,

Nice, you can detect the 32/64-bit ARM architecture here. We didn't even
try.

> +	/* The last item */
> +	RTE_CPUFLAG_NUMFLAGS,/**< This should always be the last! */
> +};
> +
> +static const struct feature_entry cpu_feature_table[] = {
> +	FEAT_DEF(FP, 0x00000001, 0, REG_HWCAP,  0)
> +	FEAT_DEF(ASIMD, 0x00000001, 0, REG_HWCAP,  1)
> +	FEAT_DEF(EVTSTRM, 0x00000001, 0, REG_HWCAP,  2)
> +	FEAT_DEF(AARCH64, 0x00000001, 0, AARCH_MODE, 3)
> +	FEAT_DEF(AARCH32, 0x00000001, 0, AARCH_MODE, 4)
> +};
> +
> +/*
> + * Read AUXV software register and get cpu features for Power
> + */
> +static inline void
> +rte_cpu_get_features(__attribute__((unused)) uint32_t leaf,
> +	__attribute__((unused)) uint32_t subleaf, cpuid_registers_t out)
> +{
> +	int auxv_fd;
> +	Elf32_auxv_t auxv;
> +
> +	auxv_fd = open("/proc/self/auxv", O_RDONLY);
> +	assert(auxv_fd);
> +	while (read(auxv_fd, &auxv,
> +		sizeof(Elf32_auxv_t)) == sizeof(Elf32_auxv_t)) {
> +		if (auxv.a_type == AT_HWCAP)
> +			out[REG_HWCAP] = auxv.a_un.a_val;
> +		if (auxv.a_type == AT_PLATFORM)	{
> +			if (strcmp((const char *)auxv.a_un.a_val, "aarch64")
> +					== 0)
> +				out[AARCH_MODE] = (1 << 3);
> +			else if (strcmp((const char *)auxv.a_un.a_val,
> +						"aarch32") == 0)
> +				out[AARCH_MODE] =  (1 << 4);

Why are there such strange values, 1 << 3 and 1 << 4?

> +		}
> +	}
> +}
> +
> +/*
> + * Checks if a particular flag is available on current machine.
> + */
> +static inline int
> +rte_cpu_get_flag_enabled(enum rte_cpu_flag_t feature)
> +{
> +	const struct feature_entry *feat;
> +	cpuid_registers_t regs = {0};
> +
> +	if (feature >= RTE_CPUFLAG_NUMFLAGS)
> +		/* Flag does not match anything in the feature tables */
> +		return -ENOENT;
> +
> +	feat = &cpu_feature_table[feature];
> +
> +	if (!feat->leaf)
> +		/* This entry in the table wasn't filled out! */
> +		return -EFAULT;
> +
> +	/* get the cpuid leaf containing the desired feature */
> +	rte_cpu_get_features(feat->leaf, feat->subleaf, regs);
> +
> +	/* check if the feature is enabled */
> +	return (regs[feat->reg] >> feat->bit) & 1;
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_CPUFLAGS_ARM32_H_ */
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_cycles.h b/lib/librte_eal/common/include/arch/arm/rte_cycles.h
> new file mode 100644
> index 0000000..140d1bb
> --- /dev/null
> +++ b/lib/librte_eal/common/include/arch/arm/rte_cycles.h
> @@ -0,0 +1,77 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright (C) IBM Corporation 2014. All rights reserved.
> + *   Copyright(c) 2015 Intel Corporation. All rights reserved.
> [...]
> + */
> +
> +#ifndef _RTE_CYCLES_ARM32_H_
> +#define _RTE_CYCLES_ARM32_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "generic/rte_cycles.h"
> +
> +/**
> + * Read the time base register.
> + *
> + * @return
> + *   The time base for this lcore.
> + */
> +static inline uint64_t
> +rte_rdtsc(void)
> +{
> +	unsigned tsc;
> +	uint64_t final_tsc;
> +
> +	/* Read PMCCNTR */
> +	asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(tsc));
> +	/* 1 tick = 64 clocks */
> +	final_tsc = ((uint64_t)tsc) << 6;
> +
> +	return (uint64_t)final_tsc;
> +}

Are you sure the PMCCNTR is not used by the Linux kernel? Would we
break something there? At least

 arch/arm/kernel/perf_event_v7.c

does use the counter, and it has routines for both reading and writing
it. So I am afraid we could break perf with this approach...

Here is ours, not the very best either...

+static inline uint64_t
+rte_rdtsc(void)
+{
+	struct timespec val;
+	uint64_t v;
+
+	while (clock_gettime(CLOCK_MONOTONIC_RAW, &val) != 0)
+		/* no body */;
+
+	v  = (uint64_t) val.tv_sec * 1000000000LL;
+	v += (uint64_t) val.tv_nsec;
+	return v;
+}

We can make it configurable. If somebody doesn't care about perf
support but wants the lower-latency PMU counter, it should be possible
to switch to the other implementation, as sketched below.
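
Roughly (the config option name here is only a suggestion, nothing that
exists today; the clock_gettime variant needs <time.h>):

#ifdef RTE_ARM_EAL_RDTSC_USE_PMU
static inline uint64_t
rte_rdtsc(void)
{
	unsigned tsc;

	/* read PMCCNTR, 1 tick = 64 clocks */
	asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(tsc));
	return ((uint64_t)tsc) << 6;
}
#else
static inline uint64_t
rte_rdtsc(void)
{
	struct timespec val;

	while (clock_gettime(CLOCK_MONOTONIC_RAW, &val) != 0)
		/* no body */;
	return (uint64_t)val.tv_sec * 1000000000ULL + (uint64_t)val.tv_nsec;
}
#endif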

> +
> +static inline uint64_t
> +rte_rdtsc_precise(void)
> +{
> +	asm volatile("isb sy" : : : );
> +	return rte_rdtsc();
> +}
> +
> +static inline uint64_t
> +rte_get_tsc_cycles(void) { return rte_rdtsc(); }
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_CYCLES_ARM32_H_ */
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy.h
> new file mode 100644
> index 0000000..341052e
> --- /dev/null
> +++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy.h
> @@ -0,0 +1,101 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright (C) IBM Corporation 2014. All rights reserved.
> + *   Copyright(c) 2015 Intel Corporation. All rights reserved.
> + *
> [...]
> + */
> +
> +#ifndef _RTE_MEMCPY_ARM32_H_
> +#define _RTE_MEMCPY_ARM32_H_
> +
> +#include <stdint.h>
> +#include <string.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "generic/rte_memcpy.h"
> +
> +
> +
> +static inline void
> +rte_mov16(uint8_t *dst, const uint8_t *src)
> +{
> +	memcpy(dst, src, 16);
> +}
> +
> +static inline void
> +rte_mov32(uint8_t *dst, const uint8_t *src)
> +{
> +	memcpy(dst, src, 32);
> +}
> +
> +static inline void
> +rte_mov48(uint8_t *dst, const uint8_t *src)
> +{
> +	memcpy(dst, src, 48);
> +}
> +
> +static inline void
> +rte_mov64(uint8_t *dst, const uint8_t *src)
> +{
> +	memcpy(dst, src, 64);
> +}
> +
> +static inline void
> +rte_mov128(uint8_t *dst, const uint8_t *src)
> +{
> +	memcpy(dst, src, 128);
> +}
> +
> +static inline void
> +rte_mov256(uint8_t *dst, const uint8_t *src)
> +{
> +	memcpy(dst, src, 256);
> +}
> +
> +static inline void *
> +rte_memcpy(void *dst, const void *src, size_t n)
> +{
> +	return memcpy(dst, src, n);
> +}
> +
> +
> +static inline void *
> +rte_memcpy_func(void *dst, const void *src, size_t n)
> +{
> +	return memcpy(dst, src, n);
> +}

Here we do better with NEON optimizations. However, I am not quite
sure how to behave on architectures _without_ NEON. Second, if you look
at the statistics we made (http://dpdk.org/dev/patchwork/patch/7389/),
you can see that the optimization makes sense for Cortex-A15 only.
For the other cores (A7, A9), it helps only for lengths <= 32 B.

Should we do a runtime check for NEON presence?
Should we do a runtime check for the current ARM machine type?
Should we optimize only for specific lengths?
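
If we go for a runtime NEON check, it could be based on HWCAP, roughly
like this (getauxval() needs glibc >= 2.16, HWCAP_NEON comes from
<asm/hwcap.h>; the helper name is just made up):

#include <elf.h>
#include <sys/auxv.h>
#include <asm/hwcap.h>

static inline int
rte_arm_has_neon(void)
{
	/* NEON bit in the kernel-provided hardware capabilities */
	return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
}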

> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_MEMCPY_ARM32_H_ */
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_prefetch.h b/lib/librte_eal/common/include/arch/arm/rte_prefetch.h
> new file mode 100644
> index 0000000..47be3b8
> --- /dev/null
> +++ b/lib/librte_eal/common/include/arch/arm/rte_prefetch.h
> @@ -0,0 +1,64 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2015 Intel Corporation. All rights reserved.
> + *
> [...]
> + */
> +
> +#ifndef _RTE_PREFETCH_ARM32_H_
> +#define _RTE_PREFETCH_ARM32_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "generic/rte_prefetch.h"
> +
> +static inline void
> +rte_prefetch0(const volatile void __attribute__((unused)) *p)
> +{
> +	asm volatile("nop");
> +}
> +
> +static inline void
> +rte_prefetch1(const volatile void __attribute__((unused)) *p)
> +{
> +	asm volatile("nop");
> +}
> +
> +static inline void
> +rte_prefetch2(const volatile void __attribute__((unused)) *p)
> +{
> +	asm volatile("nop");
> +}

We use the PLD instruction here. We could probably use __builtin_prefetch
instead; a rough sketch of both variants follows.
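
Something like this (just a sketch, untested):

static inline void
rte_prefetch0(const volatile void *p)
{
	asm volatile ("pld [%0]" : : "r" (p));
}

or, leaving the instruction choice to the compiler:

static inline void
rte_prefetch0(const volatile void *p)
{
	/* read prefetch, high temporal locality */
	__builtin_prefetch((const void *)p, 0, 3);
}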

> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_PREFETCH_ARM32_H_ */
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_rwlock.h b/lib/librte_eal/common/include/arch/arm/rte_rwlock.h
> new file mode 100644
> index 0000000..87349b9
> --- /dev/null
> +++ b/lib/librte_eal/common/include/arch/arm/rte_rwlock.h
> @@ -0,0 +1,70 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2015 Intel Corporation. All rights reserved.
> + *
> [...]
> + */
> +
> +#ifndef _RTE_RWLOCK_ARM32_H_
> +#define _RTE_RWLOCK_ARM32_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "generic/rte_rwlock.h"
> +
> +static inline void
> +rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
> +{
> +	rte_rwlock_read_lock(rwl);
> +}
> +
> +static inline void
> +rte_rwlock_read_unlock_tm(rte_rwlock_t *rwl)
> +{
> +	rte_rwlock_read_unlock(rwl);
> +}
> +
> +static inline void
> +rte_rwlock_write_lock_tm(rte_rwlock_t *rwl)
> +{
> +	rte_rwlock_write_lock(rwl);
> +}
> +
> +static inline void
> +rte_rwlock_write_unlock_tm(rte_rwlock_t *rwl)
> +{
> +	rte_rwlock_write_unlock(rwl);
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_RWLOCK_ARM32_H_ */
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> new file mode 100644
> index 0000000..45d01a0
> --- /dev/null
> +++ b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> @@ -0,0 +1,116 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright (C) IBM Corporation 2014. All rights reserved.
> + *   Copyright(c) 2015 Intel Corporation. All rights reserved.
> + *
> [...]
> + */
> +
> +#ifndef _RTE_SPINLOCK_ARM32_H_
> +#define _RTE_SPINLOCK_ARM32_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <rte_common.h>
> +
> +#include "generic/rte_spinlock.h"
> +
> +
> +#ifndef RTE_FORCE_INTRINSICS
> +
> +static inline void
> +rte_spinlock_lock(rte_spinlock_t *sl)
> +{
> +	while (__sync_lock_test_and_set(&sl->locked, 1))
> +		while (sl->locked)
> +			rte_pause();
> +}
> +
> +static inline void
> +rte_spinlock_unlock(rte_spinlock_t *sl)
> +{
> +	__sync_lock_release(&sl->locked);
> +}
> +
> +static inline int
> +rte_spinlock_trylock(rte_spinlock_t *sl)
> +{
> +	return (__sync_lock_test_and_set(&sl->locked, 1) == 0);
> +}
> +
> +#endif
> +
> +static inline int
> +rte_tm_supported(void)
> +{
> +	return 0;
> +}
> +
> +static inline void
> +rte_spinlock_lock_tm(rte_spinlock_t *sl)
> +{
> +	rte_spinlock_lock(sl); /* fall-back */
> +}
> +
> +static inline int
> +rte_spinlock_trylock_tm(rte_spinlock_t *sl)
> +{
> +	return rte_spinlock_trylock(sl);
> +}
> +
> +static inline void
> +rte_spinlock_unlock_tm(rte_spinlock_t *sl)
> +{
> +	rte_spinlock_unlock(sl);
> +}
> +
> +static inline void
> +rte_spinlock_recursive_lock_tm(rte_spinlock_recursive_t *slr)
> +{
> +	rte_spinlock_recursive_lock(slr); /* fall-back */
> +}
> +
> +static inline void
> +rte_spinlock_recursive_unlock_tm(rte_spinlock_recursive_t *slr)
> +{
> +	rte_spinlock_recursive_unlock(slr);
> +}
> +
> +static inline int
> +rte_spinlock_recursive_trylock_tm(rte_spinlock_recursive_t *slr)
> +{
> +	return rte_spinlock_recursive_trylock(slr);
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_SPINLOCK_ARM32_H_ */
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> new file mode 100644
> index 0000000..5994efd
> --- /dev/null
> +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h
> @@ -0,0 +1,37 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2015 Intel Corporation. All rights reserved.
> + *
> [...]
> + */
> +
> +#ifndef _RTE_VECT_ARM32_H_
> +#define _RTE_VECT_ARM32_H_
> +
> +
> +#endif /* _RTE_VECT_ARM32_H_*/
> diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile
> index d9c5233..549f2ed 100644
> --- a/lib/librte_eal/linuxapp/Makefile
> +++ b/lib/librte_eal/linuxapp/Makefile
> @@ -35,6 +35,9 @@ ifeq ($(CONFIG_RTE_EAL_IGB_UIO),y)
>  DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += igb_uio
>  endif
>  DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal
> +ifeq ($(CONFIG_RTE_ARCH), "arm")
> +DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += arm_pmu
> +endif
>  ifeq ($(CONFIG_RTE_KNI_KMOD),y)
>  DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kni
>  endif
> diff --git a/lib/librte_eal/linuxapp/arm_pmu/Makefile b/lib/librte_eal/linuxapp/arm_pmu/Makefile
> new file mode 100755
> index 0000000..0adb23b
> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/arm_pmu/Makefile
> @@ -0,0 +1,52 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2015 Intel Corporation. All rights reserved.
> +#
> [...]
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# module name and path
> +#
> +MODULE = arm_pmu
> +MODULE_PATH = drivers/net/arm_pmu
> +
> +#
> +# CFLAGS
> +#
> +MODULE_CFLAGS += -I$(SRCDIR) --param max-inline-insns-single=100
> +MODULE_CFLAGS += -I$(RTE_OUTPUT)/include
> +MODULE_CFLAGS += -Winline -Wall -Werror
> +MODULE_CFLAGS += -include $(RTE_OUTPUT)/include/rte_config.h
> +
> +#
> +# all source are stored in SRCS-y
> +#
> +SRCS-y := rte_enable_pmu.c
> +
> +include $(RTE_SDK)/mk/rte.module.mk
> diff --git a/lib/librte_eal/linuxapp/arm_pmu/rte_enable_pmu.c b/lib/librte_eal/linuxapp/arm_pmu/rte_enable_pmu.c
> new file mode 100644
> index 0000000..9d77271
> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/arm_pmu/rte_enable_pmu.c
> @@ -0,0 +1,83 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2015 Intel Corporation. All rights reserved.
> + *
> [...]
> + */
> +
> + /**
> +  * @file
> +  * This file enables the ARM PMU.
> +  *
> +  * It contains the code to enable the usage of PMU registers
> +  * by application code on all cores.
> +  */
> +
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/uio_driver.h>
> +#include <linux/io.h>
> +#include <linux/msi.h>
> +#include <linux/version.h>
> +#include <linux/slab.h>
> +
> +void enable_pmu(void __attribute__((unused)) *info)
> +{
> +	int core_id = 0;
> +
> +	core_id = smp_processor_id();
> +	/* Configure PMUSERENR - Enables user mode to have access to the
> +	 * Performance Monitor Registers. */
> +	asm volatile("mcr p15, 0, %0, c9, c14, 0" : : "r"(1));
> +	/* Configure PMCR */
> +	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(29));
> +	/* Configure PMCNTENSET Enable counters */
> +	asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r"(0x8000000f));
> +	return;
> +}
> +
> +int __init init_module_set_arm_pmu(void)
> +{
> +	int ret = 0;
> +	/* smp_call_function will make sure that the registers
> +	 * for all the cores are enabled for tracking their PMUs
> +	 * This is required for the rdtsc equivalent on ARMv7 */
> +	ret = smp_call_function(&enable_pmu, NULL, 0);
> +	/* smp_call_function would only call on all other modules
> +	 * so call it for the current module */
> +	enable_pmu(NULL);
> +	return ret;
> +}
> +
> +void __exit cleanup_module_set_arm_pmu(void)
> +{
> +}
> +
> +module_init(init_module_set_arm_pmu);
> +module_exit(cleanup_module_set_arm_pmu);

Nice, but as I've already stated, setting up the PMU can break perf.
Otherwise, it is quite a nice solution. Anyway, a kernel module will be
necessary for the ARM machines utilizing the internal EMACs. We have
to deal with I/O coherency, PHYs and clock control.

The I/O coherency might not be an issue for Cortex-A15 (and newer) as
there is a cache-coherent interconnect available (using the AMBA ACE
extension). However, I didn't find out which SoCs integrate this
interconnect. For older SoCs, we have to deal with it.

I am also thinking of using non-cacheable memory only, at least at
the beginning. This requires setting the proper MMU/TLB flags on the
allocated huge pages. The processing will be slow this way, but there
will be no need for additional system calls into the kernel while
sending/receiving packets. I've measured that accessing only a few bytes
(say 128-512 B) in non-cacheable areas does not impact the processing
that much, so it could be sufficient for tasks like L2/L3 forwarding,
etc. But I have no idea how to properly change the MMU/TLB settings of
the selected (or all?) huge pages. This would be a hack into the
Linux kernel.

> diff --git a/mk/arch/arm/rte.vars.mk b/mk/arch/arm/rte.vars.mk
> new file mode 100644
> index 0000000..3dfeb16
> --- /dev/null
> +++ b/mk/arch/arm/rte.vars.mk
> @@ -0,0 +1,58 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2015 Intel Corporation. All rights reserved.
> +#
> [...]
> +#
> +# arch:
> +#
> +#   - define ARCH variable (overriden by cmdline or by previous
> +#     optional define in machine .mk)
> +#   - define CROSS variable (overriden by cmdline or previous define
> +#     in machine .mk)
> +#   - define CPU_CFLAGS variable (overriden by cmdline or previous
> +#     define in machine .mk)
> +#   - define CPU_LDFLAGS variable (overriden by cmdline or previous
> +#     define in machine .mk)
> +#   - define CPU_ASFLAGS variable (overriden by cmdline or previous
> +#     define in machine .mk)
> +#   - may override any previously defined variable
> +#
> +# examples for CONFIG_RTE_ARCH: i686, x86_64, x86_64_32
> +#
> +
> +ARCH  ?= arm
> +# common arch dir in eal headers
> +ARCH_DIR := arm
> +CROSS ?=
> +
> +CPU_CFLAGS  += -flax-vector-conversions

https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
-flax-vector-conversions
    Allow implicit conversions between vectors with differing numbers of elements and/or incompatible element types.
    This option should not be used for new code.

I have no idea what this option is for... is it necessary?

> +CPU_LDFLAGS ?=
> +CPU_ASFLAGS ?=

We do:

+CPU_CFLAGS  ?= -marm -DRTE_CACHE_LINE_SIZE=64 -munaligned-access
+CPU_LDFLAGS ?=
+CPU_ASFLAGS ?= -felf

The -marm can probably be omitted, as the machine-specific flags
override it anyway.

How is the cache line size specified for other platforms? Is it OK to go
this way?

https://gcc.gnu.org/onlinedocs/gcc-4.8.1/gcc/ARM-Options.html#ARM-Options
-munaligned-access
-mno-unaligned-access
    Enables (or disables) reading and writing of 16- and 32- bit values from addresses
    that are not 16- or 32- bit aligned. By default unaligned access is disabled for all
    pre-ARMv6 and all ARMv6-M architectures, and enabled for all other architectures. If
    unaligned access is not enabled then words in packed data structures will be accessed
    a byte at a time.

    The ARM attribute Tag_CPU_unaligned_access will be set in the generated object file to
    either true or false, depending upon the setting of this option. If unaligned access is
    enabled then the preprocessor symbol __ARM_FEATURE_UNALIGNED will also be defined. 

> +
> +export ARCH CROSS CPU_CFLAGS CPU_LDFLAGS CPU_ASFLAGS
> diff --git a/mk/machine/armv7-a/rte.vars.mk b/mk/machine/armv7-a/rte.vars.mk
> new file mode 100644
> index 0000000..feeebfa
> --- /dev/null
> +++ b/mk/machine/armv7-a/rte.vars.mk
> @@ -0,0 +1,63 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2015 Intel Corporation. All rights reserved.
> +#
> [...]
> +
> +#
> +# machine:
> +#
> +#   - can define ARCH variable (overriden by cmdline value)
> +#   - can define CROSS variable (overriden by cmdline value)
> +#   - define MACHINE_CFLAGS variable (overriden by cmdline value)
> +#   - define MACHINE_LDFLAGS variable (overriden by cmdline value)
> +#   - define MACHINE_ASFLAGS variable (overriden by cmdline value)
> +#   - can define CPU_CFLAGS variable (overriden by cmdline value) that
> +#     overrides the one defined in arch.
> +#   - can define CPU_LDFLAGS variable (overriden by cmdline value) that
> +#     overrides the one defined in arch.
> +#   - can define CPU_ASFLAGS variable (overriden by cmdline value) that
> +#     overrides the one defined in arch.
> +#   - may override any previously defined variable
> +#
> +
> +# ARCH =
> +# CROSS =
> +# MACHINE_CFLAGS =
> +# MACHINE_LDFLAGS =
> +# MACHINE_ASFLAGS =
> +# CPU_CFLAGS =
> +# CPU_LDFLAGS =
> +# CPU_ASFLAGS =
> +
> +MACHINE_CFLAGS += -march=armv7-a

We do:

+CPU_CFLAGS += -mfloat-abi=softfp
+
+MACHINE_CFLAGS  = -march=armv7-a -mtune=cortex-a9
+MACHINE_CFLAGS += -mfpu=neon

This is quite specific. How should we deal with those flags? Should we have a
specific config for Cortex-A7, A9, A15, ...? What about NEON?

> +
> +#Tried this for gdb - did not work - Dwarf version error seen
> +# we need a gdb that can understand dwarf version 4
> +#CPU_CFLAGS += -g -gdwarf-2 -gstrict-dwarf
> +#CPU_CFLAGS := $(filter-out -O3,$(CPU_CFLAGS))
> +#CPU_CFLAGS := $(filter-out -O2,$(CPU_CFLAGS))

I can see those lines are commented out. Still, I have no idea about
them; debugging should not be enabled by default, should it? Is there a
way in DPDK to enable debugging at the compiler level?

I do not understand what the goal of the -O2/-O3 lines is...

> diff --git a/mk/toolchain/gcc/rte.vars.mk b/mk/toolchain/gcc/rte.vars.mk
> index 0f51c66..501fce5 100644
> --- a/mk/toolchain/gcc/rte.vars.mk
> +++ b/mk/toolchain/gcc/rte.vars.mk
> @@ -1,7 +1,6 @@
>  #   BSD LICENSE
>  #
> -#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> -#   All rights reserved.
> +#   Copyright(c) 2010-2015 Intel Corporation. All rights reserved.
>  #
>  #   Redistribution and use in source and binary forms, with or without
>  #   modification, are permitted provided that the following conditions
> @@ -73,7 +72,10 @@ endif
>  
>  WERROR_FLAGS := -W -Wall -Werror -Wstrict-prototypes -Wmissing-prototypes
>  WERROR_FLAGS += -Wmissing-declarations -Wold-style-definition -Wpointer-arith
> -WERROR_FLAGS += -Wcast-align -Wnested-externs -Wcast-qual
> +ifneq ($(CONFIG_RTE_ARCH), "arm")
> +WERROR_FLAGS += -Wcast-align
> +endif
> +WERROR_FLAGS += -Wnested-externs -Wcast-qual
>  WERROR_FLAGS += -Wformat-nonliteral -Wformat-security
>  WERROR_FLAGS += -Wundef -Wwrite-strings
>  

When you start compiling the libraries, -Wcast-align will break
the build (at least, it does for us).

Here is how we do it, after a short discussion with Thomas M.:

+# There are many issues reported for ARMv7 architecture
+# which are not necessarily fatal. Report as warnings.
+ifeq ($(CONFIG_RTE_ARCH_ARMv7),y)
+WERROR_FLAGS += -Wno-error
+endif

We should identify the alignment issues and fix them somehow(?).

-- 
  Jan Viktorin                E-mail: Viktorin at RehiveTech.com
  System Architect            Web:    www.RehiveTech.com
  RehiveTech
  Brno, Czech Republic

