Build Identifier: 0.7.0

Typically, an agent calculates some of its properties in relation to time. Example: a lattice cell grows older with every time step. Now, if this behaviour cannot be represented as a function of time, you need to look the data up in a table keyed by time. Example: the stock market index over the past three years. So we need some kind of data container in which values can be looked up for a given time. An additional requirement concerns time step values that have no entry in the lookup table: in such a case, the last defined value in the lookup table should be used instead. Time step values in the lookup table should be represented by integers.

Reproducible: Always
Such a thing exists already, though it could certainly be improved or even replaced -- right now the actual data handling is a port of the Ascape implementation. See org.eclipse.amp.agf.chart.IDataProvider for an example of how data input is provided for consumption into the container, and then, for example, BasicAxesChartStrategy for a usage. But as you anticipate, there isn't a mechanism to say "get the value for this property at time t." Instead you get all values as a list, and the presumption is that the item at index t is the value for time t; so instead you do "get the List of values for property x" and then, for each item in that list, get the value. This fits in with how charting APIs expect to see the data: as a set of data points at some granularity.

Regarding "In such a case, the last defined value in the lookup table should be used instead": would there be any problem with doing this on the data selection side? I.e., wherever there is a gap, one simply creates a new data point that keeps the value of t - 1. That would allow the API contracts to remain pretty simple.

To open up the discussion a bit more, I note that right now there is actually a total bias toward time steps. I wonder if we want to generalize the interface and services so that it works equally well for O(1) cases, such as a pure functional form, as it does for the time step case. For example, the current form assumes that there will be a value for each and every time step, but in the case of a functional form it might make more sense to have an interface like getValue(DataMeasure measure, long t), which a given implementation might serve from an indexed list or might simply calculate on request. But I don't know; that might be unnecessarily complicated. It certainly isn't necessary for the ABM case.

By the way, the time index should probably be a long. That's how we're going to be representing them at the model level.
The reason you want that level of precision is that it allows the highest possible level of time granularity. For example, I might want to add some behavior "in between" some existing behavior. Some ABM implementations use floating point for that reason, but for what I think are pretty clear reasons I wouldn't want to do that here.
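As a rough illustration of that generalization, here is a minimal sketch. The names (TimeSeriesProvider, IndexedProvider, FunctionalProvider) are hypothetical, not actual AMP API; it just shows how one implementation could serve values from an indexed list with t - 1 gap-filling, while another calculates them on request:

```java
import java.util.List;

/** Illustrative only: a getValue(measure, t)-style interface, not AMP API. */
interface TimeSeriesProvider {
    double getValue(String measure, long t);
}

/** Indexed form: the item at index t holds the value for time t; a null
 *  entry (gap) falls back to the last defined value, and any t beyond the
 *  table yields the final defined value. */
class IndexedProvider implements TimeSeriesProvider {
    private final List<Double> values;

    IndexedProvider(List<Double> values) {
        this.values = values;
    }

    @Override
    public double getValue(String measure, long t) {
        int i = (int) Math.min(t, values.size() - 1);
        while (i > 0 && values.get(i) == null) {
            i--; // gap: keep the value of t - 1 (and so on, backward)
        }
        Double v = values.get(i);
        return v == null ? 0.0 : v;
    }
}

/** Functional form: the value is computed on request, O(1), no stored table. */
class FunctionalProvider implements TimeSeriesProvider {
    @Override
    public double getValue(String measure, long t) {
        return 100.0 + 0.5 * t; // e.g. some pure functional trend
    }
}
```

With this shape, charting code can still iterate t = 0..n, but a simulation can also ask for a single time point directly.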
As far as I know, org.eclipse.amp.agf.chart.IDataProvider is the one that provides charts with data points. So basically, this is the data container for the simulation results. In Ascape the simulation results for a given model are stored in an org.ascape.util.data.DataSelection. Please correct me if I'm wrong.

Now, to me it looks like DataSelection is probably the kind of data container I was talking about. But I'm not sure if there's a misunderstanding about where and when this data container will be used. Let me be more precise. I need to model an agent with a behaviour I can describe neither with rules nor with a mathematical function. But I know that at a given time t1 it has an attribute with a value of v1, and probably at time t4 the value v4. At times t2 and t3 I don't know anything about its value; I can only guess. So this requirement is not about data being plotted: it's about data being needed at simulation time and being observed by other agents.
Ahah! The light dawns. As you note, I was thinking about this at the aggregate (statistical) level of granularity. So basically what we need is a longitudinal data structure holding each attribute of each and every agent at time t with granularity m -- in simplest terms (it wouldn't actually be implemented this way) something like a 3D array: double[agent.size, attribute.count, time]. There would be a one-to-one correspondence between a model scape (the term for a collection of agents with the same type, behavior and containment) and a given agent DataSet.

Note that, as I think I may have mentioned, one of the novel things about the Ascape -> AMF approach is that the attributes of agents at the micro level become the input parameterizations / output statistics at the macro level. This means that a given model might have many DataSets. For example, a model of pandemics with a Regional, City and Individual hierarchy might have:

1. A DataSet with population one for aggregate data such as "Average Exposures per City".
2. A DataSet with population c for Cities, with data such as "Count Exposures".
3. c DataSets for Individuals in Cities, with data such as "Status == {Susceptible, Exposed..}".

We could combine the c DataSets into one so that all agents of a type are kept in the same data set; there are advantages and disadvantages to that. So this has been contemplated, and as you say it is in some sense a continuation of the DataSelection mechanism in Ascape. I developed that over ten years ago and it could definitely be improved, but it has the advantage of working now. :)

We haven't been keeping disaggregated data because it is a *lot* of data. :) But there are really cool things you could do with it, like move agents backward in time (without resorting to rerunning the model), look at subsets of agents' trajectories over time, etc. Major advances in tools are possible based on that infrastructure. My thinking on this has been to support a full EMF data model for instantiations of the model.
Originally I had thought of actually creating a matching model for each Acore model by mapping from Acore -> Ecore and then using dynamic Ecore (slower) or generating code (technical challenges). In that case we would have a model like:

  EClass Individual
    - EDouble Probability Transmission
    - EEnum Infection Status

But lately I've been thinking that it might be a lot simpler and more general to use the metadata from the existing Acore model, so that you would have a generic model like the one below (not well thought out, just an example):

  EClass AgentDataSet
    - 0..* EAgentEntry agents
  EClass EAgentEntry
    - 0..* EAttributeValueSet attributeValueSets
  EClass EAttributeValueSet
    - SAttribute modelAttribute
    - 0..* EDouble values

This is actually pretty similar to the adata model which already exists for aggregate data, so it might be less work than it seems. It might seem like overkill to use EMF, but there are huge advantages to doing so. I think we'd just have to put off the idea of not keeping the data for all sets for now. Of course you'd want runtime controls for turning off collection of all of the data. Does this seem closer to what you had in mind?
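To make the shape of that generic model concrete, here is a plain-Java sketch of it -- purely illustrative; in AMP this would be an Ecore model, and the class and feature names simply mirror the example above:

```java
import java.util.ArrayList;
import java.util.List;

/** One attribute of one agent over time; the String stands in for the
 *  SAttribute reference in the sketched Ecore model. */
class AttributeValueSet {
    final String modelAttribute;
    final List<Double> values = new ArrayList<>(); // one value per time step

    AttributeValueSet(String modelAttribute) {
        this.modelAttribute = modelAttribute;
    }
}

/** All recorded attribute histories for one agent. */
class AgentEntry {
    final List<AttributeValueSet> attributeValueSets = new ArrayList<>();
}

/** The containment root: one data set per model scape. */
class AgentDataSet {
    final List<AgentEntry> agents = new ArrayList<>();

    /** Conceptually the 3D lookup: double[agent][attribute][time]. */
    double value(int agent, int attribute, int time) {
        return agents.get(agent)
                .attributeValueSets.get(attribute)
                .values.get(time);
    }
}
```

The point of the sketch is just that the "3D array" view and the containment-model view are the same data, navigated differently.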
Yes, that's what I'm talking about! :) At least the first part is. A discussion about the second part of your comment should be held in its own bug report (or feature request), I think. It's a very interesting issue on its own, and I do have some requirements there too! But let me get back to the first part...

Some thoughts about an agent's attribute of the type "lookup table":

The value of this attribute can be retrieved by providing a key. During the simulation, the key could be the current time step value. But in fact, it should be able to be anything: an ID, a number, a String, or even better: any type of key! Besides, the value returned by the lookup table should not be limited to a number or a String. Actually, it should be able to provide any kind of value!

So let's sum up. I need a special kind of attribute that will return any type of value when provided with a key of any type. And I haven't even talked about the different strategies for handling keys that don't have a value. Of course, I'd like to be able to implement any kind of strategy.

This leads me to the conclusion that my requirement should be rephrased: An Agent nowadays can have an attribute of a given type. Available types are Integer, Real, Symbol, Undefined and Numeric. I'd like to be able to extend this list of types by adding my custom type; let's call it "LookupTable". What if my lookup table could be added by using an extension point (let's call it "agent.attribute.type")? That way any kind of type could be introduced for an attribute, not only a lookup table. Do you think this is too unspecific?
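A minimal sketch of what such a "LookupTable" attribute type could look like, assuming a generic key/value container with a pluggable missing-key strategy (all names here are hypothetical, not AMP API). The static helper shows the strategy from the original request -- fall back to the last defined value -- for integer time step keys:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Illustrative generic lookup table: any key type, any value type,
 *  and a pluggable strategy for keys that have no entry. */
class LookupTable<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> missingKeyStrategy;

    LookupTable(Map<K, V> entries, Function<K, V> missingKeyStrategy) {
        this.entries = entries;
        this.missingKeyStrategy = missingKeyStrategy;
    }

    V get(K key) {
        V v = entries.get(key);
        return v != null ? v : missingKeyStrategy.apply(key);
    }

    /** The original request's strategy for integer time steps: if t has
     *  no entry, use the last defined value at or before t. */
    static double lastDefined(TreeMap<Long, Double> table, long t) {
        Map.Entry<Long, Double> e = table.floorEntry(t);
        return e == null ? 0.0 : e.getValue();
    }
}
```

The strategy function is what makes the "different strategies for keys that don't have a value" pluggable: last-defined-value, a default, interpolation, or an exception could each be one lambda.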
OK, now I'm a bit confused again.. :)

(In reply to comment #4)
> Some thoughts about an agent's attribute of the type "lookup table":

Perhaps we should distinguish between:

1. Modeling Time: MetaABM/Acore meta-model.
2. Runtime: e.g. model in memory, then perhaps persisted to a data model.

(I realize that for SD models, these distinctions aren't quite as black and white..)

So I wasn't imagining this as an agent attribute (1), but as data that would be collected into a table when the model was run (2). Am I still missing what you're getting at? My guiding thinking is that there is an essential model / view point of view here, so that, for example, there is no way for an agent to find out what it did at iteration 10 when that agent is at step 100 -- unless the agent has actually defined a memory for itself. But that's too dogmatic, and I think at some point we should have time functions that could give access to that kind of state. For a more general-purpose approach that would be very useful.

> The value of this attribute can be retrieved by providing a key. During the
> simulation, the key could be the current time step value. But in fact, it
> should be able to be anything: an ID, a number, a String, or even better: any
> type of key!
> Besides, the value that will be returned by the lookup table should not be
> limited to the type of number or String. Actually, it should be able to provide
> any kind of value!
>
> So let's sum up. I need a special kind of attribute that will return any type
> of value when being provided with a key of any type. And now I haven't even
> talked about the different strategies of handling keys that don't have a value.
> Of course, I'd like to implement any kind of strategy.

So it sounds like what you are looking for here is a Map? I think that would make sense as a generic data type. In fact, I think we might want to refactor the current design that has SAttributeArray as a subclass of SAttribute.
We should probably support Sets and Stacks and maybe Maps as well for basic types, but.. see below.

> This leads me to the conclusion that my requirement should be rephrased:
>
> An Agent nowadays can have an attribute of a given type. Available types are
> Integer, Real, Symbol, Undefined and Numeric. I'd like to be able to extend
> this list of types by adding my custom type. Let's call it "LookupTable".
> What if my lookup table could be added by using an extension point (let's call
> it "agent.attribute.type")? That way any kind of type could be introduced for
> an attribute. Not only a lookup table. Do you think this is too unspecific?

It is an interesting idea. My immediate -- but not necessarily correct! -- reaction is that agents should only be able to have primitive values, because we want to have very general mathematical type constructs. What we don't want is a general-purpose OO representation using attributes per se. That is, references are not made through attributes, but through Graphs. Future versions of Acore will have even clearer support for that, so that, for example, you can create a Graph relation directly from an agent, and those relations will always be "typed" to particular SAgents. I think you're not really asking for this level of generality; I just wanted to throw it out there.

And that leads me to the other point, which is that MetaABM may support a lot of what you are talking about already. A Map is really just a graph relation. So in the case where you want to create an arbitrary relation between any sets of agents, that's how you should handle it: by creating, removing and replacing Connections. And then you don't need a lookup table and all of the search mechanisms for it, because we already have them! You just do a query. Also remember that Agents do not have to represent an actual "agent" -- they can be any kind of data structure.. which, hmm, makes me think.
I had been planning to implement the data stuff as an EMF Ecore model, but maybe it should be an Acore model! In other words, you might have an Agent whose job it is to record the state of other agents' values. This is just the kind of god food eating that ABM designers have always favored, going back to Swarm. Really intriguing discussion -- I don't want to lose the thread about keeping runtime data for agents, no matter how this bug turns out..
(In reply to comment #5)
> This is just the kind of god food eating that ABM designers have always favored going back to Swarm.

^^^ dog

Now there's a revealing typo!
(In reply to comment #5)
> Perhaps we should distinguish between:
>
> 1. Modeling Time: MetaABM/Acore meta-model.
> 2. Runtime: e.g. model in memory then perhaps persisted to a data model.
> (I realize that for SD models, these distinctions aren't quite as black and
> white..)
>
> So I wasn't imagining this as an agent attribute (1), but as data that would be
> collected into a table when the model was run (2). Am I still missing what
> you're getting at?

Yes, I think we're still talking about two different things here. Good idea to distinguish; that will help with explaining! :) I really am talking about case (1), at modelling time. So probably my comment #2 and comment #4 make more sense from this point of view. Hmm.. could you read over those again with modelling time (1) in mind? Please let me know if that makes it any clearer. To my understanding, most of your comment #6 is about simulation time (2), so we should probably open another bug to talk about that case.
(In reply to comment #7)
> I really am talking about a case (1) at modelling time. So probably my comment
> #2 and comment #4 make more sense from this point of view. Hmm.. could you read
> over those again in respect of being at modelling time (1)? Please let me know
> if that makes it any clearer.

OK. It's still possible to be confused, because of course we need to define at modeling time something that we will be using at simulation time. As I mentioned, it is possible that we might also use this in the same way that we would use other agent state for recording data. But it sounds to me like a good distinction is between a "lookup table *for* agents (for them to use)" vs. a "lookup table *on* agents (for observers to use)". I think perhaps we need a concrete example to work through?
(In reply to comment #8)
> I think that perhaps we need a concrete example to work through?

Good idea. Let's assume I want to simulate how an anthill is built by its ants. Every ant will be represented by an agent. The biggest influence comes from the weather: if it's warm and dry, the ants will work a lot faster. Now, I've recorded temperature and rainfall over the last year. For every day there's a number that represents the average temperature of that day, and another number for the amount of rain that fell that day. Instead of using random data for temperature and rainfall, I want exactly this data to be the basis of my simulation. So the question is: how does the weather data come into my simulation?

Could I use an agent that represents the temperature and another to represent rainfall? How would they both store their data? How would they be able to "answer" when asked (at simulation time) about the temperature or the rainfall at a given time?
(In reply to comment #9)
> (In reply to comment #8)
> How does the weather data come into my simulation?
>
> Could I use an agent that represents the temperature and another to represent
> rainfall? How would they both store their data? How would they be able to
> "answer" when being asked (at simulation time) about the temperature or the
> rainfall at the given time?

Ah, now the light truly dawns! :) And actually this is sort of the inverse issue of the disaggregate data output. This is why I think it ultimately makes sense to have a two-way mapping between agents <-> data: then we don't need to distinguish between input, output and the running model -- we just need various mappings (through XText and EMF persistence) to various artifacts, and a way to control when those mappings get triggered. That's hand waving, so let's get into implementation.

This has been a common thing to want from ABM models for some time. In fact the very first model I worked on, "Artificial Anasazi" -- an archeology model -- had data that was generated externally for historical household locations, local climatology, etc. for 800 years. But it's usually been awkward and hand-written. Rather than have an explicit lookup table that agents are aware of, I would like to think about how we could get the mapping to work.

First, yes, there would be an Agent that could take rainfall and temperature. That's the function of the root Scape (Context) -- all of the "world" level data goes there. The nice thing about that is that if, say, you have two ant hills with different micro-climates, you can make the model handle that without changing any of the model design.

So the real issue is how to get the data from the source to the agent(s). The simplest way to handle this would be through a file reader mechanism. We need higher-level support for that anyway. Here the lamest sort of thing would be a "Read Next Line" action that would get a value that we would use with a Set action.
But the deeper way to get at this would be to recognize that some things are exogenous and some are endogenous, and to have a way to mix and match the two -- so that the persistence and execution mechanisms are actually somehow interwoven. Before exploring that further, does this make sense to you so far?
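A minimal sketch of the "Read Next Line" + Set idea, under the assumption of a simple CSV source with one temperature,rainfall pair per day; the class and field names are illustrative, not AMP API:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

/** Illustrative root-context holder for exogenous weather data.
 *  load() plays the file-reader role; step() plays the "Read Next
 *  Line" action followed by a Set action on context state. */
class WeatherContext {
    private final List<double[]> rows = new ArrayList<>(); // {temperature, rainfall}
    double temperature;
    double rainfall;

    /** Read "temperature,rainfall" lines, one per day. */
    void load(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.split(",");
                rows.add(new double[] {
                    Double.parseDouble(cols[0]),
                    Double.parseDouble(cols[1]) });
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    /** Set the context state for time step t; past the end of the
     *  recorded data, keep the last defined values. */
    void step(int t) {
        double[] row = rows.get(Math.min(t, rows.size() - 1));
        temperature = row[0];
        rainfall = row[1];
    }
}
```

Ant agents would then just read `temperature` and `rainfall` off their root context each step, never knowing whether the values were recorded or generated.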
(In reply to comment #10)

Yes, yes, yes. This is exactly what I'm talking about! :)

> Ah, now the light truly dawns! :) And actually this is sort of the inverse
> issue of the disaggregate data output. This is why it makes ultimate sense I
> think to have a sort of two way mapping between agents <-> data.. then we don't
> need to distinguish between input, output and running model -- we just need
> various mappings (through XText and EMF persistence) to various artifacts and a
> way to control when those mappings get triggered. That's hand waving so let's
> get into implementation.

I like the idea of using the same mapping twice. And I also think that holding mapped data in files instead of in RAM is a good idea as well. That way we won't run into memory issues as quickly.

> This has been a common thing to want for ABM models for some time. In fact the
> very first model I worked on "Artificial Anasazi" -- an archeology model -- had
> data that was generated externally for historical household locations, local
> climatology, etc.. for 800 years. But its usually been awkward and hand
> written. But rather than have an explicit lookup table that agents are aware
> of, I would like to think about how we could get the mapping to work.
>
> First, yes, there would be an Agent that could take rainfall and temperature.
> That's the function of the root Scape (Context) -- all of the "world" level
> data goes there. The nice thing about that is that say you have two ant hills
> with different micro-climates..you can make the model handle that without
> changing any of the model design.
>
> So the real issue is how to get the data from the source to the agent(s). The
> simplest way to handle this would be through a file reader mechanism. We need
> higher level support for that anyway. Here the sort of lamest thing would be
> like a "Read Next Line" action that would get a value that we would use with a
> Set action.
>
> But the deeper way to get at this would be to recognize that some things are
> exogenous and some are endogenous and have a way to mix and match the two. So
> that the persistence and execution mechanisms are actually somehow interwoven.
> Before exploring that further, does this make sense to you so far?

So what you are saying is that, technically, you don't want to distinguish between four cases:

1. Simulated data is being mapped and written into a file.
2. Simulated data can be read out of the file to review a simulation result without having to simulate it again.
3. "Historical data" (like the climatology in our examples) is written into a file at modelling time.
4. Historical data can be read out of the file to be used during simulation.

Actually, cases 1 and 3 are almost identical, since they are both writing data; and so are cases 2 and 4 with respect to data being read. Using the same solution for all of these makes perfect sense, because they are all dealing with the same problem. Correct?

I'm quite excited about this idea. I think it's a beautiful way to bring the problem to an abstract level.

I still have one case I want to share with you, because I can't yet see how it will work with the solution mentioned above. Rather than historical weather data with only a single value (for temperature, rainfall, etc.) at every given time, I'd like to think of another example: I want to simulate airport traffic. I have historical data about events that have occurred, such as airplane lift-offs. At a given time, a large airport could have two airplanes starting at the exact same moment. So at a given time I need to have multiple events recorded, each with a set of data (name of the flight, destination, passengers, etc.).

I can't see exactly how this would work with the solution we discussed (historical data as a function of the root Scape). I'm quite sure you could solve this problem by using a different model, but I don't see how. Any idea?
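One way to sketch the airport case (hypothetical names, not AMP API): instead of a single value per time step, the historical data maps each time to a *list* of event records, so two lift-offs at the same moment are simply two entries under the same key:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** One recorded historical event with its own set of data. */
class LiftOff {
    final String flight;
    final String destination;
    final int passengers;

    LiftOff(String flight, String destination, int passengers) {
        this.flight = flight;
        this.destination = destination;
        this.passengers = passengers;
    }
}

/** Time -> list of events; any number of events can share a time step. */
class EventSchedule {
    private final Map<Long, List<LiftOff>> byTime = new TreeMap<>();

    void record(long t, LiftOff event) {
        byTime.computeIfAbsent(t, k -> new ArrayList<>()).add(event);
    }

    /** All events at time t; an empty list if nothing happened then. */
    List<LiftOff> eventsAt(long t) {
        return byTime.getOrDefault(t, List.of());
    }
}
```

This only solves the data-shape half of the question; whether such scripted events should live on the root Scape or drive agent creation directly is the design question discussed below.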
Hi Jonas,

It seems that this conversation is moving around in sync with the sun. It's getting to be time to go home here, and I wish I could be working on this stuff all of the time instead of dealing with build issues like I have for the past four days :( -- what AMP needs is a good build engineer, because I'm not one -- but anyway, here are my comments for the day.

(In reply to comment #11)
> I like the idea of using the same mapping twice. And I also think that holding
> mapped data in files instead of in RAM is a good idea as well. That way we
> won't run into memory issues as quickly.

You should actually be able to do it either way. The nice thing about having it in an EMF file is that you can handle persistence any way you want. That said, you would want to have support for not keeping anything in EMF and having it all in memory. The beauty of code generation is that you can do that without any overhead at all!

> So what you are saying is that technically, you don't want to distinguish
> between four cases:
> 1. Simulated data is being mapped and written into a file.
> 2. Simulated data could be read out of the file to review a simulation result
> without having to simulate it again.
> 3. "Historical data" (like the climatology in our examples) will be written
> into a file during modelling time.
> 4. Historical data can be read out of the file to be used during simulation.

Exactly, or at least you want to be able to treat them as if they had no distinctions.

> Actually, case 1 and 3 are almost identical since they are both writing data.
> And so are case 2 and 4 in respect of data being read.

Right, they're sort of inverses of each other.

> To use the same solution for these cases makes perfect sense because they are
> all dealing with the same problem. Correct?

Yes! At least I think so. At least it's something worth aiming at.

> I'm quite excited about this idea. I think it's a beautiful way to bring the
> problem to an abstract level.
> I still have one case I want to share with you
> because I can't see yet how this will work with the solution mentioned above.

:D

> Rather than having historical weather data where there is only a single value (for
> temperature, rainfall, etc.) at every given time I'd like to think of another
> example:
> I want to simulate airport traffic. I have historical data about events having
> occurred such as airplane lift-offs. At a given time a large airport could have
> two airplanes starting at the exact same time. So at the given time I need to
> have multiple events recorded, each with a set of data (name of the flight,
> destination, passengers, etc.).
>
> I can't think of how this would exactly work with the solution we discussed
> (historical data as a function of the root Scape). I'm quite sure you could
> solve this problem by using a different model. But I don't see how. Any idea?

You've hit exactly the challenging part of it. We need to be able to do the above with agent construction, destruction and relationships in the same way we do with plain state. Another way to look at this is: how do we integrate scripted events with autonomous events? Either case alone is not trivial but has been done many times. And it has to be repeatable, and you have to be able to move backward and forward and so on. So here's something cool -- EMF has a whole change-management infrastructure, and there are projects that provide transaction support and so on. You could use that, or better, use it as the inspiration for something more powerful..

As background, one of the many reasons I thought a very high-level meta-model behind all of the behavior (Actions) was the way to go, rather than just supporting some kind of general-purpose scripting language, is that I wanted all of the behavior to be really amenable to inference and manipulation. That means having pretty atomic behaviors, and it is part of why I've resisted things like generic loop structures and whatnot.
(I'll avoid going into too much detail on that, since my mind is completely worn out from looking at XML build artifacts and logs.) But if you're using Actions to define your model, you already have the full semantics for expressing anything that can happen within the model! That might be sort of obvious, but what it means is that at runtime you can use instantiations (analogs) of the actions to record, reverse and insert actions arbitrarily. I'm thinking that the most general solution with respect to that is to just treat the agent state aspect as a special case, using Set Actions. In practice you might want to do that in a more efficient way, but it would all follow this design notionally. I'm not sure that's all clear -- but does that handle the airport use case, do you think?

Now, there are a lot of devils in these details. The protocol for instantiating agents within the context of a hierarchy has all kinds of interesting / difficult / annoying aspects that have been worked out in various ways in prior toolsets. Just to give some sense of this, think about the problem of deciding how to build up a tree of agent hierarchies (like Human -> Organ -> Cell -> Protein). Whether you do that depth-first or breadth-first could have important consequences. The ideal way to handle that is to leave it open, so that models can be tested under different protocols. But as I say, that's just to give a sense of some of the complexities involved as we think about how to limit the scope of this task.

BTW, leaving the input data side aside, there is still something to be said about the old approach of simply saving the random seed and then rerunning the model to recover the data. In a way it's the most powerful compression technique ever invented -- one integer value and you get 100MB of data!
So perhaps what you would want to do is actually abstract *that* out, so that people could choose to save off a model that is full of data for every period, one that only has the most recent period, or one that simply has:

  <run seed="99920202" time="1000"/>

:) cheers!

Miles
Jonas et al.,

I think it's time to begin thinking about how we might actually implement this. One thing that might be relevant is that I've begun experimenting with Ecore <-> Acore mappings. See bug 325484.

cheers, Miles